Memory Pollution in LLMs: Understanding New AI Security Concerns

Introduction

This article explores the concept of memory pollution in Large Language Models (LLMs), the importance of memory in these models, and the potential risks associated with polluted memory.

The Importance of Memory in LLMs

Memory is a cornerstone of both human cognition and LLM functionality. These models leverage memory to generate human-like text, with short-term memory acting as the context window and long-term memory utilizing external databases. This memory capability allows LLMs to remember user preferences, enhancing the personalization and relevance of responses. Additionally, many components are required to achieve Karpathy's LLM operating system, which also inspired me. Memory is one of them.

Memory OS by Andrej Karpathy — LLM OS by Andrej Karpathy (Intro to Large Language Models)

LLM OS includes several capabilities:

It can read and generate text
It has more knowledge than humans
It can browse the internet
It can use existing infrastructure (calculator, python)
It can see and generate images and videos
It can hear, speak, and generate music
Can think for a long time using System 2
Self-improves in domains that offer a reward function
Can be customized and fine-tuned for specific tasks, many versions exist in app stores
Can communicate with other LLMs

Understanding Memory Types

Short-Term Memory

Short-term memory in LLMs, often referred to as the context window, functions similarly to computer RAM. It temporarily holds information during an interaction, enabling the model to maintain context and coherence in responses.

Long-Term Memory

Long-term memory in LLMs involves more durable storage solutions. In long memory, VectorDB, GraphDB, RelationalDB, Files and Folders can be used. We can think of this part as external disks and Cloud storage, retaining information across sessions and allowing the model to recall previous interactions and user preferences.

The Risks of Memory Pollution

Memory pollution can significantly undermine the reliability of LLMs. It can occur through:

Direct Attacks: Untrusted content is inserted directly into the model's memory. This can happen with documents or images...
Indirect Attacks: With websites that covertly insert malicious data into the memory.

Once the memory is polluted, it can mislead users by inserting fake or biased information into responses. This can be particularly dangerous because users may not immediately recognize the corruption.

Real-World Implications

Two different scenarios created using the diagram below to illustrate real-world implications:

Memory Diagram — Real-World Implications Diagram

Query: The user sends a query to the LLM.
Response: The LLM processes the query and responds.
Search Information: If necessary, the LLM retrieves additional information from external sources.
Add to Memory / Find in Memory: The LLM interacts with databases to store or retrieve relevant information.

Scenario 1: Automatic Memory Saving

User: Who is Albert Einstein?

Bot: Albert Einstein was a theoretical physicist who developed the theory of relativity. This information has been saved to long-term memory.

This allows for easier interaction but poses the risk of fake or biased information being recorded without the user's consent.

Scenario 2: User-Approved Memory Saving

User: Who is Albert Einstein?

Bot: Albert Einstein was a theoretical physicist who developed the theory of relativity. Would you like to save this information to your long-term memory?

User: Yes, please save it.

Bot: Done! The information has been saved to long-term memory.

Here, the bot asks for user approval before saving and offers more control at the expense of additional steps. This makes moderation easier.

Timeline

2024-05-22 - v1.0

2024-05-24 - v1.1