Memory for AI Agents: Short-term, Long-term, and RAG

AI Agents · Alin Dobra · 16 min read

Out of the box, every conversation with an LLM starts fresh. The model doesn't remember you, your preferences, or your previous conversations. For a simple chatbot, that's fine. For an agent that's supposed to work with you over time? It's a fatal flaw.

Memory transforms agents from stateless tools into intelligent assistants that learn, adapt, and improve.

This guide shows you how to implement memory in AI agents—from simple conversation buffers to sophisticated retrieval systems that give agents access to vast knowledge bases.

Why Agents Need Memory

Without memory, agents:

  • Forget context mid-conversation
  • Can't learn from past mistakes
  • Have no access to private knowledge
  • Repeat the same errors endlessly
  • Can't personalize to users

With memory, agents:

  • Maintain context across sessions
  • Learn from experience
  • Access company knowledge bases
  • Improve over time
  • Personalize responses

The Three Types of Agent Memory

text
                         AGENT MEMORY
        ┌──────────────────┼──────────────────┐
        │                  │                  │
   SHORT-TERM          LONG-TERM          EXTERNAL
     MEMORY              MEMORY           KNOWLEDGE

  Context window     Past sessions      Documents
  Current chat       User prefs         Databases
  Working state      Learned facts      APIs
                     Experiences        Web

  Volatile           Persistent         Retrieved
  ~128K tokens       Unlimited          On-demand

1. Short-Term Memory (Context Window)

The conversation history within a single session. Limited by the model's context window (4K to 128K+ tokens).

2. Long-Term Memory (Persistent)

Information that persists across sessions—user preferences, past interactions, learned facts. Stored externally and retrieved when needed.

3. External Knowledge (RAG)

Access to documents, databases, and knowledge bases that weren't in the model's training data. Retrieved dynamically based on the current query.

Short-Term Memory: Managing Context

Basic Conversation Buffer

The simplest memory—just keep the full conversation:

python
class ConversationBuffer:
    def __init__(self, max_tokens: int = 8000):
        self.messages = []
        self.max_tokens = max_tokens

    def add(self, role: str, content: str):
        self.messages.append({"role": role, "content": content})
        self._trim_if_needed()

    def _trim_if_needed(self):
        """Remove oldest messages if we exceed token limit"""
        while self._estimate_tokens() > self.max_tokens and len(self.messages) > 1:
            # Keep system message, remove oldest user/assistant pair
            if self.messages[0]["role"] == "system":
                self.messages.pop(1)
            else:
                self.messages.pop(0)

    def _estimate_tokens(self) -> int:
        # Rough estimate: 4 chars per token
        return sum(len(m["content"]) // 4 for m in self.messages)

    def get_messages(self) -> list:
        return self.messages.copy()


# Usage
memory = ConversationBuffer()
memory.add("system", "You are a helpful assistant.")
memory.add("user", "What's the capital of France?")
memory.add("assistant", "The capital of France is Paris.")
memory.add("user", "What's its population?")  # Agent remembers we're talking about Paris
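The 4-characters-per-token heuristic is convenient but imprecise. If accuracy matters (for example, when you're operating close to the context limit), you can count tokens exactly with the tiktoken library. A minimal sketch, assuming tiktoken is installed; the helper name is ours:

python
import tiktoken

def count_tokens(messages: list[dict], model: str = "gpt-4o") -> int:
    """Count tokens exactly using the model's tokenizer via tiktoken."""
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        # Fall back to a general-purpose encoding for unknown models
        encoding = tiktoken.get_encoding("cl100k_base")
    # Ignores per-message formatting overhead (a few tokens per message)
    return sum(len(encoding.encode(m["content"])) for m in messages)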

Sliding Window with Summary

For longer conversations, summarize old messages instead of discarding them:

python
import openai

class SummarizingMemory:
    def __init__(self, window_size: int = 10, max_tokens: int = 4000):
        self.client = openai.OpenAI()
        self.messages = []
        self.summary = ""
        self.window_size = window_size
        self.max_tokens = max_tokens

    def add(self, role: str, content: str):
        self.messages.append({"role": role, "content": content})

        # Summarize when window is exceeded
        if len(self.messages) > self.window_size * 2:
            self._summarize_old_messages()

    def _summarize_old_messages(self):
        """Compress old messages into summary"""
        # Take oldest half of messages
        to_summarize = self.messages[:self.window_size]
        self.messages = self.messages[self.window_size:]

        # Generate summary
        summary_prompt = f"""Summarize this conversation, preserving key facts and decisions:

Previous summary: {self.summary}

New messages:
{self._format_messages(to_summarize)}

Provide a concise summary."""

        response = self.client.chat.completions.create(
            model="gpt-4o-mini",  # Use cheaper model for summarization
            messages=[{"role": "user", "content": summary_prompt}]
        )

        self.summary = response.choices[0].message.content

    def get_messages(self) -> list:
        """Get messages with summary as context"""
        result = []

        if self.summary:
            result.append({
                "role": "system",
                "content": f"Previous conversation summary:\n{self.summary}"
            })

        result.extend(self.messages)
        return result

    def _format_messages(self, messages: list) -> str:
        return "\n".join(f"{m['role']}: {m['content']}" for m in messages)
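Nothing is dropped outright here: once the message count passes twice the window size, the oldest half is folded into a running summary that rides along as a system message. A quick usage sketch (the loop just simulates a long conversation; each summarization pass hits the OpenAI API):

python
memory = SummarizingMemory(window_size=10)

# Simulate a long conversation: past 20 messages, the oldest 10
# get compressed into memory.summary instead of being discarded
for i in range(15):
    memory.add("user", f"Question {i} about my project")
    memory.add("assistant", f"Answer {i}")

messages = memory.get_messages()
# messages[0] is a system message carrying the rolling summary;
# the rest are the most recent raw turns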

Working Memory for Multi-Step Tasks

For agents executing multi-step tasks, maintain structured working memory:

python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class WorkingMemory:
    """Structured memory for task execution"""
    goal: str = ""
    current_step: int = 0
    plan: list[str] = field(default_factory=list)
    completed_steps: list[dict] = field(default_factory=list)
    variables: dict[str, Any] = field(default_factory=dict)
    errors: list[str] = field(default_factory=list)

    def to_context(self) -> str:
        """Convert to context string for LLM"""
        return f"""Current Task State:
Goal: {self.goal}
Progress: Step {self.current_step + 1} of {len(self.plan)}

Plan:
{self._format_plan()}

Variables:
{self._format_variables()}

Recent Errors: {self.errors[-3:] if self.errors else 'None'}
"""

    def _format_plan(self) -> str:
        lines = []
        for i, step in enumerate(self.plan):
            status = "✓" if i < self.current_step else "→" if i == self.current_step else " "
            lines.append(f"  [{status}] {i+1}. {step}")
        return "\n".join(lines)

    def _format_variables(self) -> str:
        if not self.variables:
            return "  (none)"
        return "\n".join(f"  {k}: {v}" for k, v in self.variables.items())


# Usage in agent
class TaskAgent:
    def __init__(self):
        self.working_memory = WorkingMemory()

    def execute(self, goal: str):
        self.working_memory.goal = goal
        self.working_memory.plan = self._create_plan(goal)

        for i, step in enumerate(self.working_memory.plan):
            self.working_memory.current_step = i

            # Include working memory in context
            context = self.working_memory.to_context()
            result = self._execute_step(step, context)

            self.working_memory.completed_steps.append({
                "step": step,
                "result": result
            })

            # Store results as variables for later steps
            if "output" in result:
                self.working_memory.variables[f"step_{i}_output"] = result["output"]

Long-Term Memory: Persistence Across Sessions

Vector-Based Memory

The most common approach is to store memories as embeddings and retrieve them by semantic similarity:

python
import openai
import numpy as np
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Memory:
    content: str
    embedding: list[float]
    metadata: dict
    timestamp: datetime

class VectorMemory:
    def __init__(self):
        self.client = openai.OpenAI()
        self.memories: list[Memory] = []

    def add(self, content: str, metadata: dict | None = None):
        """Store a memory with its embedding"""
        embedding = self._get_embedding(content)

        memory = Memory(
            content=content,
            embedding=embedding,
            metadata=metadata or {},
            timestamp=datetime.now()
        )

        self.memories.append(memory)

    def search(self, query: str, top_k: int = 5) -> list[Memory]:
        """Find memories most relevant to the query"""
        query_embedding = self._get_embedding(query)

        # Calculate similarities
        similarities = []
        for memory in self.memories:
            sim = self._cosine_similarity(query_embedding, memory.embedding)
            similarities.append((memory, sim))

        # Sort by similarity and return top_k
        similarities.sort(key=lambda x: x[1], reverse=True)
        return [m for m, _ in similarities[:top_k]]

    def _get_embedding(self, text: str) -> list[float]:
        response = self.client.embeddings.create(
            model="text-embedding-3-small",
            input=text
        )
        return response.data[0].embedding

    def _cosine_similarity(self, a: list, b: list) -> float:
        a = np.array(a)
        b = np.array(b)
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Usage
memory = VectorMemory()

# Store memories
memory.add("User prefers Python over JavaScript", {"type": "preference"})
memory.add("User's project is an e-commerce platform", {"type": "context"})
memory.add("User had trouble with authentication last week", {"type": "issue"})

# Retrieve relevant memories
relevant = memory.search("What programming language should I use?")
# Top result: "User prefers Python over JavaScript"
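The per-memory Python loop is fine for hundreds of memories but scales linearly and recomputes norms on every search. Before reaching for a vector database, you can get a long way by stacking the embeddings into a single NumPy matrix. A sketch of that optimization (the class and method names are hypothetical):

python
import numpy as np

class FastVectorSearch:
    """In-memory vector search using one matrix multiply per query."""

    def __init__(self):
        self.contents: list[str] = []
        self.matrix: np.ndarray | None = None  # shape: (n_memories, dim)

    def add(self, content: str, embedding: list[float]):
        vec = np.array(embedding, dtype=np.float32)
        vec /= np.linalg.norm(vec)  # normalize once, at insert time
        self.contents.append(content)
        row = vec[None, :]
        self.matrix = row if self.matrix is None else np.vstack([self.matrix, row])

    def search(self, query_embedding: list[float], top_k: int = 5) -> list[tuple[str, float]]:
        q = np.array(query_embedding, dtype=np.float32)
        q /= np.linalg.norm(q)
        # Rows and query are unit vectors, so the dot product is cosine similarity
        scores = self.matrix @ q
        top = np.argsort(scores)[::-1][:top_k]
        return [(self.contents[i], float(scores[i])) for i in top]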

Production Vector Store with Pinecone/Weaviate

For production, use a managed vector database:

python
from pinecone import Pinecone
import openai

class ProductionMemory:
    def __init__(self, index_name: str):
        self.pc = Pinecone(api_key="your-api-key")
        self.index = self.pc.Index(index_name)
        self.openai = openai.OpenAI()

    def add(self, memory_id: str, content: str, metadata: dict | None = None):
        """Store memory in Pinecone"""
        embedding = self._get_embedding(content)

        self.index.upsert(vectors=[{
            "id": memory_id,
            "values": embedding,
            "metadata": {
                "content": content,
                **(metadata or {})
            }
        }])

    def search(self, query: str, top_k: int = 5, filter: dict | None = None) -> list[dict]:
        """Search memories with optional filtering"""
        query_embedding = self._get_embedding(query)

        results = self.index.query(
            vector=query_embedding,
            top_k=top_k,
            filter=filter,
            include_metadata=True
        )

        return [
            {
                "id": match.id,
                "score": match.score,
                "content": match.metadata.get("content"),
                "metadata": match.metadata
            }
            for match in results.matches
        ]

    def delete(self, memory_id: str):
        """Remove a memory"""
        self.index.delete(ids=[memory_id])

    def _get_embedding(self, text: str) -> list[float]:
        response = self.openai.embeddings.create(
            model="text-embedding-3-small",
            input=text
        )
        return response.data[0].embedding


# Usage with user-specific memories
memory = ProductionMemory("agent-memories")

# Store user-specific memory
memory.add(
    memory_id="user_123_pref_1",
    content="User prefers detailed technical explanations",
    metadata={"user_id": "123", "type": "preference"}
)

# Search only this user's memories
results = memory.search(
    query="How should I explain this concept?",
    filter={"user_id": "123"}
)
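Metadata filters work, but if every query is scoped to a single user, Pinecone namespaces keep each user's vectors partitioned and make it trivial to wipe one user's memories in a single call. A sketch against the same index as above (`index`, `embedding`, and `query_embedding` are assumed from the surrounding code):

python
# Partition memories by user with namespaces instead of metadata filters
index.upsert(
    vectors=[{"id": "pref_1", "values": embedding, "metadata": {"content": "..."}}],
    namespace="user_123",
)

# Queries only ever see this user's vectors
results = index.query(
    vector=query_embedding,
    top_k=5,
    namespace="user_123",
    include_metadata=True,
)

# Deleting a user's entire memory is one call (useful for GDPR-style requests)
index.delete(delete_all=True, namespace="user_123")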

Structured Long-Term Memory

For specific types of information, use structured storage:

python
import json
from datetime import datetime
from pathlib import Path

class StructuredMemory:
    def __init__(self, storage_path: str):
        self.path = Path(storage_path)
        self.path.mkdir(parents=True, exist_ok=True)

    def get_user_profile(self, user_id: str) -> dict:
        """Get or create user profile"""
        profile_path = self.path / f"user_{user_id}.json"

        if profile_path.exists():
            return json.loads(profile_path.read_text())

        return {
            "user_id": user_id,
            "created_at": datetime.now().isoformat(),
            "preferences": {},
            "facts": [],
            "interaction_count": 0
        }

    def update_user_profile(self, user_id: str, updates: dict):
        """Update user profile"""
        profile = self.get_user_profile(user_id)
        profile.update(updates)
        profile["updated_at"] = datetime.now().isoformat()

        profile_path = self.path / f"user_{user_id}.json"
        profile_path.write_text(json.dumps(profile, indent=2))

    def add_fact(self, user_id: str, fact: str, source: str | None = None):
        """Store a learned fact about the user"""
        profile = self.get_user_profile(user_id)

        profile["facts"].append({
            "fact": fact,
            "learned_at": datetime.now().isoformat(),
            "source": source
        })

        self.update_user_profile(user_id, profile)

    def add_preference(self, user_id: str, key: str, value: str):
        """Store a user preference"""
        profile = self.get_user_profile(user_id)
        profile["preferences"][key] = value
        self.update_user_profile(user_id, profile)


# Usage
memory = StructuredMemory("./agent_memory")

# Learn about user
memory.add_fact("user_123", "Works at a fintech startup")
memory.add_preference("user_123", "communication_style", "concise")
memory.add_preference("user_123", "expertise_level", "senior developer")

# Later, personalize responses
profile = memory.get_user_profile("user_123")
# Use profile["preferences"]["communication_style"] to adjust response length

RAG: Retrieval-Augmented Generation

RAG gives agents access to knowledge beyond their training data:

text
                     RAG Pipeline

  User Query
      │
      ▼
  Embed Query
      │
      ▼
  Search Vector DB
      │
      ▼
  Retrieve Documents
      │
      ▼
  LLM Prompt
      Context: [Retrieved documents]
      Question: [User query]
      Answer based on the context above.
      │
      ▼
  Response

Basic RAG Implementation

python
import openai
import numpy as np
from dataclasses import dataclass

@dataclass
class Document:
    content: str
    metadata: dict
    embedding: list[float] | None = None

class RAGAgent:
    def __init__(self):
        self.client = openai.OpenAI()
        self.documents: list[Document] = []

    def add_documents(self, docs: list[str], metadata: list[dict] | None = None):
        """Index documents for retrieval"""
        for i, content in enumerate(docs):
            embedding = self._get_embedding(content)
            doc = Document(
                content=content,
                metadata=metadata[i] if metadata else {},
                embedding=embedding
            )
            self.documents.append(doc)

    def query(self, question: str, top_k: int = 3) -> str:
        """Answer question using retrieved context"""

        # Step 1: Retrieve relevant documents
        relevant_docs = self._retrieve(question, top_k)

        # Step 2: Build context
        context = "\n\n---\n\n".join(doc.content for doc in relevant_docs)

        # Step 3: Generate answer
        response = self.client.chat.completions.create(
            model="gpt-4o",
            messages=[{
                "role": "system",
                "content": """Answer the question based on the provided context.
If the context doesn't contain relevant information, say so.
Cite sources when possible."""
            }, {
                "role": "user",
                "content": f"""Context:
{context}

Question: {question}"""
            }]
        )

        return response.choices[0].message.content

    def _retrieve(self, query: str, top_k: int) -> list[Document]:
        """Find most relevant documents"""
        query_embedding = self._get_embedding(query)

        scored = []
        for doc in self.documents:
            similarity = self._cosine_similarity(query_embedding, doc.embedding)
            scored.append((doc, similarity))

        scored.sort(key=lambda x: x[1], reverse=True)
        return [doc for doc, _ in scored[:top_k]]

    def _get_embedding(self, text: str) -> list[float]:
        response = self.client.embeddings.create(
            model="text-embedding-3-small",
            input=text
        )
        return response.data[0].embedding

    def _cosine_similarity(self, a, b) -> float:
        a, b = np.array(a), np.array(b)
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Usage
agent = RAGAgent()

# Index company documentation
agent.add_documents([
    "Our API rate limit is 100 requests per minute for free tier users.",
    "Premium users get 1000 requests per minute and priority support.",
    "To upgrade, visit settings > billing > upgrade plan.",
    "API keys can be rotated in settings > security > API keys."
])

# Answer questions using documentation
answer = agent.query("How many API requests can I make?")
print(answer)
# "Based on your tier: Free users can make 100 requests/minute,
#  Premium users can make 1000 requests/minute..."

Advanced RAG with Chunking and Re-ranking

python
import openai
import numpy as np

class AdvancedRAG:
    def __init__(self):
        self.client = openai.OpenAI()
        self.chunks = []

    def index_document(self, content: str, chunk_size: int = 500, overlap: int = 50):
        """Split document into overlapping chunks and index"""
        chunks = self._chunk_text(content, chunk_size, overlap)

        for i, chunk in enumerate(chunks):
            embedding = self._get_embedding(chunk)
            self.chunks.append({
                "id": f"chunk_{len(self.chunks)}",
                "content": chunk,
                "embedding": embedding,
                "position": i
            })

    def query(self, question: str, top_k: int = 5) -> str:
        # Step 1: Initial retrieval (over-fetch candidates)
        candidates = self._retrieve(question, top_k * 2)

        # Step 2: Re-rank with LLM
        reranked = self._rerank(question, candidates, top_k)

        # Step 3: Generate with best context
        context = "\n\n".join(c["content"] for c in reranked)

        return self._generate_answer(question, context)

    def _chunk_text(self, text: str, size: int, overlap: int) -> list[str]:
        """Split text into overlapping chunks"""
        words = text.split()
        chunks = []

        for i in range(0, len(words), size - overlap):
            chunk = " ".join(words[i:i + size])
            if chunk:
                chunks.append(chunk)

        return chunks

    def _retrieve(self, query: str, top_k: int) -> list[dict]:
        """Vector similarity search"""
        query_embedding = self._get_embedding(query)

        scored = []
        for chunk in self.chunks:
            sim = self._cosine_similarity(query_embedding, chunk["embedding"])
            scored.append({**chunk, "score": sim})

        scored.sort(key=lambda x: x["score"], reverse=True)
        return scored[:top_k]

    def _rerank(self, query: str, candidates: list[dict], top_k: int) -> list[dict]:
        """Use LLM to rerank candidates"""
        # Format candidates for reranking
        candidate_text = "\n".join(
            f"[{i}] {c['content'][:200]}..."
            for i, c in enumerate(candidates)
        )

        response = self.client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{
                "role": "user",
                "content": f"""Rank these passages by relevance to the question.
Return only the indices of the top {top_k} most relevant, in order.

Question: {query}

Passages:
{candidate_text}

Return format: 3, 1, 5, 2, 4"""
            }]
        )

        # Parse ranking; fall back to vector order if the LLM output is malformed
        try:
            indices = [int(x.strip()) for x in response.choices[0].message.content.split(",")]
            return [candidates[i] for i in indices[:top_k] if i < len(candidates)]
        except (ValueError, IndexError):
            return candidates[:top_k]

    def _generate_answer(self, question: str, context: str) -> str:
        response = self.client.chat.completions.create(
            model="gpt-4o",
            messages=[{
                "role": "system",
                "content": "Answer based on the context. Be precise and cite relevant parts."
            }, {
                "role": "user",
                "content": f"Context:\n{context}\n\nQuestion: {question}"
            }]
        )
        return response.choices[0].message.content

    # Same embedding helpers as RAGAgent above
    def _get_embedding(self, text: str) -> list[float]:
        response = self.client.embeddings.create(
            model="text-embedding-3-small",
            input=text
        )
        return response.data[0].embedding

    def _cosine_similarity(self, a, b) -> float:
        a, b = np.array(a), np.array(b)
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
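Word-count chunking is simple but happily splits mid-sentence, which hurts both embedding quality and the readability of retrieved context. A sentence-aware variant is a small change; a sketch using only the standard library (the regex is a rough sentence splitter, not production-grade):

python
import re

def chunk_by_sentences(text: str, max_words: int = 500, overlap_sentences: int = 2) -> list[str]:
    """Greedily pack whole sentences into chunks, overlapping a few sentences."""
    # Naive split on sentence-ending punctuation followed by whitespace
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())

    chunks, current, current_words = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        if current and current_words + n > max_words:
            chunks.append(" ".join(current))
            # Start the next chunk with the tail of this one for continuity
            current = current[-overlap_sentences:]
            current_words = sum(len(s.split()) for s in current)
        current.append(sentence)
        current_words += n

    if current:
        chunks.append(" ".join(current))
    return chunks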

Combining Memory Types

A complete agent uses all three memory types:

python
import json
import openai
from datetime import datetime

class MemoryEnabledAgent:
    def __init__(self, user_id: str):
        self.client = openai.OpenAI()
        self.user_id = user_id

        # Short-term: Current conversation
        self.conversation = SummarizingMemory()

        # Long-term: User-specific memories
        self.user_memory = VectorMemory()

        # External: Knowledge base
        self.knowledge_base = RAGAgent()

        # Load user profile
        self.profile = self._load_profile()

    def _load_profile(self) -> dict:
        # Reuses the StructuredMemory store defined earlier
        return StructuredMemory("./agent_memory").get_user_profile(self.user_id)

    def chat(self, message: str) -> str:
        # Add user message to short-term memory
        self.conversation.add("user", message)

        # Retrieve relevant long-term memories
        relevant_memories = self.user_memory.search(message, top_k=3)
        memory_context = "\n".join(m.content for m in relevant_memories)

        # Retrieve relevant knowledge
        knowledge_context = ""
        if self._needs_knowledge(message):
            knowledge_results = self.knowledge_base._retrieve(message, top_k=3)
            knowledge_context = "\n".join(k.content for k in knowledge_results)

        # Build system prompt with context
        system_prompt = self._build_system_prompt(memory_context, knowledge_context)

        # Generate response
        messages = [{"role": "system", "content": system_prompt}]
        messages.extend(self.conversation.get_messages())

        response = self.client.chat.completions.create(
            model="gpt-4o",
            messages=messages
        )

        assistant_message = response.choices[0].message.content

        # Add to short-term memory
        self.conversation.add("assistant", assistant_message)

        # Extract and store any new facts about user
        self._extract_and_store_facts(message, assistant_message)

        return assistant_message

    def _build_system_prompt(self, memories: str, knowledge: str) -> str:
        prompt = f"""You are a helpful AI assistant with memory.

User Profile:
- Name: {self.profile.get('name', 'Unknown')}
- Preferences: {self.profile.get('preferences', {})}

Relevant memories about this user:
{memories if memories else '(No relevant memories)'}

Relevant knowledge:
{knowledge if knowledge else '(No external knowledge needed)'}

Use this context to personalize your responses."""

        return prompt

    def _needs_knowledge(self, message: str) -> bool:
        """Determine if we need to search knowledge base"""
        knowledge_triggers = ["how do", "what is", "explain", "help me", "documentation"]
        return any(trigger in message.lower() for trigger in knowledge_triggers)

    def _extract_and_store_facts(self, user_msg: str, assistant_msg: str):
        """Extract facts from conversation to store in long-term memory"""
        extraction_prompt = f"""Extract any new facts about the user from this exchange.
Return JSON: {{"facts": ["fact1", "fact2"]}} or {{"facts": []}} if none.

User: {user_msg}
Assistant: {assistant_msg}"""

        response = self.client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": extraction_prompt}],
            response_format={"type": "json_object"}
        )

        result = json.loads(response.choices[0].message.content)

        for fact in result.get("facts", []):
            self.user_memory.add(
                content=fact,
                metadata={
                    "user_id": self.user_id,
                    "extracted_at": datetime.now().isoformat()
                }
            )


# Usage
agent = MemoryEnabledAgent(user_id="user_123")

# First conversation
agent.chat("Hi! I'm a Python developer working on machine learning projects.")
agent.chat("I prefer concise explanations.")

# Later session - agent remembers!
agent.chat("Can you help me with my code?")
# Agent responds knowing user is a Python ML developer who prefers concise answers
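The keyword triggers in `_needs_knowledge` are brittle: "what's our rate limit" would slip right past them. A more robust option is to ask a cheap model to route the message. A minimal sketch of that swap (the prompt wording is our assumption, not a fixed recipe):

python
def _needs_knowledge(self, message: str) -> bool:
    """Route with a cheap classifier model instead of keyword matching."""
    response = self.client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": (
                "Does answering this message require looking up product or "
                "company documentation? Reply with exactly YES or NO.\n\n"
                f"Message: {message}"
            )
        }]
    )
    return response.choices[0].message.content.strip().upper().startswith("YES")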

Memory with Code Execution

For agents that execute code, persist state across executions:

python
from hopx import Sandbox
import json

class StatefulCodeAgent:
    def __init__(self, session_id: str):
        self.session_id = session_id
        self.sandbox = None
        self.state_file = f"/app/state_{session_id}.json"

    def start_session(self):
        """Create sandbox and restore state"""
        self.sandbox = Sandbox.create(template="code-interpreter")

        # Check for existing state
        try:
            state_content = self.sandbox.files.read(self.state_file)
            self.state = json.loads(state_content)
            print(f"Restored state with {len(self.state.get('variables', {}))} variables")
        except Exception:
            self.state = {"variables": {}, "history": []}

    def execute(self, code: str) -> str:
        """Execute code and persist state"""

        # Inject state restoration
        setup_code = f"""
import json

# Restore variables from previous session
_state = {json.dumps(self.state.get('variables', {}))}
globals().update(_state)
"""

        # Wrap code to capture new variables
        wrapped_code = f"""
{setup_code}

# User code
{code}

# Capture state (only JSON-serializable values survive)
import json
_new_state = {{k: v for k, v in globals().items()
               if not k.startswith('_') and k not in ['json', 'builtins']
               and isinstance(v, (int, float, str, list, dict, bool))}}
with open('{self.state_file}', 'w') as f:
    json.dump({{'variables': _new_state}}, f)
"""

        self.sandbox.files.write("/app/code.py", wrapped_code)
        result = self.sandbox.commands.run("python /app/code.py")

        # Update local state
        try:
            state_content = self.sandbox.files.read(self.state_file)
            self.state = json.loads(state_content)
        except Exception:
            pass

        return result.stdout if result.exit_code == 0 else f"Error: {result.stderr}"

    def get_variables(self) -> dict:
        """Get current session variables"""
        return self.state.get("variables", {})

    def end_session(self):
        """Clean up but persist state for next session"""
        if self.sandbox:
            # State is already persisted in sandbox
            self.sandbox.kill()


# Usage
agent = StatefulCodeAgent("session_abc123")
agent.start_session()

# First execution
agent.execute("x = 10\ny = 20\nprint(x + y)")  # Output: 30

# Second execution - variables persist!
agent.execute("print(x * y)")  # Output: 200

# Check what's stored
print(agent.get_variables())  # {'x': 10, 'y': 20}

agent.end_session()
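The JSON filter silently drops anything that isn't a plain scalar or container, so a NumPy array computed in step one vanishes before step two. If you trust the code being executed, a pickle-based capture keeps richer objects. A sketch of an alternative capture block to swap into `wrapped_code` above (the `.pkl` path is hypothetical):

python
# Alternative state capture using pickle (keeps arrays, DataFrames, etc.)
# Caution: pickle executes arbitrary code on load; this is only acceptable
# because both sides of the exchange run inside the same sandbox.
import pickle

_new_state = {}
for k, v in list(globals().items()):
    if k.startswith("_") or k in ("pickle", "json", "builtins"):
        continue
    try:
        pickle.dumps(v)  # keep only picklable values
        _new_state[k] = v
    except Exception:
        pass

with open("/app/state_session_abc123.pkl", "wb") as f:
    pickle.dump(_new_state, f)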

Best Practices

1. Separate Memory Concerns

python
# ❌ Don't: Mixing all memory in one place
memory = {"conversation": [...], "user_facts": [...], "documents": [...]}

# ✅ Do: Separate by type and lifecycle
class AgentMemory:
    def __init__(self):
        self.short_term = ConversationBuffer()   # Per-session
        self.long_term = VectorMemory()          # Persistent
        self.knowledge = RAGAgent()              # External

2. Implement Memory Decay

python
from datetime import datetime

def search_with_decay(self, query: str, decay_days: int = 30):
    """Recent memories are weighted higher (assumes each search result
    carries `score` and `timestamp` attributes)"""
    results = self.search(query)

    now = datetime.now()
    for result in results:
        age_days = (now - result.timestamp).days
        # Linearly decay down to a floor of 0.5x for old memories
        decay_factor = max(0.5, 1 - (age_days / decay_days))
        result.score *= decay_factor

    return sorted(results, key=lambda x: x.score, reverse=True)

3. Limit Memory Scope

python
# Filter memories by relevance
def get_relevant_memories(self, query: str, context: str):
    all_memories = self.search(query)

    # Only include highly relevant memories
    return [m for m in all_memories if m.score > 0.7]

4. Handle Memory Conflicts

python
def add_with_conflict_resolution(self, fact: str):
    # Check for conflicting memories
    similar = self.search(fact, top_k=3)

    for existing in similar:
        if self._is_contradiction(fact, existing.content):
            # New information replaces old
            self.delete(existing.id)

    self.add(fact)
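`_is_contradiction` is left undefined above; a cheap LLM check is one way to implement it. A minimal sketch, assuming the memory class keeps an OpenAI client on `self.client`:

python
def _is_contradiction(self, new_fact: str, old_fact: str) -> bool:
    """Ask a small model whether two stored facts conflict."""
    response = self.client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": (
                "Do these two statements about the same user contradict "
                "each other? Reply with exactly YES or NO.\n\n"
                f"Statement A: {old_fact}\nStatement B: {new_fact}"
            )
        }]
    )
    return response.choices[0].message.content.strip().upper().startswith("YES")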

Conclusion

Memory transforms agents from forgetful assistants into intelligent systems that:

  • Maintain context within and across sessions
  • Learn preferences and personalize over time
  • Access knowledge beyond training data
  • Build expertise through accumulated experience

Start with simple conversation memory. Add long-term storage when you need persistence. Implement RAG when you have knowledge bases to query.

The agent that remembers outperforms the agent that forgets. Every time.


Ready to build agents with persistent memory and code execution? Get started with HopX — sandboxes that maintain state across sessions.
