
Build a Code Interpreter Agent with OpenAI and HopX

AI Agents · Amin Al Ali Al Darwish · 10 min read

OpenAI's Code Interpreter is powerful but limited: you can't customize the environment, install arbitrary packages, or integrate it with your own data. Let's build our own version using GPT-4 and HopX.

By the end of this tutorial, you'll have an AI agent that:

  • Writes Python code based on natural language requests
  • Executes code in a secure, isolated sandbox
  • Handles errors and iterates until success
  • Returns results to the user

Architecture Overview

```text
                 User Request
       "Analyze this CSV and plot sales"
                      |
                      v
                 OpenAI GPT-4
         (with tool/function calling)
                      |
                Generates code
                      |
                      v
                 HopX Sandbox
         (isolated microVM execution)
                      |
                Returns output
                      |
                      v
                Agent Decision
     Success? Return to user : Retry with fix
```

Prerequisites

You'll need:

  • Python 3 and pip
  • An OpenAI API key
  • A HopX API key

Step 1: Set Up the Project

Create a new directory and install dependencies:

```bash
mkdir code-interpreter
cd code-interpreter
pip install openai hopx-ai
```

Set your API keys:

```bash
export OPENAI_API_KEY="sk-..."
export HOPX_API_KEY="..."
```

Step 2: Define the Execute Code Tool

OpenAI's function calling feature lets GPT-4 request code execution. First, define the tool schema:

```python
# tools.py

EXECUTE_CODE_TOOL = {
    "type": "function",
    "function": {
        "name": "execute_python",
        "description": "Execute Python code in a secure sandbox. Use this to run calculations, process data, create visualizations, or test code. The sandbox has pandas, numpy, matplotlib, and other common libraries installed.",
        "parameters": {
            "type": "object",
            "properties": {
                "code": {
                    "type": "string",
                    "description": "The Python code to execute. Must be valid Python 3 code."
                },
                "description": {
                    "type": "string",
                    "description": "Brief description of what this code does"
                }
            },
            "required": ["code", "description"]
        }
    }
}

TOOLS = [EXECUTE_CODE_TOOL]
```

Step 3: Create the Sandbox Executor

This function handles code execution in HopX:

```python
# executor.py

from hopx_ai import Sandbox
from typing import Optional


class CodeExecutor:
    def __init__(self, template: str = "code-interpreter"):
        self.template = template
        self.sandbox: Optional[Sandbox] = None

    def __enter__(self):
        # Create the sandbox on entry
        self.sandbox = Sandbox.create(template=self.template)
        return self

    def __exit__(self, *args):
        # Clean up the sandbox on exit
        if self.sandbox:
            self.sandbox.kill()

    def execute(self, code: str) -> dict:
        """Execute code and return a structured result."""
        if not self.sandbox:
            raise RuntimeError("Executor not initialized. Use 'with' statement.")

        result = self.sandbox.run_code(code)

        return {
            "success": result.exit_code == 0,
            "stdout": result.stdout,
            "stderr": result.stderr,
            "exit_code": result.exit_code
        }

    def upload_file(self, local_path: str, sandbox_path: str):
        """Upload a file to the sandbox."""
        with open(local_path, 'rb') as f:
            content = f.read()
        self.sandbox.files.write(sandbox_path, content)

    def download_file(self, sandbox_path: str) -> bytes:
        """Download a file from the sandbox."""
        return self.sandbox.files.read(sandbox_path)

    def list_files(self, path: str = "/app") -> list:
        """List files in a sandbox directory."""
        return self.sandbox.files.list(path)
```

Step 4: Build the Agent Loop

The agent orchestrates between GPT-4 and the sandbox:

```python
# agent.py

from openai import OpenAI
from executor import CodeExecutor
from tools import TOOLS
import json


class CodeInterpreterAgent:
    def __init__(self, model: str = "gpt-4-turbo-preview"):
        self.client = OpenAI()
        self.model = model
        self.max_iterations = 5

        self.system_prompt = """You are a helpful coding assistant that can execute Python code.

When the user asks you to do something that requires computation, data analysis, or code execution:
1. Write Python code to accomplish the task
2. Use the execute_python tool to run it
3. Analyze the output and provide a clear response

You have access to a sandbox with these pre-installed libraries:
- pandas, numpy, scipy (data analysis)
- matplotlib, seaborn, plotly (visualization)
- scikit-learn (machine learning)
- requests (HTTP)
- Standard library (json, csv, datetime, etc.)

Guidelines:
- Always show your work by executing code
- Handle errors gracefully and retry with fixes
- For visualizations, save to /app/output.png and mention it
- Be concise but thorough in explanations"""

    def run(self, user_message: str, executor: CodeExecutor) -> str:
        """Run the agent loop."""
        messages = [
            {"role": "system", "content": self.system_prompt},
            {"role": "user", "content": user_message}
        ]

        for iteration in range(self.max_iterations):
            # Call GPT-4
            response = self.client.chat.completions.create(
                model=self.model,
                messages=messages,
                tools=TOOLS,
                tool_choice="auto"
            )

            assistant_message = response.choices[0].message
            messages.append(assistant_message)

            # Check if GPT-4 wants to execute code
            if assistant_message.tool_calls:
                for tool_call in assistant_message.tool_calls:
                    if tool_call.function.name == "execute_python":
                        # Parse the code
                        args = json.loads(tool_call.function.arguments)
                        code = args["code"]
                        description = args.get("description", "Executing code")

                        print(f"\n🔧 Executing: {description}")
                        print(f"```python\n{code}\n```")

                        # Execute in the sandbox
                        result = executor.execute(code)

                        # Format the result for GPT-4
                        if result["success"]:
                            tool_result = f"✅ Code executed successfully.\n\nOutput:\n{result['stdout']}"
                        else:
                            tool_result = f"❌ Code failed with exit code {result['exit_code']}.\n\nError:\n{result['stderr']}\n\nStdout:\n{result['stdout']}"

                        print(f"\n📤 Result: {tool_result[:200]}...")

                        # Add the tool result to the conversation
                        messages.append({
                            "role": "tool",
                            "tool_call_id": tool_call.id,
                            "content": tool_result
                        })
            else:
                # No tool calls - GPT-4 is done
                return assistant_message.content

        return "Max iterations reached. Please try a simpler request."


def main():
    """Example usage."""
    agent = CodeInterpreterAgent()

    with CodeExecutor() as executor:
        # Example 1: Simple calculation
        print("\n" + "="*60)
        print("Example 1: Fibonacci sequence")
        print("="*60)
        result = agent.run(
            "Calculate the first 20 Fibonacci numbers and find their sum",
            executor
        )
        print(f"\n🤖 Agent: {result}")

        # Example 2: Data analysis
        print("\n" + "="*60)
        print("Example 2: Data analysis")
        print("="*60)
        result = agent.run(
            "Create a sample dataset of 100 sales records with date, product, and amount columns. "
            "Then show me basic statistics and the top 5 products by total sales.",
            executor
        )
        print(f"\n🤖 Agent: {result}")


if __name__ == "__main__":
    main()
```

Step 5: Run the Agent

```bash
python agent.py
```

Example output:

````text
============================================================
Example 1: Fibonacci sequence
============================================================

🔧 Executing: Calculate Fibonacci numbers and sum
```python
def fibonacci(n):
    fib = [0, 1]
    for i in range(2, n):
        fib.append(fib[i-1] + fib[i-2])
    return fib

fibs = fibonacci(20)
print(f"First 20 Fibonacci numbers: {fibs}")
print(f"Sum: {sum(fibs)}")
```

📤 Result: ✅ Code executed successfully.

Output:
First 20 Fibonacci numbers: [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181]
Sum: 10945

🤖 Agent: The first 20 Fibonacci numbers are [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181] and their sum is 10,945.
````

Step 6: Add File Upload Support

Let users upload files for analysis:

```python
# agent_with_files.py

from agent import CodeInterpreterAgent
from executor import CodeExecutor


def analyze_csv(file_path: str, question: str):
    """Upload a CSV and ask questions about it."""
    agent = CodeInterpreterAgent()

    with CodeExecutor() as executor:
        # Upload the file to the sandbox
        executor.upload_file(file_path, "/app/data.csv")

        # Mention the file in the prompt
        prompt = f"""I've uploaded a CSV file to /app/data.csv.

Please analyze it and answer this question:
{question}

Start by loading the file and showing its structure."""

        result = agent.run(prompt, executor)
        return result


# Usage
result = analyze_csv(
    "sales_data.csv",
    "What were the top 3 performing months?"
)
print(result)
```

Step 7: Handle Visualizations

For charts and plots, save to a file and download:

```python
# visualization.py

from agent import CodeInterpreterAgent
from executor import CodeExecutor


def create_visualization(data_description: str, chart_request: str):
    """Create a visualization and return the image."""
    agent = CodeInterpreterAgent()

    prompt = f"""Create the following visualization:
{chart_request}

Data: {data_description}

Save the chart to /app/chart.png using plt.savefig('/app/chart.png', dpi=150, bbox_inches='tight')"""

    with CodeExecutor() as executor:
        result = agent.run(prompt, executor)

        # Download the generated chart
        try:
            image_data = executor.download_file("/app/chart.png")

            # Save it locally
            with open("output_chart.png", "wb") as f:
                f.write(image_data)

            print("Chart saved to output_chart.png")
        except Exception as e:
            print(f"Could not download chart: {e}")

        return result


# Usage
create_visualization(
    "Monthly sales data for 2024",
    "A bar chart showing sales by month with a trend line"
)
```

Step 8: Error Recovery

The agent should handle errors and retry. Here's an enhanced version:

```python
# robust_agent.py

from agent import CodeInterpreterAgent
from executor import CodeExecutor
from tools import TOOLS
import json


class RobustCodeInterpreterAgent(CodeInterpreterAgent):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

        # Enhanced system prompt with error-handling guidance
        self.system_prompt += """

Error Handling:
- If code fails, analyze the error carefully
- Fix the issue and try again
- Common fixes: import missing libraries, fix syntax errors, handle edge cases
- After 2 failed attempts at the same approach, try a different method"""

    def run(self, user_message: str, executor: CodeExecutor) -> str:
        """Run with enhanced error tracking."""
        messages = [
            {"role": "system", "content": self.system_prompt},
            {"role": "user", "content": user_message}
        ]

        error_count = 0
        last_error = None

        for iteration in range(self.max_iterations):
            response = self.client.chat.completions.create(
                model=self.model,
                messages=messages,
                tools=TOOLS,
                tool_choice="auto"
            )

            assistant_message = response.choices[0].message
            messages.append(assistant_message)

            if assistant_message.tool_calls:
                for tool_call in assistant_message.tool_calls:
                    if tool_call.function.name == "execute_python":
                        args = json.loads(tool_call.function.arguments)
                        code = args["code"]

                        result = executor.execute(code)

                        if result["success"]:
                            error_count = 0  # Reset on success
                            tool_result = f"✅ Success:\n{result['stdout']}"
                        else:
                            error_count += 1
                            last_error = result["stderr"]

                            if error_count >= 3:
                                tool_result = f"""❌ Failed ({error_count} attempts).

Error: {result['stderr']}

You've had multiple failures. Please try a completely different approach or simplify the solution."""
                            else:
                                tool_result = f"""❌ Failed (attempt {error_count}):
{result['stderr']}

Please fix the error and try again."""

                        messages.append({
                            "role": "tool",
                            "tool_call_id": tool_call.id,
                            "content": tool_result
                        })
            else:
                return assistant_message.content

        return f"Could not complete the task after {self.max_iterations} iterations. Last error: {last_error}"
```
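Usage mirrors `main()` from Step 4; here's a minimal sketch (the request string is just an example):

```python
# Usage sketch: the robust agent is a drop-in replacement for the base agent.
from executor import CodeExecutor

agent = RobustCodeInterpreterAgent()

with CodeExecutor() as executor:
    answer = agent.run(
        "Compute the eigenvalues of a random 5x5 matrix and interpret them",
        executor
    )
    print(answer)
```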

Complete Example: Data Analysis Agent

Here's a full working example:

```python
# complete_agent.py

from openai import OpenAI
from hopx_ai import Sandbox
import json

# Tool definition
TOOLS = [{
    "type": "function",
    "function": {
        "name": "execute_python",
        "description": "Execute Python code in a secure sandbox with pandas, numpy, matplotlib",
        "parameters": {
            "type": "object",
            "properties": {
                "code": {"type": "string", "description": "Python code to execute"}
            },
            "required": ["code"]
        }
    }
}]


def run_code_interpreter(user_request: str) -> str:
    """Complete code interpreter agent."""
    client = OpenAI()

    messages = [
        {"role": "system", "content": "You are a Python coding assistant. Execute code to answer questions."},
        {"role": "user", "content": user_request}
    ]

    with Sandbox.create(template="code-interpreter") as sandbox:
        for _ in range(5):  # Max iterations
            response = client.chat.completions.create(
                model="gpt-4-turbo-preview",
                messages=messages,
                tools=TOOLS
            )

            msg = response.choices[0].message
            messages.append(msg)

            if not msg.tool_calls:
                return msg.content

            for tool_call in msg.tool_calls:
                code = json.loads(tool_call.function.arguments)["code"]
                result = sandbox.run_code(code)

                output = result.stdout if result.exit_code == 0 else f"Error: {result.stderr}"
                messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": output
                })

    return "Could not complete request."


# Example usage
if __name__ == "__main__":
    result = run_code_interpreter(
        "Generate 1000 random numbers, calculate their mean and standard deviation, "
        "and create a histogram showing their distribution."
    )
    print(result)
```

Common Questions

Can I use Claude or other LLMs instead of GPT-4?

Yes! Any LLM with function/tool calling works. Anthropic Claude example:

```python
from anthropic import Anthropic

client = Anthropic()
response = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=1024,
    tools=[...],   # Claude's tool schema is slightly different -- see below
    messages=[...]
)
```
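The main differences: Claude's tool schema is flat (`name`, `description`, `input_schema` instead of OpenAI's nested `function` object), and tool calls come back as `tool_use` content blocks that you answer with `tool_result` blocks. A minimal sketch, assuming the same `execute_python` tool and the CodeExecutor from Step 3:

```python
# Minimal sketch of the execute_python tool with Anthropic's tool-use API.
from anthropic import Anthropic

client = Anthropic()

claude_tools = [{
    "name": "execute_python",
    "description": "Execute Python code in a secure sandbox.",
    "input_schema": {
        "type": "object",
        "properties": {"code": {"type": "string"}},
        "required": ["code"],
    },
}]

response = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=1024,
    tools=claude_tools,
    messages=[{"role": "user", "content": "Sum the first 20 Fibonacci numbers."}],
)

# Tool calls arrive as tool_use content blocks; feed the execution output back
# as tool_result blocks in a follow-up user message (the same idea as the
# "role": "tool" messages in the OpenAI loop).
for block in response.content:
    if block.type == "tool_use" and block.name == "execute_python":
        code = block.input["code"]
        # result = executor.execute(code)  # reuse the CodeExecutor from Step 3
```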

How do I handle long-running code?

For code that takes more than 60 seconds, use background execution:

```python
import time

# Start the long-running task
process_id = sandbox.run_code_background(long_running_code)

# Check status periodically
while True:
    status = sandbox.get_process_status(process_id)
    if status.completed:
        break
    time.sleep(5)
```

Can I keep the sandbox between requests?

Yes, to reuse state (installed packages, created files):

```python
# Create once
sandbox = Sandbox.create(template="code-interpreter")

# Use it for multiple requests
sandbox.run_code("pip install transformers")
sandbox.run_code("from transformers import pipeline; ...")
sandbox.run_code("# Uses same environment...")

# Clean up when done
sandbox.kill()
```

How do I restrict what code can do?

HopX sandboxes are already isolated. For additional restrictions (see the sketch after this list):

  • Use network policies to limit outbound connections
  • Set resource limits (CPU, memory, disk)
  • Use custom templates with minimal packages
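These restrictions are applied when the sandbox is created. The snippet below is only a hypothetical sketch: the commented-out keyword arguments (`cpu`, `memory_mb`, `allow_internet`) and the `minimal-python` template name are assumptions for illustration, not confirmed HopX SDK options, so check the HopX documentation for the actual parameters.

```python
# Hypothetical sketch of a locked-down sandbox. The commented-out keyword
# arguments and the template name are assumptions, not documented HopX options.
from hopx_ai import Sandbox

sandbox = Sandbox.create(
    template="minimal-python",   # assumed: a custom template with few packages
    # cpu=1,                     # assumed: CPU limit
    # memory_mb=512,             # assumed: memory limit
    # allow_internet=False,      # assumed: network policy toggle
)
```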

Conclusion

You've built a Code Interpreter agent that:

  • ✅ Uses GPT-4 for code generation
  • ✅ Executes code in isolated HopX sandboxes
  • ✅ Handles errors and iterates
  • ✅ Supports file upload/download
  • ✅ Creates visualizations

This architecture is production-ready. The sandbox isolation means even malicious or buggy LLM-generated code can't harm your infrastructure.


Ready to build your own? Sign up for HopX and get $200 in free credits.