Build a Code Interpreter Agent with OpenAI and HopX
OpenAI's Code Interpreter is powerful but limited: you can't customize the environment, install arbitrary packages, or integrate it with your own data. Let's build our own version using GPT-4 and HopX.
By the end of this tutorial, you'll have an AI agent that:
- Writes Python code based on natural language requests
- Executes code in a secure, isolated sandbox
- Handles errors and iterates until success
- Returns results to the user
Architecture Overview
```
┌─────────────────────────────────────────────────────────┐
│                      User Request                       │
│            "Analyze this CSV and plot sales"            │
└─────────────────────────────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────┐
│                      OpenAI GPT-4                       │
│              (with tool/function calling)               │
└─────────────────────────────────────────────────────────┘
                            │
                     Generates code
                            │
                            ▼
┌─────────────────────────────────────────────────────────┐
│                      HopX Sandbox                       │
│              (isolated microVM execution)               │
└─────────────────────────────────────────────────────────┘
                            │
                     Returns output
                            │
                            ▼
┌─────────────────────────────────────────────────────────┐
│                     Agent Decision                      │
│        Success? Return to user : Retry with fix         │
└─────────────────────────────────────────────────────────┘
```
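The decision loop at the bottom of the diagram can be sketched in a few lines of plain Python. Here, `generate_code` stands in for the LLM call and `execute` for the sandbox executor built in Step 3; both names are placeholders for this sketch:

```python
def agent_loop(generate_code, execute, max_iterations=5):
    """Minimal retry loop: generate code, run it, retry on failure."""
    feedback = None
    for _ in range(max_iterations):
        code = generate_code(feedback)  # LLM call in the real agent
        result = execute(code)          # sandbox execution
        if result["success"]:
            return result["stdout"]     # success: return to the user
        feedback = result["stderr"]     # failure: feed the error back and retry
    return "Max iterations reached."
```

The rest of the tutorial fleshes this skeleton out with real OpenAI tool calls and a real HopX sandbox.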
Prerequisites
- Python 3.8+
- OpenAI API key
- HopX API key (get one at console.hopx.ai)
Step 1: Set Up the Project
Create a new directory and install dependencies:
```bash
mkdir code-interpreter
cd code-interpreter
pip install openai hopx-ai
```
Set your API keys:
```bash
export OPENAI_API_KEY="sk-..."
export HOPX_API_KEY="..."
```
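Both SDKs read their keys from the environment, so a quick check at startup turns a confusing mid-run auth error into an immediate, clear failure. `require_env` is a small helper written for this tutorial, not part of either SDK:

```python
import os

def require_env(*keys):
    """Raise immediately if any required environment variable is unset."""
    missing = [k for k in keys if not os.environ.get(k)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")

# Call once at startup, e.g.:
# require_env("OPENAI_API_KEY", "HOPX_API_KEY")
```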
Step 2: Define the Execute Code Tool
OpenAI's function calling feature lets GPT-4 request code execution. First, define the tool schema:
```python
# tools.py

EXECUTE_CODE_TOOL = {
    "type": "function",
    "function": {
        "name": "execute_python",
        "description": (
            "Execute Python code in a secure sandbox. Use this to run "
            "calculations, process data, create visualizations, or test code. "
            "The sandbox has pandas, numpy, matplotlib, and other common "
            "libraries installed."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "code": {
                    "type": "string",
                    "description": "The Python code to execute. Must be valid Python 3 code."
                },
                "description": {
                    "type": "string",
                    "description": "Brief description of what this code does"
                }
            },
            "required": ["code", "description"]
        }
    }
}

TOOLS = [EXECUTE_CODE_TOOL]
```
Step 3: Create the Sandbox Executor
This function handles code execution in HopX:
```python
# executor.py

from typing import Optional

from hopx_ai import Sandbox


class CodeExecutor:
    def __init__(self, template: str = "code-interpreter"):
        self.template = template
        self.sandbox: Optional[Sandbox] = None

    def __enter__(self):
        # Create the sandbox on entry
        self.sandbox = Sandbox.create(template=self.template)
        return self

    def __exit__(self, *args):
        # Clean up the sandbox on exit
        if self.sandbox:
            self.sandbox.kill()

    def execute(self, code: str) -> dict:
        """Execute code and return a structured result."""
        if not self.sandbox:
            raise RuntimeError("Executor not initialized. Use 'with' statement.")

        result = self.sandbox.run_code(code)

        return {
            "success": result.exit_code == 0,
            "stdout": result.stdout,
            "stderr": result.stderr,
            "exit_code": result.exit_code,
        }

    def upload_file(self, local_path: str, sandbox_path: str):
        """Upload a file to the sandbox."""
        with open(local_path, "rb") as f:
            content = f.read()
        self.sandbox.files.write(sandbox_path, content)

    def download_file(self, sandbox_path: str) -> bytes:
        """Download a file from the sandbox."""
        return self.sandbox.files.read(sandbox_path)

    def list_files(self, path: str = "/app") -> list:
        """List files in a sandbox directory."""
        return self.sandbox.files.list(path)
```
Step 4: Build the Agent Loop
The agent orchestrates between GPT-4 and the sandbox:
```python
# agent.py

import json

from openai import OpenAI

from executor import CodeExecutor
from tools import TOOLS


class CodeInterpreterAgent:
    def __init__(self, model: str = "gpt-4-turbo-preview"):
        self.client = OpenAI()
        self.model = model
        self.max_iterations = 5

        self.system_prompt = """You are a helpful coding assistant that can execute Python code.

When the user asks you to do something that requires computation, data analysis, or code execution:
1. Write Python code to accomplish the task
2. Use the execute_python tool to run it
3. Analyze the output and provide a clear response

You have access to a sandbox with these pre-installed libraries:
- pandas, numpy, scipy (data analysis)
- matplotlib, seaborn, plotly (visualization)
- scikit-learn (machine learning)
- requests (HTTP)
- Standard library (json, csv, datetime, etc.)

Guidelines:
- Always show your work by executing code
- Handle errors gracefully and retry with fixes
- For visualizations, save to /app/output.png and mention it
- Be concise but thorough in explanations"""

    def run(self, user_message: str, executor: CodeExecutor) -> str:
        """Run the agent loop."""
        messages = [
            {"role": "system", "content": self.system_prompt},
            {"role": "user", "content": user_message},
        ]

        for iteration in range(self.max_iterations):
            # Call GPT-4
            response = self.client.chat.completions.create(
                model=self.model,
                messages=messages,
                tools=TOOLS,
                tool_choice="auto",
            )

            assistant_message = response.choices[0].message
            messages.append(assistant_message)

            # Check whether GPT-4 wants to execute code
            if assistant_message.tool_calls:
                for tool_call in assistant_message.tool_calls:
                    if tool_call.function.name == "execute_python":
                        # Parse the code
                        args = json.loads(tool_call.function.arguments)
                        code = args["code"]
                        description = args.get("description", "Executing code")

                        print(f"\n🔧 Executing: {description}")
                        print(f"```python\n{code}\n```")

                        # Execute in the sandbox
                        result = executor.execute(code)

                        # Format the result for GPT-4
                        if result["success"]:
                            tool_result = f"✅ Code executed successfully.\n\nOutput:\n{result['stdout']}"
                        else:
                            tool_result = (
                                f"❌ Code failed with exit code {result['exit_code']}.\n\n"
                                f"Error:\n{result['stderr']}\n\nStdout:\n{result['stdout']}"
                            )

                        print(f"\n📤 Result: {tool_result[:200]}...")

                        # Add the tool result to the conversation
                        messages.append({
                            "role": "tool",
                            "tool_call_id": tool_call.id,
                            "content": tool_result,
                        })
            else:
                # No tool calls - GPT-4 is done
                return assistant_message.content

        return "Max iterations reached. Please try a simpler request."


def main():
    """Example usage."""
    agent = CodeInterpreterAgent()

    with CodeExecutor() as executor:
        # Example 1: Simple calculation
        print("\n" + "=" * 60)
        print("Example 1: Fibonacci sequence")
        print("=" * 60)
        result = agent.run(
            "Calculate the first 20 Fibonacci numbers and find their sum",
            executor,
        )
        print(f"\n🤖 Agent: {result}")

        # Example 2: Data analysis
        print("\n" + "=" * 60)
        print("Example 2: Data analysis")
        print("=" * 60)
        result = agent.run(
            "Create a sample dataset of 100 sales records with date, product, "
            "and amount columns. Then show me basic statistics and the top 5 "
            "products by total sales.",
            executor,
        )
        print(f"\n🤖 Agent: {result}")


if __name__ == "__main__":
    main()
```
Step 5: Run the Agent
```bash
python agent.py
```
Example output:
````
============================================================
Example 1: Fibonacci sequence
============================================================

🔧 Executing: Calculate Fibonacci numbers and sum
```python
def fibonacci(n):
    fib = [0, 1]
    for i in range(2, n):
        fib.append(fib[i-1] + fib[i-2])
    return fib

fibs = fibonacci(20)
print(f"First 20 Fibonacci numbers: {fibs}")
print(f"Sum: {sum(fibs)}")
```

📤 Result: ✅ Code executed successfully.

Output:
First 20 Fibonacci numbers: [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181]
Sum: 10945

🤖 Agent: The first 20 Fibonacci numbers are [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181] and their sum is 10,945.
````
Step 6: Add File Upload Support
Let users upload files for analysis:
```python
# agent_with_files.py

from agent import CodeInterpreterAgent
from executor import CodeExecutor


def analyze_csv(file_path: str, question: str):
    """Upload a CSV and ask questions about it."""
    agent = CodeInterpreterAgent()

    with CodeExecutor() as executor:
        # Upload the file to the sandbox
        executor.upload_file(file_path, "/app/data.csv")

        # Mention the file in the prompt
        prompt = f"""I've uploaded a CSV file to /app/data.csv.

Please analyze it and answer this question:
{question}

Start by loading the file and showing its structure."""

        result = agent.run(prompt, executor)
        return result


# Usage
result = analyze_csv(
    "sales_data.csv",
    "What were the top 3 performing months?"
)
print(result)
```
Step 7: Handle Visualizations
For charts and plots, save to a file and download:
```python
# visualization.py

from agent import CodeInterpreterAgent
from executor import CodeExecutor


def create_visualization(data_description: str, chart_request: str):
    """Create a visualization and return the agent's response."""
    agent = CodeInterpreterAgent()

    prompt = f"""Create the following visualization:
{chart_request}

Data: {data_description}

Save the chart to /app/chart.png using plt.savefig('/app/chart.png', dpi=150, bbox_inches='tight')"""

    with CodeExecutor() as executor:
        result = agent.run(prompt, executor)

        # Download the generated chart (while the sandbox is still alive)
        try:
            image_data = executor.download_file("/app/chart.png")

            # Save it locally
            with open("output_chart.png", "wb") as f:
                f.write(image_data)

            print("Chart saved to output_chart.png")
        except Exception as e:
            print(f"Could not download chart: {e}")

    return result


# Usage
create_visualization(
    "Monthly sales data for 2024",
    "A bar chart showing sales by month with a trend line"
)
```
Step 8: Error Recovery
The agent should handle errors and retry. Here's an enhanced version:
```python
# robust_agent.py

import json

from agent import CodeInterpreterAgent
from executor import CodeExecutor
from tools import TOOLS


class RobustCodeInterpreterAgent(CodeInterpreterAgent):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

        # Enhanced system prompt with error handling guidance
        self.system_prompt += """

Error Handling:
- If code fails, analyze the error carefully
- Fix the issue and try again
- Common fixes: import missing libraries, fix syntax errors, handle edge cases
- After 2 failed attempts at the same approach, try a different method"""

    def run(self, user_message: str, executor: CodeExecutor) -> str:
        """Run with enhanced error tracking."""
        messages = [
            {"role": "system", "content": self.system_prompt},
            {"role": "user", "content": user_message},
        ]

        error_count = 0
        last_error = None

        for iteration in range(self.max_iterations):
            response = self.client.chat.completions.create(
                model=self.model,
                messages=messages,
                tools=TOOLS,
                tool_choice="auto",
            )

            assistant_message = response.choices[0].message
            messages.append(assistant_message)

            if assistant_message.tool_calls:
                for tool_call in assistant_message.tool_calls:
                    if tool_call.function.name == "execute_python":
                        args = json.loads(tool_call.function.arguments)
                        code = args["code"]

                        result = executor.execute(code)

                        if result["success"]:
                            error_count = 0  # Reset on success
                            tool_result = f"✅ Success:\n{result['stdout']}"
                        else:
                            error_count += 1
                            last_error = result["stderr"]

                            if error_count >= 3:
                                tool_result = f"""❌ Failed ({error_count} attempts).

Error: {result['stderr']}

⚠️ You've had multiple failures. Please try a completely different approach or simplify the solution."""
                            else:
                                tool_result = f"""❌ Failed (attempt {error_count}):
{result['stderr']}

Please fix the error and try again."""

                        messages.append({
                            "role": "tool",
                            "tool_call_id": tool_call.id,
                            "content": tool_result,
                        })
            else:
                return assistant_message.content

        return f"Could not complete the task after {self.max_iterations} iterations. Last error: {last_error}"
```
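The feedback strategy above (reset the error counter on success, escalate after three consecutive failures) can be pulled out into a small pure function and unit-tested without any API calls. This helper restates that logic for illustration; it is not part of the agent class:

```python
def format_tool_result(result: dict, error_count: int):
    """Return (message_for_llm, new_error_count) using the agent's escalation rules."""
    if result["success"]:
        # Success resets the consecutive-failure counter
        return f"✅ Success:\n{result['stdout']}", 0
    error_count += 1
    if error_count >= 3:
        # After 3 consecutive failures, push the model toward a new approach
        msg = (f"❌ Failed ({error_count} attempts).\n\nError: {result['stderr']}\n\n"
               "⚠️ You've had multiple failures. Please try a completely different approach.")
    else:
        msg = (f"❌ Failed (attempt {error_count}):\n{result['stderr']}\n\n"
               "Please fix the error and try again.")
    return msg, error_count
```

Keeping this logic in a pure function makes the retry policy testable and easy to tune independently of the LLM loop.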
Complete Example: Data Analysis Agent
Here's a full working example:
```python
# complete_agent.py

import json

from openai import OpenAI
from hopx_ai import Sandbox

# Tool definition
TOOLS = [{
    "type": "function",
    "function": {
        "name": "execute_python",
        "description": "Execute Python code in a secure sandbox with pandas, numpy, matplotlib",
        "parameters": {
            "type": "object",
            "properties": {
                "code": {"type": "string", "description": "Python code to execute"}
            },
            "required": ["code"]
        }
    }
}]


def run_code_interpreter(user_request: str) -> str:
    """Complete code interpreter agent."""
    client = OpenAI()

    messages = [
        {"role": "system", "content": "You are a Python coding assistant. Execute code to answer questions."},
        {"role": "user", "content": user_request},
    ]

    with Sandbox.create(template="code-interpreter") as sandbox:
        for _ in range(5):  # Max iterations
            response = client.chat.completions.create(
                model="gpt-4-turbo-preview",
                messages=messages,
                tools=TOOLS,
            )

            msg = response.choices[0].message
            messages.append(msg)

            if not msg.tool_calls:
                return msg.content

            for tool_call in msg.tool_calls:
                code = json.loads(tool_call.function.arguments)["code"]
                result = sandbox.run_code(code)

                output = result.stdout if result.exit_code == 0 else f"Error: {result.stderr}"
                messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": output,
                })

    return "Could not complete request."


# Example usage
if __name__ == "__main__":
    result = run_code_interpreter(
        "Generate 1000 random numbers, calculate their mean and standard deviation, "
        "and create a histogram showing their distribution."
    )
    print(result)
```
Common Questions
Can I use Claude or other LLMs instead of GPT-4?
Yes! Any LLM with function/tool calling works. Note that Claude's tool schema is flat (name, description, input_schema) rather than nested under a "function" key, so adapt the tool definition accordingly:
```python
from anthropic import Anthropic

client = Anthropic()
response = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=4096,
    tools=[...],  # Claude expects {"name", "description", "input_schema"}
    messages=[...]
)
```
How do I handle long-running code?
For code that takes more than 60 seconds, use background execution:
```python
import time

# Start the long-running task in the background
process_id = sandbox.run_code_background(long_running_code)

# Check status periodically
while True:
    status = sandbox.get_process_status(process_id)
    if status.completed:
        break
    time.sleep(5)
```
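The polling loop above will spin forever if the task never finishes. A small generic helper with a timeout (plain Python, not HopX-specific) is safer:

```python
import time

def wait_until(check, timeout=300, interval=5):
    """Poll check() until it returns True, or raise TimeoutError after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(interval)
    raise TimeoutError(f"Condition not met within {timeout}s")

# Usage with the background-execution API shown above:
# wait_until(lambda: sandbox.get_process_status(process_id).completed)
```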
Can I keep the sandbox between requests?
Yes, to reuse state (installed packages, created files):
```python
# Create once
sandbox = Sandbox.create(template="code-interpreter")

# Use it for multiple requests
sandbox.run_code("pip install transformers")
sandbox.run_code("from transformers import pipeline; ...")
sandbox.run_code("# Uses same environment...")

# Clean up when done
sandbox.kill()
```
How do I restrict what code can do?
HopX sandboxes are already isolated. For additional restrictions:
- Use network policies to limit outbound connections
- Set resource limits (CPU, memory, disk)
- Use custom templates with minimal packages
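As a concrete illustration, those knobs might be expressed as a creation-time config. Every parameter name below (`resources`, `network`, `allowed_hosts`, the `restricted` template) is hypothetical and should be checked against the HopX docs; the config itself is just a plain dict:

```python
# Hypothetical sandbox restrictions. Parameter names are illustrative,
# not taken from the HopX SDK; check the HopX docs for the real API.
sandbox_config = {
    "template": "restricted",  # custom template with minimal packages
    "resources": {"cpu": 1, "memory_mb": 512, "disk_mb": 1024},
    "network": {"allowed_hosts": ["pypi.org", "files.pythonhosted.org"]},
}

# A real call might then look like: Sandbox.create(**sandbox_config)
```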
Conclusion
You've built a Code Interpreter agent that:
- ✅ Uses GPT-4 for code generation
- ✅ Executes code in isolated HopX sandboxes
- ✅ Handles errors and iterates
- ✅ Supports file upload/download
- ✅ Creates visualizations
This architecture is ready to take toward production: because all execution happens inside an isolated sandbox, even malicious or buggy LLM-generated code can't harm your infrastructure.
Ready to build your own? Sign up for HopX and get $200 in free credits.