Run Any LLM with Ollama in Secure Sandboxes
Want to run LLMs like Llama 3.3, Mixtral, or CodeLlama without sending data to third-party APIs? This guide shows you how to deploy Ollama in isolated HopX sandboxes—giving you the privacy of self-hosting with the simplicity of a managed service.
What you'll learn:
- Deploy any Ollama model with ~100ms cold starts
- Save up to 78% compared to pay-per-token APIs
- Keep sensitive data in hardware-isolated environments
- Scale from 1 to 1,000 sandboxes with the same code
Why Traditional LLM Deployment Costs You More Than Money
Running AI models in production introduces three critical problems:
Problem 1: Security Risks You Can't Afford
Container-based deployments share a host kernel. One escape path compromises your entire infrastructure. If your application handles sensitive data—medical records, financial transactions, or proprietary code—this shared-kernel architecture creates unacceptable risk.
Problem 2: Cold Starts Kill User Experience
Traditional containers take 10+ seconds to start. Every request waits while resources spin up. Your users see loading screens. Your AI agents sit idle. Productivity drops while infrastructure catches up.
Problem 3: Unpredictable Costs Drain Budgets
Cloud API pricing varies with token count. One complex query can cost 10x more than expected. Monthly bills fluctuate wildly. You can't forecast spending or optimize costs when pricing depends on factors outside your control.
The Solution: Ollama on Isolated Micro-VMs
Combining Ollama with HopX sandboxes solves these problems through a different architecture. Each model runs in its own micro-VM with dedicated kernel, file system, and network stack.
What changes:
- Security: Each sandbox has its own kernel. No shared resources.
- Speed: Sandboxes start in ~100ms from pre-built snapshots.
- Cost: Pay per second of actual compute usage. Pause when idle.
- Privacy: Your data never leaves your infrastructure.
Real Performance Numbers
| Metric | Traditional Containers | HopX Micro-VMs |
|---|---|---|
| Cold Start Time | 10-15 seconds | ~100 milliseconds |
| Kernel Isolation | Shared kernel | Dedicated kernel per VM |
| Runtime Limits | 15 minutes (typical) | Hours to days |
| Startup Cost | Fixed per invocation | $0.000014/vCPU-second |
| Data Residency | Provider-dependent | Your choice of region |
For high-volume workloads, self-managed micro-VMs can reduce costs by up to 78% compared to pay-per-token APIs.
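Actual savings depend on traffic volume, model choice, and the API you're comparing against. As a rough illustration under assumed numbers (a $0.002 per 1K-token blended API price and 250M tokens per month, both hypothetical), one always-on 2 vCPU sandbox at the per-second rates listed in the cost section below comes out roughly 75% cheaper:

```python
# Back-of-the-envelope: pay-per-token API vs. one 24/7 HopX sandbox.
# The API price and monthly token volume are illustrative assumptions, not vendor quotes.
API_PRICE_PER_1K_TOKENS = 0.002       # assumed blended input/output price (USD)
TOKENS_PER_MONTH = 250_000_000        # assumed monthly volume

api_monthly = TOKENS_PER_MONTH / 1000 * API_PRICE_PER_1K_TOKENS

# One 2 vCPU / 4 GiB / 20 GiB sandbox running 24/7 at HopX's listed per-second rates
seconds_per_month = 30 * 24 * 3600
vm_monthly = (2 * 0.000014 + 4 * 0.0000045 + 20 * 0.00000003) * seconds_per_month

savings = 1 - vm_monthly / api_monthly
print(f"API: ${api_monthly:,.0f}/mo  sandbox: ${vm_monthly:,.0f}/mo  savings: {savings:.0%}")
```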
Prerequisites
Before starting, you need:
- A HopX account (sign up at console.hopx.ai for $200 in free credits)
- Python 3.11+
- Your HOPX_API_KEY from the dashboard
Set up your environment:
- Sign up at console.hopx.ai
- Get your API key from the dashboard
- Set the environment variable:
```bash
export HOPX_API_KEY="your-api-key-here"
```
Step 1: Install Dependencies
```bash
pip install hopx-ai python-dotenv
```
Step 2: Configure Environment
```python
import os
import time
import asyncio
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

# Verify API key is set
api_key = os.getenv("HOPX_API_KEY")
if not api_key:
    print("⚠️ HOPX_API_KEY not found in environment")
    print("Please set it: export HOPX_API_KEY=your-key-here")
else:
    print("✓ API key configured")
```
Step 3: Create Ollama Template
Templates define your sandbox environment. This template:
- Starts with Python 3.13 base image
- Installs Ollama
- Pre-downloads your chosen model
- Configures the environment for production use
```python
from hopx_ai import Template
from hopx_ai.template.types import BuildOptions, BuildResult

# Configuration
OLLAMA_MODEL = "llama3.3"  # Change to your preferred model; size CPU/memory/disk per "Resource Requirements" below
TEMPLATE_NAME = f"ollama-production-{int(time.time())}"

def create_ollama_template() -> Template:
    """Create a production-ready Ollama template."""
    return (
        Template()
        .from_python_image("3.13")
        .run_cmd("mkdir -p /workspace")
        .set_env("LANG", "en_US.UTF-8")
        .set_env("PYTHONUNBUFFERED", "1")
        .set_env("HOME", "/workspace")
        .run_cmd("curl -fsSL https://ollama.com/install.sh | sh")
        .run_cmd(f"/usr/local/bin/ollama pull {OLLAMA_MODEL}")
        .set_workdir("/workspace")
    )

def create_build_options(api_key: str) -> BuildOptions:
    """Configure build options for the template."""
    return BuildOptions(
        name=TEMPLATE_NAME,
        api_key=api_key,
        cpu=2,
        memory=2048,  # MB
        disk_gb=20,
        on_log=lambda log: print(f"[{log.get('level')}] {log.get('message')}"),
        on_progress=lambda p: print(f"Build progress: {p}%"),
    )

async def build_template() -> BuildResult:
    """Build the Ollama template."""
    template = create_ollama_template()
    options = create_build_options(os.getenv("HOPX_API_KEY"))
    print(f"Building template: {TEMPLATE_NAME}")
    return await Template.build(template, options)

print("✓ Template configuration ready")
```
Step 4: Build and Deploy Your First Sandbox
This step builds the template and creates a sandbox. Note: Building takes ~2 minutes the first time.
```python
from hopx_ai import Sandbox

async def deploy_ollama_sandbox():
    """Deploy an Ollama sandbox."""
    # Build the template (do this once)
    print("Building template... (this takes ~2 minutes)")
    result = await build_template()
    print(f"✓ Template ready: {result.template_id}")

    # Create sandbox from template
    print("Creating sandbox...")
    sandbox = Sandbox.create(
        template=TEMPLATE_NAME,
        api_key=os.getenv("HOPX_API_KEY")
    )
    print(f"✓ Sandbox created: {sandbox.sandbox_id}")

    # Test with a simple prompt
    print("\nTesting model...")
    response = sandbox.commands.run(
        f"/usr/local/bin/ollama run {OLLAMA_MODEL} 'Explain quantum computing in one sentence'",
        timeout=240
    )

    print(f"\nModel response:\n{response.stdout}")

    return sandbox

# Run the deployment (top-level await works in notebooks; use asyncio.run(...) in a plain script)
sandbox = await deploy_ollama_sandbox()
```
Step 5: Persist and Reconnect to Sandboxes
Creating new sandboxes every time wastes resources. Save the sandbox ID and reconnect:
```python
async def get_or_create_sandbox() -> Sandbox:
    """Get existing sandbox or create new one."""
    sandbox_file = ".hopx_sandbox_id"

    if os.path.exists(sandbox_file):
        with open(sandbox_file, "r") as f:
            sandbox_id = f.read().strip()

        try:
            sandbox = Sandbox.connect(
                sandbox_id,
                api_key=os.getenv("HOPX_API_KEY")
            )
            print(f"✓ Reconnected to sandbox: {sandbox_id}")
            return sandbox
        except Exception as e:
            print(f"Could not reconnect: {e}")
            print("Creating new sandbox...")

    # Build and create new sandbox
    template_result = await build_template()
    sandbox = Sandbox.create(
        template=TEMPLATE_NAME,
        api_key=os.getenv("HOPX_API_KEY")
    )

    with open(sandbox_file, "w") as f:
        f.write(sandbox.sandbox_id)
    print(f"✓ Created new sandbox: {sandbox.sandbox_id}")

    return sandbox
```
Choose the Right Ollama Model
For Speed and Efficiency
- smollm (135M-1.7B): Minimal resources, great for testing
- phi-3 (3.8B): Fast inference, good for classification
- qwen2 (7B): Strong multilingual support
For Quality and Reasoning
- llama3.3 (70B): Advanced reasoning and coding
- mixtral (47B): Mixture-of-experts for specialized tasks
- deepseek-r1 (70B): Advanced reasoning and problem-solving
For Code Generation
- codellama (7B-34B): Optimized for programming
- codegemma (7B): Google's code-focused model
Resource Requirements
- 2 vCPU, 2GB RAM: Models up to 3B parameters
- 4 vCPU, 8GB RAM: Models up to 13B parameters
- 8 vCPU, 16GB RAM: Models up to 70B parameters
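If you want to pick build resources programmatically from these tiers, here is a minimal sketch. The thresholds simply mirror the list above, and the BuildOptions fields follow the ones used in Step 3; tune the numbers (especially disk) for the models you actually pull.

```python
import os
from hopx_ai.template.types import BuildOptions

def sizing_for_model(param_billions: float, name: str, api_key: str) -> BuildOptions:
    """Map an approximate parameter count onto the resource tiers above."""
    if param_billions <= 3:
        cpu, memory = 2, 2048       # up to ~3B parameters
    elif param_billions <= 13:
        cpu, memory = 4, 8192       # up to ~13B parameters
    else:
        cpu, memory = 8, 16384      # larger models
    # disk_gb left at 20 as in Step 3; large models may need more room for weights
    return BuildOptions(name=name, api_key=api_key, cpu=cpu, memory=memory, disk_gb=20)

# Example: build options sized for a 7B model
options = sizing_for_model(7, name="ollama-7b", api_key=os.getenv("HOPX_API_KEY"))
```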
Cost Calculator: What You Actually Pay
HopX charges per second:
- Compute: $0.000014 per vCPU-second
- Memory: $0.0000045 per GiB-second
- Storage: $0.00000003 per GiB-second
```python
def calculate_cost(vcpu: int, memory_gb: int, storage_gb: int, hours: float) -> dict:
    """Calculate HopX sandbox costs for `hours` of runtime per day."""
    seconds = hours * 3600

    compute_cost = vcpu * seconds * 0.000014
    memory_cost = memory_gb * seconds * 0.0000045
    storage_cost = storage_gb * seconds * 0.00000003

    total = compute_cost + memory_cost + storage_cost

    return {
        "compute": round(compute_cost, 4),
        "memory": round(memory_cost, 4),
        "storage": round(storage_cost, 4),
        "daily": round(total, 4),
        "monthly": round(total * 30, 2)
    }

# Example 1: Development Testing
print("Example 1: Development Testing (7B model, 30 min/day)")
dev_cost = calculate_cost(vcpu=2, memory_gb=4, storage_gb=20, hours=0.5)
print(f"  Daily cost: ${dev_cost['daily']}")
print(f"  Monthly cost: ${dev_cost['monthly']}")

# Example 2: Production AI Agent
print("Example 2: Production AI Agent (13B model, 8 hours/day)")
prod_cost = calculate_cost(vcpu=4, memory_gb=8, storage_gb=30, hours=8)
print(f"  Daily cost: ${prod_cost['daily']}")
print(f"  Monthly cost: ${prod_cost['monthly']}")

# Example 3: 24/7 Service
print("Example 3: High-Volume API (10 sandboxes, 24/7)")
service_cost = calculate_cost(vcpu=2, memory_gb=4, storage_gb=20, hours=24)
print(f"  Per sandbox daily: ${service_cost['daily']}")
print(f"  10 sandboxes monthly: ${service_cost['monthly'] * 10:.2f}")
```
Cost Optimization Patterns
```python
import shlex

# Pattern 1: Pause When Idle
# Paused sandboxes cost 10x less

def pause_sandbox_when_idle(sandbox: Sandbox):
    """Pause sandbox to reduce costs."""
    sandbox.pause()  # Preserves state, reduces costs
    print("Sandbox paused. Resume with sandbox.resume()")

# Pattern 2: Delete Completed Work

async def run_and_cleanup(sandbox: Sandbox, task: str):
    """Run task and clean up."""
    try:
        result = sandbox.commands.run(task)
        return result
    finally:
        sandbox.delete()  # Stop all charges

# Pattern 3: Choose Model by Complexity

def choose_model(complexity_score: float) -> str:
    """Choose model based on task complexity."""
    if complexity_score < 0.5:
        return "phi-3"      # Fast, cheap
    elif complexity_score < 0.8:
        return "llama3.3"   # Balanced
    else:
        return "mixtral"    # Heavy reasoning

# Pattern 4: Batch Requests

async def batch_process(sandbox: Sandbox, prompts: list[str], model: str):
    """Process multiple prompts in one session."""
    results = []
    for prompt in prompts:
        # Quote the prompt so apostrophes and shell metacharacters don't break the command
        result = sandbox.commands.run(f"ollama run {model} {shlex.quote(prompt)}")
        results.append(result.stdout)
    # Delete sandbox after batch completes
    sandbox.delete()
    return results
```
Security Best Practices
Why Isolation Matters
Each HopX sandbox has:
- Dedicated kernel: No shared kernel vulnerabilities
- Isolated file system: No cross-sandbox file access
- Separate network stack: Network policies per sandbox
- Process tree isolation: Processes can't see other sandboxes
This matters for:
- Healthcare: HIPAA-compliant patient data
- Finance: PCI DSS requirements
- Legal: Privileged document analysis
- Enterprise: Proprietary code and trade secrets
```python
# Handle Secrets Securely

def create_secure_sandbox():
    """Create sandbox with secure environment variables."""
    sandbox = Sandbox.create(
        template=TEMPLATE_NAME,
        api_key=os.getenv("HOPX_API_KEY"),
        env_vars={
            "DATABASE_URL": os.getenv("DATABASE_URL"),
            "API_SECRET": os.getenv("API_SECRET")
        }
    )
    return sandbox

# Choose Data Region

def create_regional_sandbox(region: str = "us-east"):
    """Create sandbox in specific region."""
    sandbox = Sandbox.create(
        template=TEMPLATE_NAME,
        api_key=os.getenv("HOPX_API_KEY"),
        region=region  # "us-east" or "eu-west"
    )
    return sandbox
```
Production Pattern: Long-Running AI Agent
```python
async def run_ai_agent():
    """Run a long-running AI agent."""
    sandbox = await get_or_create_sandbox()

    # Agent runs continuously
    while True:
        # Get next task (implement your task queue here)
        task = get_next_task()  # Your implementation

        result = sandbox.commands.run(
            f"ollama run llama3.3 '{task.prompt}'",
            timeout=300
        )

        process_result(result.stdout)  # Your implementation

        # Check if we should continue
        if should_stop():  # Your implementation
            break

    # Pause instead of delete to preserve state
    sandbox.pause()
```
Production Pattern: Multi-Tenant Application
```python
tenant_sandboxes = {}

def get_tenant_sandbox(tenant_id: str) -> Sandbox:
    """Get or create isolated sandbox for tenant."""
    if tenant_id not in tenant_sandboxes:
        sandbox = Sandbox.create(
            template=TEMPLATE_NAME,
            api_key=os.getenv("HOPX_API_KEY")
        )
        tenant_sandboxes[tenant_id] = sandbox

    return tenant_sandboxes[tenant_id]

# Example usage
tenant_a_sandbox = get_tenant_sandbox("tenant-a")
tenant_b_sandbox = get_tenant_sandbox("tenant-b")
```
Use Case: Private Document Analysis
```python
async def analyze_documents(documents: list[str]) -> list[dict]:
    """Analyze sensitive documents privately."""
    sandbox = await get_or_create_sandbox()

    results = []
    for doc in documents:
        # Upload document to sandbox
        sandbox.files.write("/workspace/document.txt", doc)

        # Analyze with Ollama. The model doesn't read file paths from prompts,
        # so inline the file contents (assumes the command runs through a shell).
        response = sandbox.commands.run(
            'ollama run llama3.3 "Summarize this document: $(cat /workspace/document.txt)"',
            timeout=180
        )

        results.append({
            "summary": response.stdout,
            "document": doc[:100]  # First 100 chars for reference
        })

    return results
```
Use Case: Code Generation and Testing
```python
async def generate_and_test_code(specification: str):
    """Generate and test code in isolated environment."""
    sandbox = await get_or_create_sandbox()

    # Generate code (pull codellama in your template first, e.g. .run_cmd("ollama pull codellama"))
    code_response = sandbox.commands.run(
        f"ollama run codellama 'Write Python function: {specification}'",
        timeout=120
    )

    generated_code = code_response.stdout

    # Write to file (in production, extract the code block from any surrounding explanation first)
    sandbox.files.write("/workspace/generated.py", generated_code)

    # Test the code
    test_result = sandbox.commands.run(
        "python /workspace/generated.py",
        timeout=30
    )

    return {
        "code": generated_code,
        "test_output": test_result.stdout,
        "success": test_result.exit_code == 0
    }
```
Monitoring and Debugging
```python
def monitor_sandbox(sandbox: Sandbox):
    """Monitor sandbox resource usage."""
    info = sandbox.get_info()

    print(f"Status: {info.status}")
    print(f"CPU cores: {info.cpu}")
    print(f"Memory: {info.memory}MB")
    print(f"Disk: {info.disk_gb}GB")
    print(f"Region: {info.region}")
    print(f"Created: {info.created_at}")

# Error handling with retry

async def run_with_retry(
    sandbox: Sandbox,
    command: str,
    max_retries: int = 3
) -> str:
    """Run command with exponential backoff retry."""
    for attempt in range(max_retries):
        try:
            result = sandbox.commands.run(command, timeout=120)
            return result.stdout
        except TimeoutError:
            if attempt == max_retries - 1:
                raise
            print(f"Timeout on attempt {attempt + 1}, retrying...")
            await asyncio.sleep(2 ** attempt)  # Exponential backoff
        except Exception as e:
            if attempt == max_retries - 1:
                raise
            print(f"Error on attempt {attempt + 1}: {e}, retrying...")
            await asyncio.sleep(2 ** attempt)
```
Troubleshooting Guide
Issue 1: Model Not Found
Problem: Error: model 'model-name' not found
Solution: Pull the model in your template:
```python
.run_cmd("/usr/local/bin/ollama pull your-model-name")
```
Issue 2: Out of Memory
Problem: Sandbox crashes with memory errors
Solution: Increase memory in BuildOptions:
```python
BuildOptions(memory=8192)  # Instead of 2048
```
Issue 3: Slow Response Times
Problem: Models take too long to respond
Solution: Use smaller models or increase CPU:
```python
OLLAMA_MODEL = "phi-3"   # Faster model
BuildOptions(cpu=4)      # More CPU
```
Issue 4: Connection Timeouts
Problem: SDK times out connecting
Solution: Increase timeout:
```python
sandbox.commands.run(command, timeout=300)
```
Quick Start Checklist
Step 1: Sign Up
- Visit console.hopx.ai
- Create account (no credit card required)
- Claim $200 in free credits
- Copy your API key from dashboard
Step 2: Setup
```bash
pip install hopx-ai
export HOPX_API_KEY=your-key-here
```
Step 3: Deploy
- Build template (~2 minutes, one time)
- Create sandbox (~100ms)
- Run any Ollama model
What you get:
| Metric | Value |
|---|---|
| Build time | ~2 minutes (once) |
| Cold start | ~100ms |
| Runtime limit | None |
| Cost | ~$0.17/hour for a 7B model (2 vCPU, 4 GB RAM, 20 GB disk) |
Comparing Your Options
| Approach | Cold Start | Isolation | Cost Model | Best For |
|---|---|---|---|---|
| Cloud APIs (OpenAI, Anthropic) | Instant | Provider-managed | Per-token | Low volume, varied tasks |
| Self-Hosted VMs | Minutes | Strong | Fixed monthly | Predictable high volume |
| Containers (Docker) | 10+ seconds | Shared kernel | Fixed or per-second | Development only |
| HopX + Ollama | ~100ms | Hardware-level | Per-second usage | Variable volume, privacy needs |
Choose HopX + Ollama when you need:
- Fast cold starts for user-facing applications
- Strong isolation for sensitive data
- Cost control through per-second billing
- Freedom to switch models without vendor lock-in
- Data privacy and regulatory compliance
Next Steps
You now have everything needed to run production LLMs in secure sandboxes.
What you learned:
- Privacy: Data stays in environments you control
- Speed: 100ms startup beats any container solution
- Cost: Pay only for seconds of actual usage
- Security: Hardware-level isolation protects sensitive workloads
- Flexibility: Run any model, any size, any configuration
Get started:
- Use the $200 free credits
- Build your first template
- Test a few models
- See how the economics work for your use case
When ready to scale, the same code works for 10 sandboxes or 1,000.
Frequently Asked Questions
Can I use any Ollama model?
Yes. Any model in the Ollama library works—Llama 3.3, Mixtral, CodeLlama, Phi-3, DeepSeek, and more. Just change the OLLAMA_MODEL variable in your template and rebuild.
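For example, reusing the Step 3 code, a quick sketch of the swap:

```python
# Swap the model, then rebuild so the new weights are baked into the template snapshot.
OLLAMA_MODEL = "mixtral"          # any tag from the Ollama library
result = await build_template()   # build_template() from Step 3 picks up the new model
```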
How much does it cost to run a model 24/7?
For a 7B model (2 vCPU, 4GB RAM, 20GB disk), expect roughly $4 per day (about $120 per month) at the per-second rates above. Larger models like 70B need more resources and cost proportionally more. Use sandbox.pause() when idle to cut costs by roughly 90%.
Is my data really private?
Yes. Each sandbox runs in its own micro-VM with dedicated kernel, filesystem, and network. Your prompts and outputs never leave the sandbox. You can also choose specific regions (US, EU) for data residency compliance.
How long can a sandbox run?
As long as you need—hours, days, or weeks. There are no 15-minute timeouts like AWS Lambda. You pay per second of runtime and can pause/resume to save costs.
Can I run multiple models in one sandbox?
Yes. Pull multiple models in your template, then switch between them at runtime with ollama run model-name. This is useful for routing simple queries to smaller models and complex ones to larger models.
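A minimal sketch of that pattern, built on the Step 3 template (the two model names are just examples; swap in whatever pair fits your routing):

```python
from hopx_ai import Template

SMALL_MODEL = "phi-3"      # fast, cheap responses
LARGE_MODEL = "llama3.3"   # heavier reasoning

# Pull both models at build time so either is ready instantly at runtime
multi_model_template = (
    Template()
    .from_python_image("3.13")
    .run_cmd("mkdir -p /workspace")
    .run_cmd("curl -fsSL https://ollama.com/install.sh | sh")
    .run_cmd(f"/usr/local/bin/ollama pull {SMALL_MODEL}")
    .run_cmd(f"/usr/local/bin/ollama pull {LARGE_MODEL}")
    .set_workdir("/workspace")
)

def ask(sandbox, prompt: str, complex_task: bool = False) -> str:
    """Route a prompt to the small or large model."""
    model = LARGE_MODEL if complex_task else SMALL_MODEL
    result = sandbox.commands.run(f"ollama run {model} '{prompt}'", timeout=180)
    return result.stdout
```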
What if my model is too slow?
Three options: (1) Use a smaller, faster model like Phi-3, (2) Increase vCPU count in BuildOptions, (3) Use quantized versions of models (q4 instead of full precision).
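For option (3), quantized builds are published as separate tags in the Ollama library. In your template, pull the quantized tag instead of the default weights; the tag below is only an example, so verify it on the model's library page:

```python
# Example quantized tag; exact tag names vary by model, check the Ollama library page.
.run_cmd("/usr/local/bin/ollama pull llama3.3:70b-instruct-q4_K_M")
```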
Ready to run your own LLMs? Sign up for HopX and get $200 in free credits to start.