How HopX Achieves 100ms Cold Starts
When developers hear "virtual machine," they think of slow boot times—30 seconds to minutes. Containers are faster but still take seconds. HopX sandboxes start in under 100 milliseconds.
This isn't marketing spin. It's the result of carefully designed infrastructure that prioritizes startup latency above all else. Here's how we do it.
The Cold Start Problem
Cold start is the time between requesting a new execution environment and having it ready to run code. It's the enemy of responsive AI systems.
Traditional cold start times:
- Virtual Machines: 30-60 seconds
- Docker containers: 2-10 seconds
- AWS Lambda: 100ms - 5s (depending on runtime)
- Kubernetes pods: 5-30 seconds
For AI agents that need to spawn sandboxes dynamically, these numbers are unacceptable. An agent waiting 10 seconds to start code execution breaks the user experience.
Our Target: Sub-100ms
We set an aggressive target: sandboxes must be ready in under 100 milliseconds. That's the threshold where latency becomes imperceptible to humans.
Achieving this required rethinking every layer of the stack.
The Technology Stack
1. Firecracker Micro-VMs
At the core of HopX is Firecracker, the virtualization technology developed by AWS for Lambda and Fargate.
Why Firecracker?
- Minimal VMM (Virtual Machine Monitor) - only essential devices
- Boots a minimal Linux kernel in ~125ms
- Memory footprint of ~5MB per VM
- Full hardware virtualization (not containers)
```
Traditional VM:              Firecracker Micro-VM:
┌─────────────────────┐      ┌─────────────────────┐
│ Guest OS            │      │ Minimal Guest       │
│ (Full kernel)       │      │ (Stripped kernel)   │
├─────────────────────┤      ├─────────────────────┤
│ BIOS/UEFI           │      │ Minimal Boot        │
│ Device Models       │      │ (No BIOS)           │
├─────────────────────┤      ├─────────────────────┤
│ QEMU/KVM            │      │ Firecracker         │
│ (Complex VMM)       │      │ (Minimal VMM)       │
└─────────────────────┘      └─────────────────────┘
      ~30 seconds                    ~125ms
```
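For context, here is roughly what driving a micro-VM over Firecracker's API socket looks like. This is a minimal sketch of the public Firecracker REST API, not HopX's control plane; the socket path, kernel image, and rootfs filenames are placeholders.

```python
# Minimal sketch: configuring and booting a Firecracker microVM through
# its API socket. Paths are placeholders; check the Firecracker docs for
# the version you run.
import json
import subprocess

SOCKET = "/tmp/firecracker.sock"  # passed to firecracker via --api-sock

def api(method: str, path: str, body: dict) -> None:
    """Send one request to the Firecracker API server via curl."""
    subprocess.run(
        ["curl", "--unix-socket", SOCKET, "-X", method,
         f"http://localhost{path}",
         "-H", "Content-Type: application/json",
         "-d", json.dumps(body)],
        check=True,
    )

# Machine size, guest kernel, and root filesystem, then boot.
api("PUT", "/machine-config", {"vcpu_count": 1, "mem_size_mib": 128})
api("PUT", "/boot-source", {
    "kernel_image_path": "vmlinux-minimal",   # stripped guest kernel
    "boot_args": "console=ttyS0 reboot=k panic=1 pci=off",
})
api("PUT", "/drives/rootfs", {
    "drive_id": "rootfs",
    "path_on_host": "rootfs.ext4",            # optimized rootfs image
    "is_root_device": True,
    "is_read_only": False,
})
api("PUT", "/actions", {"action_type": "InstanceStart"})
```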
2. Memory Snapshots
Booting even a minimal kernel in 125ms isn't fast enough. We use memory snapshots to eliminate boot time entirely.
How snapshots work:
- Boot a sandbox to a "ready" state
- Capture complete memory state (snapshot)
- Store snapshot on fast storage
- Restore snapshot instead of booting
```
Cold boot path:              Snapshot restore path:
BIOS → Kernel →              Restore memory pages →
Init → Services →            Resume execution
Ready (~125ms)               (~15ms)
```
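This workflow maps onto Firecracker's snapshot API. The sketch below reuses the `api()` helper from the previous example; exact field names vary between Firecracker versions, and the file paths are placeholders.

```python
# Sketch of the Firecracker snapshot workflow (field names depend on the
# Firecracker version; paths are placeholders).

# 1. Pause the booted "golden" VM once it reaches the ready state.
api("PATCH", "/vm", {"state": "Paused"})

# 2. Capture VM state and guest memory to files on fast local storage.
api("PUT", "/snapshot/create", {
    "snapshot_type": "Full",
    "snapshot_path": "vmstate.snap",
    "mem_file_path": "memory.snap",
})

# 3. Later, on a fresh Firecracker process, restore instead of booting.
api("PUT", "/snapshot/load", {
    "snapshot_path": "vmstate.snap",
    "mem_backend": {"backend_type": "File", "backend_path": "memory.snap"},
    "resume_vm": True,
})
```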
3. Copy-on-Write Memory
When restoring snapshots, we don't copy all memory upfront. We use copy-on-write (CoW) semantics:
- Map snapshot pages as read-only
- Only copy pages when written to
- Most pages are never written
This means restore time is nearly instant—we're just setting up page table mappings.
```python
# Conceptual representation
class SnapshotRestore:
    def restore(self, snapshot):
        # Map pages read-only (microseconds)
        for page in snapshot.pages:
            self.map_readonly(page)

        # Pages only copied when written (later)
        # Most pages never copied at all
```
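The same copy-on-write behavior can be observed with an ordinary private file mapping. The snippet below is an OS-level illustration rather than HopX internals; "memory.snap" is a hypothetical guest-memory snapshot file.

```python
# OS-level illustration of copy-on-write, not HopX internals.
import mmap

with open("memory.snap", "rb") as f:
    # MAP_PRIVATE: pages stay shared with the page cache until written.
    mem = mmap.mmap(f.fileno(), 0, flags=mmap.MAP_PRIVATE)

# Reads touch shared pages; no per-mapping copy is made.
first_byte = mem[0]

# The first write to a page triggers a private copy of just that page
# (typically 4 KiB); the snapshot file itself is never modified.
mem[0] = 0xFF
```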
4. Pre-warmed Pool
For the fastest possible starts, we maintain a pool of pre-restored sandboxes:
```
Request → [Pre-warmed Pool] → Sandbox Ready
                 ↓
          ~10ms (just hand off)
```
The pool automatically scales based on demand patterns:
- More capacity during peak hours
- Fewer standby sandboxes during low usage
- Machine learning predicts demand spikes
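Conceptually, the pool is a queue of already-restored sandboxes that a background worker keeps topped up. The toy sketch below illustrates the idea; the class and function names are invented for this post, and the real scheduler is more involved.

```python
# Toy illustration of a pre-warmed pool, not HopX's actual scheduler.
# restore_sandbox() stands in for the snapshot-restore path.
import queue
import threading
import time

class WarmPool:
    def __init__(self, target_size, restore_sandbox):
        self._restore = restore_sandbox
        self._target = target_size
        self._idle = queue.Queue()
        threading.Thread(target=self._refill, daemon=True).start()

    def _refill(self):
        # Background worker keeps the pool topped up to the target size.
        while True:
            if self._idle.qsize() < self._target:
                self._idle.put(self._restore())
            else:
                time.sleep(0.05)

    def claim(self):
        try:
            # Fast path: hand off an idle sandbox (just a queue pop).
            return self._idle.get_nowait()
        except queue.Empty:
            # Slow path: pool drained, fall back to an on-demand restore.
            return self._restore()
```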
5. Optimized Rootfs
Our root filesystem images are optimized for fast loading:
Traditional Linux rootfs:
- Full package manager
- Documentation
- Multiple locales
- Development headers
- Size: 500MB - 2GB
HopX rootfs:
- Runtime-only binaries
- Single locale (C.UTF-8)
- No documentation
- Stripped binaries
- Size: 50MB - 200MB
Smaller images mean:
- Faster snapshot loading
- Less memory pressure
- More sandboxes per host
6. Minimal Kernel Configuration
We use a custom Linux kernel configuration optimized for our use case:
```
Disabled:
- USB support
- Sound
- Bluetooth
- Wireless
- Most filesystems (keep ext4)
- Unnecessary drivers

Enabled:
- virtio (fast virtual devices)
- KVM guest support
- Minimal TTY
- Network (virtio-net)
- Block devices (virtio-blk)
```
Result: the kernel boots faster, uses less memory, and exposes a smaller attack surface.
The Complete Boot Path
Here's what happens when you call Sandbox.create():
```
T+0ms:  API receives request
T+2ms:  Auth and rate limiting
T+5ms:  Select host with capacity
T+8ms:  Check pre-warmed pool
        ↓
If available:
T+10ms: Claim sandbox from pool
T+12ms: Configure networking
T+15ms: Return sandbox handle
        ↓
If pool empty:
T+10ms: Start snapshot restore
T+25ms: Memory mapping complete
T+30ms: Resume VM execution
T+35ms: Configure networking
T+40ms: Start user process
T+50ms: Ready for commands
```
Total: 15-50ms depending on pool availability.
Benchmarks
We continuously measure cold start performance:
| Scenario | P50 | P95 | P99 |
|---|---|---|---|
| Pre-warmed pool | 12ms | 18ms | 25ms |
| Snapshot restore | 45ms | 62ms | 85ms |
| Cold boot (rare) | 130ms | 180ms | 250ms |
Compare to alternatives:
| Platform | Cold Start |
|---|---|
| HopX | 12-50ms |
| AWS Lambda (Python) | 200-1000ms |
| Google Cloud Run | 500-2000ms |
| Docker | 2000-5000ms |
| Traditional VM | 30000-60000ms |
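You can sanity-check these figures from the client side. The sketch below assumes the same `Sandbox` API used in the examples later in this post and includes your network round-trip, so expect somewhat higher numbers than the server-side percentiles above.

```python
# Rough client-side latency check; includes network round-trip time.
import statistics
import time

from hopx import Sandbox

samples_ms = []
for _ in range(50):
    start = time.perf_counter()
    sandbox = Sandbox.create(template="code-interpreter")
    samples_ms.append((time.perf_counter() - start) * 1000)
    sandbox.kill()

# statistics.quantiles with n=100 yields 99 cut points: index 49 ≈ P50, etc.
q = statistics.quantiles(samples_ms, n=100)
print(f"P50={q[49]:.1f}ms  P95={q[94]:.1f}ms  P99={q[98]:.1f}ms")
```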
Optimizations We Tried (And Rejected)
Not every optimization makes sense. Here's what we tried but didn't adopt:
WASM Instead of VMs
WebAssembly sandboxes start faster (~1ms) but:
- Limited to WASM-compiled code
- No system calls
- Can't run arbitrary Python/Node
- Weaker isolation than hardware virtualization
We chose VMs for flexibility and security.
Container Pooling
Pre-creating containers seemed promising but:
- Security isolation weaker than VMs
- Container escape vulnerabilities exist
- Shared kernel attack surface
- Not suitable for untrusted code
Unikernels
Single-purpose OS images boot incredibly fast but:
- Requires recompiling applications
- No standard tooling
- Debugging is difficult
- Not practical for general use
Real-World Impact
Fast cold starts enable use cases that weren't possible before:
AI Agents
An agent can spawn sandboxes mid-conversation without noticeable delay:
```python
# User asks: "Calculate the Fibonacci sequence"
# Agent decides to run code

sandbox = Sandbox.create(template="code-interpreter")  # <50ms
result = sandbox.commands.run("python fib.py")         # Code runs
# Total latency: imperceptible

# Agent responds with results
```
Interactive Development
Code execution feels instant, like running locally:
```python
# Each cell execution creates a fresh sandbox
for cell in notebook_cells:
    sandbox = Sandbox.create()  # Fast enough for interactive use
    output = sandbox.commands.run(cell.code)
    display(output)
    sandbox.kill()
```
Parallel Processing
Spawn hundreds of sandboxes without waiting:
```python
import asyncio
from hopx import Sandbox

async def process_item(item):
    sandbox = await Sandbox.create_async()  # Non-blocking
    result = await sandbox.commands.run_async(f"process {item}")
    await sandbox.kill_async()
    return result

async def main():
    # Process 100 items in parallel
    items = range(100)
    return await asyncio.gather(*[process_item(i) for i in items])

# All 100 sandboxes started within ~500ms total
results = asyncio.run(main())
```
Future Improvements
We're continuously working on reducing latency further:
Speculative Execution
Predict sandbox needs before requests arrive:
- Analyze request patterns
- Pre-warm specific templates
- Geographic pre-positioning
Even Smaller Snapshots
Reduce snapshot size through:
- Memory deduplication
- Compression
- Differential snapshots
Edge Deployment
Place sandboxes closer to users:
- Edge locations worldwide
- Sub-10ms network latency
- Local snapshot caches
Conclusion
Achieving sub-100ms cold starts required innovation at every layer:
- Firecracker micro-VMs for minimal overhead
- Memory snapshots to skip boot entirely
- Copy-on-write restore for instant page mapping
- Pre-warmed pools for immediate availability
- Optimized rootfs for smaller images
- Custom kernel for faster boots
The result: sandboxes that feel instant, enabling new categories of applications that require on-demand isolated execution.
When latency drops below human perception thresholds, the technology becomes invisible. That's our goal—making sandboxes so fast you forget they're not local processes.