How HopX Achieves 100ms Cold Starts
When developers hear "virtual machine," they think of slow boot times—30 seconds to minutes. Containers are faster but still take seconds. HopX sandboxes start in under 100 milliseconds.
This isn't marketing spin. It's the result of carefully designed infrastructure that prioritizes startup latency above all else. Here's how we do it.
The Cold Start Problem
Cold start is the time between requesting a new execution environment and having it ready to run code. It's the enemy of responsive AI systems.
Traditional cold start times:
- Virtual Machines: 30-60 seconds
- Docker containers: 2-10 seconds
- AWS Lambda: 100ms - 5s (depending on runtime)
- Kubernetes pods: 5-30 seconds
For AI agents that need to spawn sandboxes dynamically, these numbers are unacceptable. An agent waiting 10 seconds to start code execution breaks the user experience.
Our Target: Sub-100ms
We set an aggressive target: sandboxes must be ready in under 100 milliseconds. That's the threshold where latency becomes imperceptible to humans.
Achieving this required rethinking every layer of the stack.
The Technology Stack
1. Firecracker Micro-VMs
At the core of HopX is Firecracker, the virtualization technology developed by AWS for Lambda and Fargate.
Why Firecracker?
- Minimal VMM (Virtual Machine Monitor) - only essential devices
- Boots a minimal Linux kernel in ~125ms
- Memory footprint of ~5MB per VM
- Full hardware virtualization (not containers)
```
Traditional VM:              Firecracker Micro-VM:
┌─────────────────────┐      ┌─────────────────────┐
│ Guest OS            │      │ Minimal Guest       │
│ (Full kernel)       │      │ (Stripped kernel)   │
├─────────────────────┤      ├─────────────────────┤
│ BIOS/UEFI           │      │ Minimal Boot        │
│ Device Models       │      │ (No BIOS)           │
├─────────────────────┤      ├─────────────────────┤
│ QEMU/KVM            │      │ Firecracker         │
│ (Complex VMM)       │      │ (Minimal VMM)       │
└─────────────────────┘      └─────────────────────┘
      ~30 seconds                    ~125ms
```
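For context, here is roughly what driving a micro-VM over Firecracker's API socket looks like. This is a minimal sketch of the public Firecracker REST API, not HopX's control plane; the socket path, kernel image, and rootfs filenames are placeholders.

```python
# Minimal sketch: configuring and booting a Firecracker microVM through
# its API socket. Paths are placeholders; check the Firecracker docs for
# the version you run.
import json
import subprocess

SOCKET = "/tmp/firecracker.sock"  # passed to firecracker via --api-sock

def api(method: str, path: str, body: dict) -> None:
    """Send one request to the Firecracker API server via curl."""
    subprocess.run(
        ["curl", "--unix-socket", SOCKET, "-X", method,
         f"http://localhost{path}",
         "-H", "Content-Type: application/json",
         "-d", json.dumps(body)],
        check=True,
    )

# Machine size, guest kernel, and root filesystem, then boot.
api("PUT", "/machine-config", {"vcpu_count": 1, "mem_size_mib": 128})
api("PUT", "/boot-source", {
    "kernel_image_path": "vmlinux-minimal",   # stripped guest kernel
    "boot_args": "console=ttyS0 reboot=k panic=1 pci=off",
})
api("PUT", "/drives/rootfs", {
    "drive_id": "rootfs",
    "path_on_host": "rootfs.ext4",            # optimized rootfs image
    "is_root_device": True,
    "is_read_only": False,
})
api("PUT", "/actions", {"action_type": "InstanceStart"})
```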
2. Memory Snapshots
Booting even a minimal kernel in 125ms isn't fast enough. We use memory snapshots to eliminate boot time entirely.
How snapshots work:
- Boot a sandbox to a "ready" state
- Capture complete memory state (snapshot)
- Store snapshot on fast storage
- Restore snapshot instead of booting
```
Cold boot path:              Snapshot restore path:
BIOS → Kernel →              Restore memory pages →
Init → Services →            Resume execution
Ready (~125ms)               (~15ms)
```
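This workflow maps onto Firecracker's snapshot API. The sketch below reuses the `api()` helper from the previous example; exact field names vary between Firecracker versions, and the file paths are placeholders.

```python
# Sketch of the Firecracker snapshot workflow (field names depend on the
# Firecracker version; paths are placeholders).

# 1. Pause the booted "golden" VM once it reaches the ready state.
api("PATCH", "/vm", {"state": "Paused"})

# 2. Capture VM state and guest memory to files on fast local storage.
api("PUT", "/snapshot/create", {
    "snapshot_type": "Full",
    "snapshot_path": "vmstate.snap",
    "mem_file_path": "memory.snap",
})

# 3. Later, on a fresh Firecracker process, restore instead of booting.
api("PUT", "/snapshot/load", {
    "snapshot_path": "vmstate.snap",
    "mem_backend": {"backend_type": "File", "backend_path": "memory.snap"},
    "resume_vm": True,
})
```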
3. Copy-on-Write Memory
When restoring snapshots, we don't copy all memory upfront. We use copy-on-write (CoW) semantics:
- Map snapshot pages as read-only
- Only copy pages when written to
- Most pages are never written
This means restore time is nearly instant—we're just setting up page table mappings.
```python
# Conceptual representation
class SnapshotRestore:
    def restore(self, snapshot):
        # Map pages read-only (microseconds)
        for page in snapshot.pages:
            self.map_readonly(page)

        # Pages only copied when written (later)
        # Most pages never copied at all
```
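The same copy-on-write behavior can be observed with an ordinary private file mapping. The snippet below is an OS-level illustration rather than HopX internals; "memory.snap" is a hypothetical guest-memory snapshot file.

```python
# OS-level illustration of copy-on-write, not HopX internals.
import mmap

with open("memory.snap", "rb") as f:
    # MAP_PRIVATE: pages stay shared with the page cache until written.
    mem = mmap.mmap(f.fileno(), 0, flags=mmap.MAP_PRIVATE)

# Reads touch shared pages; no per-mapping copy is made.
first_byte = mem[0]

# The first write to a page triggers a private copy of just that page
# (typically 4 KiB); the snapshot file itself is never modified.
mem[0] = 0xFF
```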
4. Pre-warmed Pool
For the fastest possible starts, we maintain a pool of pre-restored sandboxes:
```
Request → [Pre-warmed Pool] → Sandbox Ready
                 ↓
          ~10ms (just hand off)
```
The pool automatically scales based on demand patterns:
- More capacity during peak hours
- Fewer standby sandboxes during low usage
- Machine learning predicts demand spikes
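Conceptually, the pool is a queue of already-restored sandboxes that a background worker keeps topped up. The toy sketch below illustrates the idea; the class and function names are invented for this post, and the real scheduler is more involved.

```python
# Toy illustration of a pre-warmed pool, not HopX's actual scheduler.
# restore_sandbox() stands in for the snapshot-restore path.
import queue
import threading
import time

class WarmPool:
    def __init__(self, target_size, restore_sandbox):
        self._restore = restore_sandbox
        self._target = target_size
        self._idle = queue.Queue()
        threading.Thread(target=self._refill, daemon=True).start()

    def _refill(self):
        # Background worker keeps the pool topped up to the target size.
        while True:
            if self._idle.qsize() < self._target:
                self._idle.put(self._restore())
            else:
                time.sleep(0.05)

    def claim(self):
        try:
            # Fast path: hand off an idle sandbox (just a queue pop).
            return self._idle.get_nowait()
        except queue.Empty:
            # Slow path: pool drained, fall back to an on-demand restore.
            return self._restore()
```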
5. Optimized Rootfs
Our root filesystem images are optimized for fast loading:
Traditional Linux rootfs:
- Full package manager
- Documentation
- Multiple locales
- Development headers
- Size: 500MB - 2GB
HopX rootfs:
- Runtime-only binaries
- Single locale (C.UTF-8)
- No documentation
- Stripped binaries
- Size: 50MB - 200MB
Smaller images mean:
- Faster snapshot loading
- Less memory pressure
- More sandboxes per host
6. Minimal Kernel Configuration
We use a custom Linux kernel configuration optimized for our use case:
```
Disabled:
- USB support
- Sound
- Bluetooth
- Wireless
- Most filesystems (keep ext4)
- Unnecessary drivers

Enabled:
- virtio (fast virtual devices)
- KVM guest support
- Minimal TTY
- Network (virtio-net)
- Block devices (virtio-blk)
```
Result: the kernel boots faster, uses less memory, and exposes a smaller attack surface.
The Complete Boot Path
Here's what happens when you call Sandbox.create():
```
T+0ms:  API receives request
T+2ms:  Auth and rate limiting
T+5ms:  Select host with capacity
T+8ms:  Check pre-warmed pool
        ↓
If available:
T+10ms: Claim sandbox from pool
T+12ms: Configure networking
T+15ms: Return sandbox handle
        ↓
If pool empty:
T+10ms: Start snapshot restore
T+25ms: Memory mapping complete
T+30ms: Resume VM execution
T+35ms: Configure networking
T+40ms: Start user process
T+50ms: Ready for commands
```
Total: 15-50ms depending on pool availability.
Benchmarks
We continuously measure cold start performance:
| Scenario | P50 | P95 | P99 |
|---|---|---|---|
| Pre-warmed pool | 12ms | 18ms | 25ms |
| Snapshot restore | 45ms | 62ms | 85ms |
| Cold boot (rare) | 130ms | 180ms | 250ms |
Compare to alternatives:
| Platform | Cold Start |
|---|---|
| HopX | 12-50ms |
| AWS Lambda (Python) | 200-1000ms |
| Google Cloud Run | 500-2000ms |
| Docker | 2000-5000ms |
| Traditional VM | 30000-60000ms |
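You can sanity-check these figures from the client side. The sketch below assumes the same `Sandbox` API used in the examples later in this post and includes your network round-trip, so expect somewhat higher numbers than the server-side percentiles above.

```python
# Rough client-side latency check; includes network round-trip time.
import statistics
import time

from hopx import Sandbox

samples_ms = []
for _ in range(50):
    start = time.perf_counter()
    sandbox = Sandbox.create(template="code-interpreter")
    samples_ms.append((time.perf_counter() - start) * 1000)
    sandbox.kill()

# statistics.quantiles with n=100 yields 99 cut points: index 49 ≈ P50, etc.
q = statistics.quantiles(samples_ms, n=100)
print(f"P50={q[49]:.1f}ms  P95={q[94]:.1f}ms  P99={q[98]:.1f}ms")
```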
Optimizations We Tried (And Rejected)
Not every optimization makes sense. Here's what we tried but didn't adopt:
WASM Instead of VMs
WebAssembly sandboxes start faster (~1ms) but:
- Limited to WASM-compiled code
- No system calls
- Can't run arbitrary Python/Node
- Weaker isolation than hardware virtualization
We chose VMs for flexibility and security.
Container Pooling
Pre-creating containers seemed promising but:
- Security isolation weaker than VMs
- Container escape vulnerabilities exist
- Shared kernel attack surface
- Not suitable for untrusted code
Unikernels
Single-purpose OS images boot incredibly fast but:
- Requires recompiling applications
- No standard tooling
- Debugging is difficult
- Not practical for general use
Real-World Impact
Fast cold starts enable use cases that weren't possible before:
AI Agents
An agent can spawn sandboxes mid-conversation without noticeable delay:
```python
# User asks: "Calculate the Fibonacci sequence"
# Agent decides to run code

sandbox = Sandbox.create(template="code-interpreter")  # <50ms
result = sandbox.commands.run("python fib.py")         # Code runs
# Total latency: imperceptible

# Agent responds with results
```
Interactive Development
Code execution feels instant, like running locally:
```python
# Each cell execution creates a fresh sandbox
for cell in notebook_cells:
    sandbox = Sandbox.create()  # Fast enough for interactive use
    output = sandbox.commands.run(cell.code)
    display(output)
    sandbox.kill()
```
Parallel Processing
Spawn hundreds of sandboxes without waiting:
```python
import asyncio
from hopx import Sandbox

async def process_item(item):
    sandbox = await Sandbox.create_async()  # Non-blocking
    result = await sandbox.commands.run_async(f"process {item}")
    await sandbox.kill_async()
    return result

async def main():
    # Process 100 items in parallel
    items = range(100)
    return await asyncio.gather(*[process_item(i) for i in items])

# All 100 sandboxes started within ~500ms total
results = asyncio.run(main())
```
Future Improvements
We're continuously working on reducing latency further:
Speculative Execution
Predict sandbox needs before requests arrive:
- Analyze request patterns
- Pre-warm specific templates
- Geographic pre-positioning
Even Smaller Snapshots
Reduce snapshot size through:
- Memory deduplication
- Compression
- Differential snapshots
Edge Deployment
Place sandboxes closer to users:
- Edge locations worldwide
- Sub-10ms network latency
- Local snapshot caches
Conclusion
Achieving sub-100ms cold starts required innovation at every layer:
- Firecracker micro-VMs for minimal overhead
- Memory snapshots to skip boot entirely
- Copy-on-write restore for instant page mapping
- Pre-warmed pools for immediate availability
- Optimized rootfs for smaller images
- Custom kernel for faster boots
The result: sandboxes that feel instant, enabling new categories of applications that require on-demand isolated execution.
When latency drops below human perception thresholds, the technology becomes invisible. That's our goal—making sandboxes so fast you forget they're not local processes.