
How HopX Achieves 100ms Cold Starts

Deep Dives · Alin Dobra · 7 min read

When developers hear "virtual machine," they think of slow boot times—30 seconds to minutes. Containers are faster but still take seconds. HopX sandboxes start in under 100 milliseconds.

This isn't marketing spin. It's the result of carefully designed infrastructure that prioritizes startup latency above all else. Here's how we do it.

The Cold Start Problem

Cold start is the time between requesting a new execution environment and having it ready to run code. It's the enemy of responsive AI systems.

Traditional cold start times:

  • Virtual Machines: 30-60 seconds
  • Docker containers: 2-10 seconds
  • AWS Lambda: 100ms - 5s (depending on runtime)
  • Kubernetes pods: 5-30 seconds

For AI agents that need to spawn sandboxes dynamically, these numbers are unacceptable. An agent waiting 10 seconds to start code execution breaks the user experience.

Our Target: Sub-100ms

We set an aggressive target: sandboxes must be ready in under 100 milliseconds. That's the threshold where latency becomes imperceptible to humans.

Achieving this required rethinking every layer of the stack.

The Technology Stack

1. Firecracker Micro-VMs

At the core of HopX is Firecracker, the virtualization technology developed by AWS for Lambda and Fargate.

Why Firecracker?

  • Minimal VMM (Virtual Machine Monitor) - only essential devices
  • Boots a minimal Linux kernel in ~125ms
  • Memory footprint of ~5MB per VM
  • Full hardware virtualization (not containers)
```text
Traditional VM:                  Firecracker Micro-VM:

┌───────────────────┐            ┌───────────────────┐
│ Guest OS          │            │ Minimal Guest     │
│ (Full kernel)     │            │ (Stripped kernel) │
├───────────────────┤            ├───────────────────┤
│ BIOS/UEFI         │            │ Minimal Boot      │
│ Device Models     │            │ (No BIOS)         │
├───────────────────┤            ├───────────────────┤
│ QEMU/KVM          │            │ Firecracker       │
│ (Complex VMM)     │            │ (Minimal VMM)     │
└───────────────────┘            └───────────────────┘
      ~30 seconds                      ~125ms
```

2. Memory Snapshots

Booting even a minimal kernel in 125ms isn't fast enough. We use memory snapshots to eliminate boot time entirely.

How snapshots work:

  1. Boot a sandbox to a "ready" state
  2. Capture complete memory state (snapshot)
  3. Store snapshot on fast storage
  4. Restore snapshot instead of booting
```text
Cold boot path:           Snapshot restore path:
BIOS → Kernel             Restore memory pages
Init → Services           Resume execution
Ready (~125ms)            Ready (~15ms)
```

3. Copy-on-Write Memory

When restoring snapshots, we don't copy all memory upfront. We use copy-on-write (CoW) semantics:

  1. Map snapshot pages as read-only
  2. Only copy pages when written to
  3. Most pages are never written

This means restore time is nearly instant—we're just setting up page table mappings.

```python
# Conceptual representation
class SnapshotRestore:
    def restore(self, snapshot):
        # Map pages read-only (microseconds)
        for page in snapshot.pages:
            self.map_readonly(page)

        # Pages only copied when written (later)
        # Most pages never copied at all
```

4. Pre-warmed Pool

For the fastest possible starts, we maintain a pool of pre-restored sandboxes:

```text
Request → [Pre-warmed Pool] → Sandbox Ready
                 │
           ~10ms (just hand off)
```

The pool automatically scales based on demand patterns:

  • More capacity during peak hours
  • Fewer standby sandboxes during low usage
  • Machine learning predicts demand spikes
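Conceptually, a pre-warmed pool is just a queue of ready sandboxes with a background refill task and a demand-driven target size. Here's a minimal sketch; the `WarmPool` class and its sizing rule are illustrative assumptions, not HopX's actual implementation:

```python
from collections import deque

class WarmPool:
    """Illustrative sketch of a pre-warmed sandbox pool.

    `create_sandbox` stands in for the snapshot-restore path;
    the sizing heuristic is a placeholder, not a production policy.
    """

    def __init__(self, create_sandbox, target_size=4):
        self.create_sandbox = create_sandbox
        self.target_size = target_size
        self.pool = deque()

    def refill(self):
        # Background task: keep the pool topped up to its target size
        while len(self.pool) < self.target_size:
            self.pool.append(self.create_sandbox())

    def claim(self):
        # Fast path: hand off a ready sandbox (~10ms);
        # slow path: restore one on demand
        if self.pool:
            return self.pool.popleft()
        return self.create_sandbox()

    def resize(self, demand_per_minute):
        # Placeholder scaling rule: hold roughly 30 seconds of demand
        self.target_size = max(1, demand_per_minute // 2)
```

The key property is that `claim()` never blocks on a boot when the refill task keeps pace with demand.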

5. Optimized Rootfs

Our root filesystem images are optimized for fast loading:

Traditional Linux rootfs:

  • Full package manager
  • Documentation
  • Multiple locales
  • Development headers
  • Size: 500MB - 2GB

HopX rootfs:

  • Runtime-only binaries
  • Single locale (C.UTF-8)
  • No documentation
  • Stripped binaries
  • Size: 50MB - 200MB

Smaller images mean:

  • Faster snapshot loading
  • Less memory pressure
  • More sandboxes per host

6. Minimal Kernel Configuration

We use a custom Linux kernel configuration optimized for our use case:

```text
Disabled:
- USB support
- Sound
- Bluetooth
- Wireless
- Most filesystems (keep ext4)
- Unnecessary drivers

Enabled:
- virtio (fast virtual devices)
- KVM guest support
- Minimal TTY
- Network (virtio-net)
- Block devices (virtio-blk)
```

Result: Kernel boots faster, uses less memory, has smaller attack surface.

The Complete Boot Path

Here's what happens when you call Sandbox.create():

```text
T+0ms:    API receives request
T+2ms:    Auth and rate limiting
T+5ms:    Select host with capacity
T+8ms:    Check pre-warmed pool

          If available:
T+10ms:     Claim sandbox from pool
T+12ms:     Configure networking
T+15ms:     Return sandbox handle

          If pool empty:
T+10ms:     Start snapshot restore
T+25ms:     Memory mapping complete
T+30ms:     Resume VM execution
T+35ms:     Configure networking
T+40ms:     Start user process
T+50ms:     Ready for commands
```

Total: 15-50ms depending on pool availability.
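The two branches above reduce to a pool check with a restore fallback. A hypothetical sketch of that dispatch (function and parameter names are illustrative, not HopX's real internals):

```python
def create_sandbox(pool, restore_snapshot):
    """Sketch of the create path: prefer the pre-warmed pool,
    fall back to a snapshot restore. Names are illustrative."""
    if pool:
        # Fast path (~15ms end to end): just hand off a ready sandbox
        return pool.pop(), "pool"
    # Slow path (~50ms end to end): map snapshot pages and resume the VM
    return restore_snapshot(), "restore"
```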

Benchmarks

We continuously measure cold start performance:

| Scenario | P50 | P95 | P99 |
| --- | --- | --- | --- |
| Pre-warmed pool | 12ms | 18ms | 25ms |
| Snapshot restore | 45ms | 62ms | 85ms |
| Cold boot (rare) | 130ms | 180ms | 250ms |
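Percentiles like P50/P95/P99 come from raw latency samples. For reference, here is the standard nearest-rank method (a generic computation, not HopX's metrics pipeline):

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile over latency samples (milliseconds)."""
    ordered = sorted(samples)
    # Index of the smallest sample covering p percent of observations
    rank = max(0, math.ceil(p / 100 * len(ordered)) - 1)
    return ordered[rank]

latencies = [12, 14, 11, 13, 45, 12, 15, 13, 12, 18]
p50 = percentile(latencies, 50)  # typical request
p99 = percentile(latencies, 99)  # tail latency
```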

Compare to alternatives:

| Platform | Cold Start |
| --- | --- |
| HopX | 12-50ms |
| AWS Lambda (Python) | 200-1000ms |
| Google Cloud Run | 500-2000ms |
| Docker | 2000-5000ms |
| Traditional VM | 30000-60000ms |

Optimizations We Tried (And Rejected)

Not every optimization makes sense. Here's what we tried but didn't adopt:

WASM Instead of VMs

WebAssembly sandboxes start faster (~1ms) but:

  • Limited to WASM-compiled code
  • No system calls
  • Can't run arbitrary Python/Node
  • Weaker isolation than hardware virtualization

We chose VMs for flexibility and security.

Container Pooling

Pre-creating containers seemed promising but:

  • Security isolation weaker than VMs
  • Container escape vulnerabilities exist
  • Shared kernel attack surface
  • Not suitable for untrusted code

Unikernels

Single-purpose OS images boot incredibly fast but:

  • Requires recompiling applications
  • No standard tooling
  • Debugging is difficult
  • Not practical for general use

Real-World Impact

Fast cold starts enable use cases that weren't possible before:

AI Agents

An agent can spawn sandboxes mid-conversation without noticeable delay:

```python
# User asks: "Calculate the Fibonacci sequence"
# Agent decides to run code

sandbox = Sandbox.create(template="code-interpreter")  # <50ms
result = sandbox.commands.run("python fib.py")  # Code runs
# Total latency: imperceptible

# Agent responds with results
```

Interactive Development

Code execution feels instant, like running locally:

```python
# Each cell execution creates a fresh sandbox
for cell in notebook_cells:
    sandbox = Sandbox.create()  # Fast enough for interactive use
    output = sandbox.commands.run(cell.code)
    display(output)
    sandbox.kill()
```

Parallel Processing

Spawn hundreds of sandboxes without waiting:

```python
import asyncio
from hopx import Sandbox

async def process_item(item):
    sandbox = await Sandbox.create_async()  # Non-blocking
    result = await sandbox.commands.run_async(f"process {item}")
    await sandbox.kill_async()
    return result

# Process 100 items in parallel
items = range(100)
results = await asyncio.gather(*[process_item(i) for i in items])
# All 100 sandboxes started within ~500ms total
```

Future Improvements

We're continuously working on reducing latency further:

Speculative Execution

Predict sandbox needs before requests arrive:

  • Analyze request patterns
  • Pre-warm specific templates
  • Geographic pre-positioning
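A simple baseline for this kind of prediction is an exponentially weighted moving average (EWMA) of recent request rates, used to size the pool ahead of demand. The sketch below is illustrative only; HopX's production models are more sophisticated:

```python
class DemandPredictor:
    """Illustrative EWMA demand predictor for pre-warm pool sizing.
    A baseline sketch, not HopX's actual model."""

    def __init__(self, alpha=0.3, headroom=1.5):
        self.alpha = alpha        # weight given to the newest observation
        self.headroom = headroom  # over-provision factor for spikes
        self.rate = 0.0           # smoothed requests per minute

    def observe(self, requests_this_minute):
        # Blend the new observation into the running average
        self.rate = (self.alpha * requests_this_minute
                     + (1 - self.alpha) * self.rate)

    def pool_target(self):
        # Pre-warm enough sandboxes for the predicted rate plus headroom
        return max(1, round(self.rate * self.headroom))
```

Higher `alpha` reacts faster to spikes at the cost of more churn in pool size; `headroom` trades idle capacity for fewer cold restores.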

Even Smaller Snapshots

Reduce snapshot size through:

  • Memory deduplication
  • Compression
  • Differential snapshots

Edge Deployment

Place sandboxes closer to users:

  • Edge locations worldwide
  • Sub-10ms network latency
  • Local snapshot caches

Conclusion

Achieving sub-100ms cold starts required innovation at every layer:

  1. Firecracker micro-VMs for minimal overhead
  2. Memory snapshots to skip boot entirely
  3. Copy-on-write restore for instant page mapping
  4. Pre-warmed pools for immediate availability
  5. Optimized rootfs for smaller images
  6. Custom kernel for faster boots

The result: sandboxes that feel instant, enabling new categories of applications that require on-demand isolated execution.

When latency drops below human perception thresholds, the technology becomes invisible. That's our goal—making sandboxes so fast you forget they're not local processes.

Further Reading