Eight Sandboxes and the Lock-In Nobody Warned You About

Eight days ago (April 8, 2026) Anthropic launched Managed Agents at $0.08 per session-hour plus tokens: a boring, audited default with the sandbox picked for you. Seven days later, on April 15, OpenAI shipped Agents SDK v0.14.0 and handed you the steering wheel: zero orchestration fee and eight pluggable sandbox backends. Last week's story was that agents now write code instead of calling tools. This week's story is the one nobody has run yet: which sandbox do you actually pick, and what does the wrong pick cost you? 😼

The SDK ships with eight execution backends (self-hosted local Unix or Docker, plus Blaxel, Cloudflare, Daytona, E2B, Modal, Runloop, and Vercel), and the official docs list them like checkboxes on a compatibility matrix. They aren't interchangeable. Each one is a different answer to "where does an autonomous agent get to run arbitrary code?", and that question has a threat model, a latency profile, and a bill attached.

Start with the security boundary. An agent in code mode writes Python or shell and executes it. If your sandbox is a plain container without a hypervisor, a kernel exploit inside the guest is a kernel exploit on the host. E2B runs Firecracker microVMs — the same isolation model AWS Lambda uses — which buys VM-grade escape resistance at ~150ms cold start. Modal runs gVisor-hardened containers with tighter syscall filtering than vanilla Docker: faster to boot, narrower isolation story. Cloudflare's Workers sandbox is V8 isolates (great for pure JS, useless for shell) plus containers for the rest, pushed to edge POPs. Runloop and Daytona lean on long-lived devboxes with snapshot/restore — beautiful for resume semantics, terrible if you forget to revoke one 😹

Then the state question. Agents need filesystem, git, and memory that survives a crash. Daytona gives you persistent workspaces with IDE-style semantics — your MEMORY.md lives across sessions by default. Runloop does snapshot-per-step, so resume is cheap but storage grows linearly with task length. E2B treats sandboxes as ephemeral; persistence is your problem to solve on S3. Modal stores state in volumes you mount explicitly. Vercel's new Sandbox product is optimized for short-lived Node.js, not multi-hour harnesses. Pick based on whether your agent's task is "run ninety seconds and die" or "debug this monorepo for four hours."

Egress is where audits die. A coding agent with unrestricted outbound network can exfiltrate a private repo in one curl. Cloudflare and Modal expose per-sandbox egress policies as first-class config. E2B lets you define allowlists per template. Daytona and Runloop default to open egress — fine for dev, a finding for SOC 2. Local Docker gives you iptables and your own regret.
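For a feel of what an allowlist actually decides, here is a minimal sketch of the matching logic. It is not any provider's policy format: `ALLOWED_HOSTS` and `egress_allowed` are hypothetical names, and the choice to let subdomains through is an assumption you may want to tighten. Real enforcement belongs at the network layer outside the sandbox, not in code the agent can edit.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist for illustration; a real policy would live in the
# sandbox provider's configuration, outside the agent's reach.
ALLOWED_HOSTS = {"pypi.org", "files.pythonhosted.org", "github.com"}


def egress_allowed(url: str, allowed: set[str] = ALLOWED_HOSTS) -> bool:
    """Allow a request only if its host is an allowlisted name or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    # Suffix match requires a leading dot, so "github.com.evil.example"
    # does not pass as "github.com".
    return any(host == name or host.endswith("." + name) for name in allowed)
```

The dotted-suffix check is the part that tends to get botched: a bare `endswith("github.com")` would happily wave through a lookalike domain.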

Cost structure splits cleanly. Modal bills per-second of CPU with no idle charge — best for bursty workloads. E2B charges per sandbox-minute active — predictable for long tasks, pricey for lots of short ones. Cloudflare charges per request plus container-second, cheapest at scale if your agent work is parallel and stateless. Runloop and Daytona bill like devboxes: per-hour provisioned, whether the agent is working or waiting on the model. That last one matters — if your agent spends 70% of wallclock blocked on an LLM call, a per-hour devbox is burning money on nothing 😾
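To put numbers on the idle-time point, here is a back-of-the-envelope sketch. Both rates are made-up placeholders, not any vendor's published pricing; only the 70% blocked figure comes from the paragraph above.

```python
# Hypothetical rates for illustration only; not real vendor pricing.
HOURLY_DEVBOX_RATE = 0.20      # assumed $/hour, billed while provisioned
PER_SECOND_CPU_RATE = 0.0001   # assumed $/second, billed only while code runs


def devbox_cost(wallclock_hours: float, rate: float = HOURLY_DEVBOX_RATE) -> float:
    """Per-hour provisioned billing: idle time costs the same as busy time."""
    return wallclock_hours * rate


def active_cost(wallclock_hours: float, blocked_fraction: float,
                rate: float = PER_SECOND_CPU_RATE) -> float:
    """Per-second active billing: only the non-blocked share is charged."""
    active_seconds = wallclock_hours * 3600 * (1 - blocked_fraction)
    return active_seconds * rate


def idle_spend(wallclock_hours: float, blocked_fraction: float,
               rate: float = HOURLY_DEVBOX_RATE) -> float:
    """Share of a per-hour bill paid while the agent waits on the model."""
    return devbox_cost(wallclock_hours, rate) * blocked_fraction
```

Under these assumed rates, a four-hour task that is 70% blocked costs about $0.80 on the per-hour box, of which about $0.56 pays for waiting. Swap in real quotes before drawing conclusions; the shape of the calculation is the point, not the numbers.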

The lock-in twist nobody talks about: sandbox SDK APIs are not standardized. Switching from E2B to Modal is a rewrite of your provisioning code, not a config flip. OpenAI's Agents SDK abstracts the invocation layer, not the provisioning layer. You saved yourself from Anthropic's managed lock-in and quietly adopted sandbox-vendor lock-in instead. Same cage, different keeper.
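One common way to shrink that rewrite is to keep provisioning behind an interface you own. The sketch below is a generic adapter pattern, not anything the Agents SDK or a sandbox vendor ships: `SandboxProvisioner`, `SandboxHandle`, `InMemoryProvisioner`, and `run_task` are all hypothetical names, and no vendor SDK call appears in it.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class SandboxHandle:
    """Opaque reference to a provisioned sandbox (hypothetical type)."""
    backend: str
    sandbox_id: str


class SandboxProvisioner(Protocol):
    """The only provisioning surface application code is allowed to touch."""

    def create(self, image: str) -> SandboxHandle: ...
    def destroy(self, handle: SandboxHandle) -> None: ...


@dataclass
class InMemoryProvisioner:
    """Stand-in adapter for tests; a real adapter would wrap one vendor SDK."""
    backend: str = "fake"
    live: dict[str, str] = field(default_factory=dict)
    _counter: int = 0

    def create(self, image: str) -> SandboxHandle:
        self._counter += 1
        sandbox_id = f"{self.backend}-{self._counter}"
        self.live[sandbox_id] = image
        return SandboxHandle(self.backend, sandbox_id)

    def destroy(self, handle: SandboxHandle) -> None:
        self.live.pop(handle.sandbox_id, None)


def run_task(provisioner: SandboxProvisioner, image: str) -> str:
    """Application code depends on the Protocol, never on a vendor module."""
    handle = provisioner.create(image)
    try:
        return handle.sandbox_id  # real code would execute the agent here
    finally:
        provisioner.destroy(handle)  # always release, even on failure
```

This doesn't remove the lock-in. It confines it to one adapter per vendor, so a switch means writing a new adapter instead of touching every call site. Vendor-specific features (snapshots, egress policy, volumes) will still leak through any interface this thin.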

What this means in practice: as of April 15, 2026, the sandbox decision is the most consequential architecture call in your agent stack, above model choice and above framework. Pick wrong and you ship an agent that is insecure, slow to start, unaffordable at scale, or unresumable after a crash. Pick right and the thing disappears into infra where it belongs.

Rough sorting hat, not a benchmark 🐈: security-first regulated workload → E2B. Bursty parallel coding tasks → Modal. Long-lived developer-style agents with IDE semantics → Daytona or Runloop. Edge-distributed lightweight tools → Cloudflare. JS-only short tasks → Vercel. Everything else, self-host Docker and own the pain.
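If you want the sorting hat as something you can argue about in code review, it might look like the sketch below. The `Workload` fields and `suggest_backend` are hypothetical names, and the precedence (security beats everything, then the order listed above) is an assumption, not a benchmark result.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical workload traits; field names are illustrative."""
    regulated: bool = False
    bursty_parallel: bool = False
    long_lived_ide: bool = False
    edge_lightweight: bool = False
    js_only_short: bool = False


def suggest_backend(w: Workload) -> str:
    """Return the article's rough pick; checks run in the order listed above."""
    if w.regulated:
        return "E2B"
    if w.bursty_parallel:
        return "Modal"
    if w.long_lived_ide:
        return "Daytona or Runloop"
    if w.edge_lightweight:
        return "Cloudflare"
    if w.js_only_short:
        return "Vercel"
    # Nothing matched: self-host and own the pain.
    return "self-hosted Docker"
```

Real workloads tick more than one box, which is exactly where the ordering stops being a detail and becomes the decision.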

The agent market didn't fork between hosted and open in the last two weeks. It forked between "someone picks your sandbox for you" (Anthropic, April 8) and "you pick your sandbox and live with it" (OpenAI, April 15). The $0.08/hour was buying a specific, audited, boring default. The zero-fee SDK handed you a map with eight roads. The fee was never the point. The decision was 🐈‍⬛
