You told your shiny new AI agent to "check flight prices and book the cheapest under $500." It opened a browser, clicked around for four minutes, and handed back a confirmation for the wrong airport. Not a different flight. A different city. Congrats, you just paid $470 to visit somewhere you never planned to go.

This is the part nobody talks about. Everyone argues about orchestration SDKs (the glue code that chains AI calls together) and managed agent platforms (hosted boxes where your agent lives). Meanwhile, the actual infrastructure that lets an agent "use the internet" is a headless browser (a Chrome instance with no visible window, driven by code) piloted by a vision model (an AI that reads screenshots like a human would). That layer is younger, flakier, and concentrated in fewer vendors than the LLM layer everyone obsesses over (LLM: large language model, the brain behind ChatGPT, Claude, Gemini).

Between April 2 and April 15, 2026, the browser-agent layer crystallized. On April 2, open-source framework Browser Use shipped v0.12.6 with the telling changelog line "fix O(n²) bottlenecks in DOM capture for heavy pages." Translation: in earlier releases, DOM capture time grew with the square of page size, and nobody noticed until production 😹. On April 3, Browser Use Cloud went free-to-start, and coding agents like Claude Code can now sign up for Browser Use accounts on their own from the CLI. Agents provisioning agents. Welcome to 2026 🙀.

On April 6, Browserbase launched Stagehand Model Gateway — "one API key, one bill, access to top models without managing providers," per authors Miguel Gonzalez and Harsehaj Dhami. They're not just the browser vendor anymore. They want to be the billing spine above the LLM layer.

Then the big 48 hours. On April 14, Anthropic rebuilt Claude Code desktop and launched Routines — scheduled agent workflows running on Anthropic's cloud. On April 15, OpenAI shipped a massive Agents SDK update with sandboxing (isolated workspaces so parallel agents can't nuke each other's state), subagents, code mode, and support for 100+ LLMs. The New Stack called it "separating the harness from the compute" — a polite way of saying OpenAI wants to eat Browserbase's lunch.

Three architectures are fighting. Accessibility-tree navigation (reading a website's structured skeleton the way a screen reader does). Vision-model clicking (Claude Computer Use literally looks at a screenshot and says "click at x=420, y=380"). And hybrid, like Stagehand, which uses both. On the WebArena-Verified benchmark (a standard test of agents doing real web tasks), GPT-5.4 scores 67.3%. On OSWorld-Verified, 75% — above the human baseline of 72.4%. Sounds great until you read Berkeley's "Illusion of Progress" paper, which argues most web agents still underperform a 2024 baseline on sites they haven't been trained on 😾.
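To make the hybrid idea concrete, here is a minimal sketch of the general pattern: try the accessibility tree first, and only fall back to a vision model when the structured lookup finds nothing. This is not Stagehand's implementation. `AXNode`, `find_in_ax_tree`, `locate`, and the `vision_locator` callback are all hypothetical names, and the vision model is stubbed out with a plain function.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Point = tuple[int, int]


@dataclass
class AXNode:
    """Hypothetical, flattened accessibility-tree node with a click point."""
    role: str
    name: str
    x: int
    y: int


def find_in_ax_tree(tree: list[AXNode], role: str, name: str) -> Optional[Point]:
    """Structured lookup: match role and (case-insensitive) accessible name."""
    wanted = name.strip().lower()
    for node in tree:
        if node.role == role and node.name.strip().lower() == wanted:
            return (node.x, node.y)
    return None


def locate(
    tree: list[AXNode],
    role: str,
    name: str,
    vision_locator: Callable[[str], Optional[Point]],
) -> tuple[str, Optional[Point]]:
    """Hybrid strategy: cheap structured lookup first, vision as the fallback."""
    point = find_in_ax_tree(tree, role, name)
    if point is not None:
        return ("accessibility", point)
    # Assumption: the vision model takes a text description of the target
    # and returns screen coordinates, or None if it cannot find it.
    return ("vision", vision_locator(f"{role} labelled {name!r}"))


def fake_vision(description: str) -> Optional[Point]:
    """Stand-in for a real screenshot-reading model."""
    return (420, 380)


tree = [AXNode("button", "Search flights", 120, 300)]
print(locate(tree, "button", "search flights", fake_vision))
print(locate(tree, "button", "Book now", fake_vision))
```

The design choice worth noticing is the return value: it tells you which path produced the click. Logging that per action is a cheap way to see how often you are paying for the vision fallback.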

Now the price. Add up browser session-minutes, LLM tokens, retry loops, and residential proxy fees, and per-task cost easily triples versus a text-only agent. Worse, failure modes are silent, not loud: the agent confidently books the wrong flight. No stack trace. Just a charge on your card and a hotel in Burbank when you meant Burlington 🐈‍⬛.
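One way to turn a silent failure into a loud one is to compare what the agent did against what you asked for before anything gets charged. Here is a minimal sketch, assuming your agent can hand back a structured result. `BookingIntent`, `BookingResult`, and `check_booking` are hypothetical names, not part of any vendor's SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BookingIntent:
    """What the user actually asked for (hypothetical schema)."""
    destination_airport: str  # IATA code, e.g. "BTV" for Burlington, VT
    price_limit_usd: float    # "under $500" means strictly below this


@dataclass(frozen=True)
class BookingResult:
    """What the agent says it is about to book (hypothetical schema)."""
    destination_airport: str
    price_usd: float


def check_booking(intent: BookingIntent, result: BookingResult) -> list[str]:
    """Return human-readable mismatches; an empty list means it looks right."""
    problems = []
    wanted = intent.destination_airport.upper()
    got = result.destination_airport.upper()
    if got != wanted:
        problems.append(f"destination is {got}, expected {wanted}")
    if result.price_usd >= intent.price_limit_usd:
        problems.append(
            f"price ${result.price_usd:.2f} is not under "
            f"${intent.price_limit_usd:.2f}"
        )
    return problems


intent = BookingIntent(destination_airport="BTV", price_limit_usd=500.0)
result = BookingResult(destination_airport="BUR", price_usd=470.0)  # Burbank

problems = check_booking(intent, result)
if problems:
    # In a real pipeline this is where you would stop and ask a human.
    print("Refusing to confirm:", "; ".join(problems))
```

The $470 fare passes the price check and still gets blocked, because the destination does not match. That is the stack trace the agent never gave you.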

Here's the thing. If you ship an agent that touches the web, you've already picked a browser vendor whether you know it or not. Your orchestration SDK imports it transitively. Your "agent platform" is a thin wrapper around Browserbase, Browser Use, or Anthropic Computer Use. The procurement decision you didn't make is probably the biggest reliability risk in your stack.
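If your stack is Python, you can check for that transitive import yourself. The sketch below walks installed package metadata and reports which packages reach a browser-automation library, directly or through their dependencies. The package names in `BROWSER_PACKAGES` are an assumption about what to look for, so edit the set to match your vendors. The helper names are hypothetical; `importlib.metadata.distributions()` is standard library.

```python
import re
from importlib.metadata import distributions

# Assumption: names to treat as "browser layer". Adjust for your stack.
BROWSER_PACKAGES = {"playwright", "selenium", "browser-use", "browserbase"}


def normalize(name: str) -> str:
    """Normalize a package name so foo_bar, Foo.Bar and foo-bar compare equal."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_name(requirement: str) -> str:
    """Extract the package name from a string like 'playwright>=1.40'."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    return normalize(match.group(1)) if match else ""


def find_browser_dependents(
    installed: dict[str, list[str]],
    targets: set[str] = BROWSER_PACKAGES,
) -> dict[str, set[str]]:
    """Map each package to the target packages it reaches, even transitively."""
    graph = {
        normalize(pkg): {requirement_name(r) for r in reqs}
        for pkg, reqs in installed.items()
    }
    wanted = {normalize(t) for t in targets}
    hits: dict[str, set[str]] = {}
    for pkg in graph:
        seen: set[str] = set()
        stack = [pkg]
        while stack:  # depth-first walk over declared requirements
            for dep in graph.get(stack.pop(), ()):
                if dep not in seen:
                    seen.add(dep)
                    stack.append(dep)
        found = seen & wanted
        if found:
            hits[pkg] = found
    return hits


def installed_requirements() -> dict[str, list[str]]:
    """Read declared requirements for every installed distribution.

    Note: optional extras are included, so this can over-report.
    """
    return {
        d.metadata["Name"]: list(d.requires or [])
        for d in distributions()
        if d.metadata["Name"]
    }


if __name__ == "__main__":
    for pkg, found in sorted(find_browser_dependents(installed_requirements()).items()):
        print(f"{pkg} -> {', '.join(sorted(found))}")
```

Run it inside the virtualenv your agent ships with. Any package on the left that you think of as "just the SDK" is where the unmade procurement decision lives.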

The Playwright war of the agent era was over before most teams noticed it had started. The browser is now a metered, billable, LLM-mediated line item in every production AI system. Check your invoices 😼.