Google's Agent Pricing Is a Trap Disguised as a Discount

Anthropic launched Managed Agents, OpenAI meters every token with platform fees stacked on top, and Google charges per vCPU-hour. We covered the event and the billing zoo yesterday. But everyone — myself included — spent too much time staring at orchestration fees. The real number is hiding one layer deeper.

The orchestration costs — $0.08/session-hour for Anthropic, ~$0.09/vCPU-hour for Google — are noise. A few cents per hour of babysitting. The number your CFO should actually lose sleep over is the token price, because that's where the 10x gap lives.

The Math Nobody Led With

Here's what the underlying models cost to process a million input tokens:

Gemini 2.5 Flash: $0.30
GPT-5: $1.25
Claude Sonnet 4.5: $3.00

That's not a rounding difference. Google's cheapest model costs one-tenth of Anthropic's workhorse on raw inference. An agent chewing through a million tokens — roughly 750,000 words — runs $0.30 on Flash versus $3.00 on Claude. Multiply by thousands of daily sessions, and orchestration fees become a footnote in a much uglier spreadsheet.

This is the actual battleground. Not who charges what for the sandbox. Who charges what for the thinking.

Google's Android Playbook, Reloaded

Enterprise analyst Kai Waehner laid it out on April 6: Google already has 11 million Cloud-connected organizations swiping their cards monthly. They don't need to win on agent orchestration margins. They need agents to drive more compute consumption on infrastructure customers already pay for.

This is Android economics applied to AI. Give away the runtime at near-cost, make it irresistible on price, and monetize the ecosystem customers build around it. The Vertex AI Agent Engine's free tier covers ~50 hours of compute per month — just enough to get your pipelines dependent on Google's session management ($0.25 per 1,000 events), Google's memory banks, Google's RAG Engine.

Waehner again: "Choosing Gemini means choosing Google Cloud as your inference layer, Google Workspace as your productivity surface, and Vertex AI as your development platform."

That's not a pricing decision. That's an adoption ceremony.

The Part Where "Cheap" Gets Expensive

The 10x token discount comes stapled to Google's entire stack. Your agents get wired into Vertex session management, Google's document retrieval, Google's orchestration layer. Migrating to Anthropic or OpenAI later means rebuilding from scratch — data pipelines, memory stores, retrieval logic, all of it.

Anthropic plays the opposite card. Claude is available through its own API, AWS Bedrock, and Google's own Vertex AI. Higher per-token cost, but you're buying the exit door. OpenAI sits somewhere in the middle, hedging with Azure while building its own platform gravity.

The cheapest tokens come with the stickiest infrastructure. Always have, always will.

What This Means for You

If you're choosing an agent platform this month, stop comparing orchestration fees. The token cost gap between providers is 5-10x wider than the runtime cost gap. Run the numbers on total inference spend at your projected volume, then ask yourself how much you'd pay to switch providers in two years.

The AI agent wars stopped being about who builds the smartest model. They're about who already holds your cloud contract — and who can make "free" feel like a bargain until the switching costs arrive. Google holds more of those contracts than anyone, and they just priced their agent runtime like someone who knows it.

Google's Agent Pricing Is a Trap Disguised as a Discount

The Math Nobody Led With

Google's Android Playbook, Reloaded

The Part Where "Cheap" Gets Expensive

What This Means for You

Keep reading

Three AI Memory Systems, Zero Proof They Actually Help

Your Agent's Permission Dialog Is a Placebo

MCP Is Anthropic's Android. The Lock-In Is in the Spec.

ADK Ships Both Protocols. The Destination Is Vertex.