Your AI Cost Model Just Forked. Here's the Math.

This morning we covered Anthropic's Managed Agents as a platform play — four layers of lock-in, each stickier than the last. But the billing model buried in that launch deserves its own autopsy. Because your AI cost model forked on April 8, and your finance team doesn't know yet.

Here's the split. Traditional API calls — question in, answer out — still bill per token, same as always. But anything that loops, retries, runs tools, or thinks autonomously now carries a second charge: $0.08 per session-hour, metered to the millisecond. Two meters running simultaneously. Anthropic is the first major vendor to price agent compute this way. OpenAI and Google went in completely different directions.

Let's run the same workload through all three. A realistic agent task: code review across a medium repository. One hour of runtime. The agent reads files, runs tests, loops on failures, writes a summary. By the end it's consumed roughly 500K input tokens and 100K output tokens.

Anthropic (Sonnet 4.6): Session runtime at $0.08. Input tokens at $3/M: $1.50. Output at $15/M: $1.50. Total: $3.08. The session fee is 2.6% of the bill.

OpenAI (GPT-5.2-Codex): No session fee. Input at $1.75/M: $0.88. Output at $14/M: $1.40. Total: $2.28. Pure tokens. Nothing else to track.

Google (Vertex AI Agent Engine): Per-second compute billing — vCPU and memory, priced like a cloud container. A standard agent runtime runs roughly $3–$8 per hour depending on configuration, and you pay Gemini token rates on top. The total varies wildly but typically embeds into existing GCP commitments. The same task might land around $5–$10 — or effectively zero if you've already committed enough GCP spend for the quarter.

At small scale, OpenAI wins on raw cost. At scale, the picture flips.

Budget 10,000 agent-hours per month. Anthropic's orchestration layer is $800 flat — one line item a CFO can approve without a tokenomics tutorial. OpenAI's bill is pure variable: every token, every retry, every moment an agent decides to reconsider moves the number. No floor, no ceiling. Google's agent cost disappears into your cloud commitment, which is either a feature or a trap depending on how your contract reads.

The break-even depends on how token-hungry your agents are. Lightweight agents — monitoring, routing, status checks — might use 50K tokens per hour. On Anthropic, that's $0.31 total. That $0.08 session fee is now 26% of the bill. Not a rounding error. Heavy agents doing code generation or deep research burn 1M+ tokens per hour. The session fee drops below 1%. Invisible.

Anthropic is making a built-in bet: agents will get heavier, not lighter. If the industry trends toward token efficiency — doing more with less context — the session-hour becomes an increasingly visible tax. If agents stay hungry, it vanishes into noise. Anthropic is betting on hungry.

Three vendors, three billing philosophies, zero common units. You cannot put them in the same spreadsheet without building a normalization model, and that model requires workload assumptions you haven't measured yet. Comparing AI vendor costs in 2026 is harder than comparing cloud costs in 2014 — and we still haven't solved that one either.

So here's where things land. Simple API calls stay on the token meter. Anything autonomous now lives on session-hours, pure tokens, or cloud-compute billing depending on your vendor. Switching means relearning not just the API but the entire financial model around it.

The token was the universal unit of AI cost for three years. Anthropic split it into two dimensions, and now everyone has to pick which denomination they're trading in.

Your AI Cost Model Just Forked. Here's the Math.

Keep reading

Anthropic Is Running the Stripe Playbook. You're the Merchant.

Anthropic Launched Managed Agents. The Model Is Now the Least Important Thing They Sell.

Multi-Agent Delegation Is an Org Chart Nobody Audited

OpenAI's $122B War Chest: First Moves of the Most Expensive Startup in History