You're a team lead staring at four invoices for AI coding tools — the software that writes code alongside your developers, like a very fast but very expensive junior colleague. One bill says "seats." Another says "tokens." A third says "session-hours." The fourth says "credits." Your spreadsheet has no common column. Your finance team is asking questions you cannot answer.

This shouldn't be hard. You just want to know which tool costs less per developer per month. But no vendor on Earth will give you that number — because the confusion is the product.

In the last four days alone, the pricing chaos went terminal. On April 18, Cursor's parent Anysphere closed a $2B funding round at a $50B valuation, the kind of war chest that lets you burn cash on pricing experiments indefinitely. The company sells dollar-equivalent usage credits at $20–$200/month. On April 20, GitHub straight-up paused new Copilot signups for its flat-rate seat model, citing "sustainability," a word companies use when the unit economics are bleeding out. This follows weeks of billing-model musical chairs: Windsurf shipped daily quotas as far back as March 12, OpenAI moved Codex to token-based credits on April 4, and Anthropic launched Managed Agents on April 8 at $0.08 per session-hour plus per-token rates plus $10 per thousand web searches. Five vendors, five billing models, zero overlap.
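To see how the three Managed Agents meters stack up on a single bill, here is a minimal Python sketch. The $0.08 per session-hour and $10 per thousand web searches are the figures quoted above. The per-token rates are not quoted here, so `input_rate_per_mtok` and `output_rate_per_mtok` are placeholders you would fill in from the vendor's price list, and the function name is hypothetical.

```python
def managed_agent_cost(
    session_hours: float,
    input_tokens: int,
    output_tokens: int,
    web_searches: int,
    input_rate_per_mtok: float,   # assumption: dollars per million input tokens
    output_rate_per_mtok: float,  # assumption: dollars per million output tokens
    session_hour_rate: float = 0.08,         # figure quoted in the article
    search_rate_per_thousand: float = 10.0,  # figure quoted in the article
) -> float:
    """Add up the three meters: runtime, tokens, and web searches."""
    runtime = session_hours * session_hour_rate
    tokens = (
        input_tokens * input_rate_per_mtok
        + output_tokens * output_rate_per_mtok
    ) / 1_000_000
    searches = web_searches / 1000 * search_rate_per_thousand
    return runtime + tokens + searches
```

The point of writing it out is that no single input predicts the total: a long idle session, a chatty model, and a search-heavy task each push a different meter.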

Now, the part where your wallet catches fire and nobody sends a notification. Each billing unit hides a different cost trap — and the nastiest one lives inside reasoning models. OpenAI's o-series and Anthropic's extended thinking modes generate hidden reasoning tokens: internal chain-of-thought the model produces before answering. You don't see them. You pay for them. A Stanford and UC Berkeley study published March 25 found that reasoning-token generation varies by up to 9.7× across runs of the same prompt — and that cost rankings between models can reverse by a factor of 28 depending on which run you measure. (I covered the study in detail in my April 20 breakdown — the short version is that your budget estimate isn't wrong, it's a random number generator.) Separately, Anthropic's Opus 4.7 tokenizer produces more tokens for identical text than its predecessor — same price per token, more tokens per request, as yesterday's model-swap analysis explored. Cursor's credit system throttles power users mid-session once they burn through fast requests. And Copilot's flat rate? It subsidizes the developer who uses it twice a day at the expense of the one who lives in it.
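A rough sketch of why the hidden tokens wreck a budget. It assumes reasoning tokens are billed at the same rate as visible output tokens (check your vendor's pricing page before relying on that), and it treats one observed run as the cheap end of the range, then applies the study's 9.7× figure as the expensive end. The function name, the rates, and any token counts you pass in are illustrative, not vendor numbers.

```python
def request_cost_range(
    input_tokens: int,
    visible_output_tokens: int,
    observed_reasoning_tokens: int,
    input_rate_per_mtok: float,   # assumption: dollars per million input tokens
    output_rate_per_mtok: float,  # assumption: dollars per million output tokens
    spread: float = 9.7,          # run-to-run variation cited in the article
) -> tuple[float, float]:
    """Return (low, high) dollar cost for one request.

    Assumes hidden reasoning tokens are billed like output tokens.
    """
    def cost(reasoning_tokens: float) -> float:
        billed_output = visible_output_tokens + reasoning_tokens
        return (
            input_tokens * input_rate_per_mtok
            + billed_output * output_rate_per_mtok
        ) / 1_000_000

    low = cost(observed_reasoning_tokens)
    high = cost(observed_reasoning_tokens * spread)
    return low, high
```

If the gap between `low` and `high` is wider than your tolerance for a budget miss, a single test run tells you very little about next month's bill.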

When you try to normalize everything to cost per actual output, say cost per merged pull request (a completed code change that passes review), the picture shifts dramatically. GetDX's Q1 2026 benchmarks, published April 15 and covering 64,680 developers, show Cursor users at 4.1 PRs/day versus 3.61 for Copilot users. The 10× sticker-price gap between tools compresses to roughly 2–4× on a per-outcome basis. But the cheapest vendor flips depending on whether your team writes 50 lines a day or 500.
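The normalization itself is one division. A minimal sketch, assuming 21 working days per month (my assumption, not a GetDX figure) and a hypothetical function name; plug in your own effective monthly spend per developer rather than the sticker price.

```python
def cost_per_merged_pr(
    monthly_cost_per_dev: float,
    merged_prs_per_day: float,
    working_days_per_month: int = 21,  # assumption: adjust for your calendar
) -> float:
    """Dollars spent per merged pull request for one developer."""
    if merged_prs_per_day <= 0:
        raise ValueError("merged_prs_per_day must be positive")
    monthly_prs = merged_prs_per_day * working_days_per_month
    return monthly_cost_per_dev / monthly_prs


if __name__ == "__main__":
    # Illustrative only: the two ends of the $20-$200 credit range quoted
    # earlier, both paired with the 4.1 PRs/day benchmark figure.
    for label, price in (("low tier", 20.0), ("high tier", 200.0)):
        print(f"{label}: ${cost_per_merged_pr(price, 4.1):.2f} per merged PR")
```

The denominator is where the vendor ranking moves: a tool that costs more but lifts throughput can still come out cheaper per merged change.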

Each pricing model also reshapes how developers actually work. Flat seats encourage experimentation — try anything, it's prepaid. Per-token billing punishes exploration — every keystroke has a price tag. Session-hours reward fast agents and penalize debugging. Daily quotas create hard cliffs where your tool just stops mid-afternoon.

So what do you actually do? Your procurement team needs one metric: estimated monthly cost per developer at your usage pattern. No pricing page will give you that. The only honest path is a two-week parallel trial with your actual codebase and your actual humans.
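Once the trial is done, turning it into that one metric is simple arithmetic. Here is a sketch under stated assumptions: `TrialResult` and its fields are hypothetical names, the spend is whatever each vendor actually billed during the trial, and the extrapolation assumes usage in a normal month looks like usage during the trial.

```python
from dataclasses import dataclass


@dataclass
class TrialResult:
    tool: str
    total_spend: float       # dollars billed during the trial, all meters combined
    developers: int          # people who actually used the tool
    trial_working_days: int  # e.g. 10 for a two-week trial


def monthly_cost_per_developer(
    trial: TrialResult,
    working_days_per_month: int = 21,  # assumption: adjust for your calendar
) -> float:
    """Scale observed trial spend to an estimated monthly cost per developer."""
    developer_days = trial.developers * trial.trial_working_days
    if developer_days <= 0:
        raise ValueError("trial needs at least one developer-day")
    return trial.total_spend / developer_days * working_days_per_month


def rank_tools(trials: list[TrialResult]) -> list[tuple[str, float]]:
    """Return (tool, estimated monthly cost per developer), cheapest first."""
    estimates = [(t.tool, monthly_cost_per_developer(t)) for t in trials]
    return sorted(estimates, key=lambda pair: pair[1])
```

Run every tool against the same developers and the same backlog, or the ranking measures your team's week rather than the vendor's pricing.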

The AI coding market just outsourced the math to the buyer. The first vendor to publish a transparent cost-per-outcome calculator wins the next wave of enterprise deals. The rest are betting you won't do the homework. Most of you won't.