Your AI coding agent ran overnight. You open the dashboard Monday morning and it glows: 14 pull requests created, 2,000 lines changed, three features scaffolded. You sip your coffee feeling like you hired a junior developer for free.

Then you actually read the code. Half those PRs contain fixes for bugs the agent introduced two commits earlier in the same session. One function got written, broken, rewritten, broken again, and finally landed on attempt five. The dashboard counted every attempt as productive work.

Welcome to the rework ratio — the metric nobody selling you AI coding tools wants to talk about.

Inside an Agent Session

Over the past month, every major coding tool shipped autonomous agents. GitHub Copilot and Cursor 3 launched theirs in early April; Claude Code Routines followed on April 14; OpenAI Codex expanded to multi-agent workflows on April 16. Each tool runs unsupervised iteration loops — the agent writes code, checks whether it works, and tries again if it doesn't.

That "tries again" part is where the accounting falls apart. Here's a condensed but representative session from an agent tasked with adding a user authentication endpoint. Forty-three minutes. Twelve commits:

#    Commit message                               Type
1    Add auth route handler                       New work
2    Add JWT token generation                     New work
3    Fix import error in auth.py                  Rework
4    Add password hashing                         New work
5    Fix type error in hash function              Rework
6    Rewrite auth route to fix 500 error          Rework
7    Add input validation                         New work
8    Fix validation regex causing test failure    Rework
9    Fix test broken by commit 6                  Rework
10   Add rate limiting middleware                 New work
11   Fix rate limiter config path                 Rework
12   Clean up unused imports from iterations      Rework

Five commits advance the feature. Seven fix problems the agent created in the same session. That's a 58% rework ratio — more than half the agent's effort spent correcting its own output.
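To make the arithmetic explicit, here is a minimal sketch that tallies the session above. The labels are transcribed by hand from the commit table; the function name rework_ratio is only illustrative, not part of any tool.

```python
from collections import Counter

# Commit types in order, transcribed from the twelve-commit session above.
session = [
    "new", "new", "rework", "new", "rework", "rework",
    "new", "rework", "rework", "new", "rework", "rework",
]

def rework_ratio(labels):
    """Share of commits labeled 'rework' in one session."""
    return Counter(labels)["rework"] / len(labels)

counts = Counter(session)
print(f"{counts['new']} new, {counts['rework']} rework, "
      f"ratio {rework_ratio(session):.0%}")
# prints: 5 new, 7 rework, ratio 58%
```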

The dashboard reported 12 commits, 847 lines changed, one feature completed. All technically true. All misleading.

How to Calculate the Rework Ratio

This isn't theoretical. You can extract it from any repository where agents operate:

Rework Ratio = (commits modifying code written earlier in the same agent session) ÷ (total commits in session)

Run git log --diff-filter=M --name-only on an agent-generated branch to list the commits that modify existing files, along with the files each one changes. Flag every commit that alters a file the agent already touched in the same session. Then separate genuine extensions (adding a new function to an existing file) from corrections (fixing what just broke), and count only the corrections as rework. The ratio sits right there in the diff history.
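The flagging step can be roughed out in a few lines of Python. This is a sketch under assumptions, not a finished tool: it treats one branch range (for example main..agent-branch, a hypothetical name) as one agent session, and the helper names read_commits, parse_log, and flag_retouches are invented for illustration. It reads every commit with --name-only rather than filtering with --diff-filter=M, so that files created earlier in the session still count as "already touched."

```python
import subprocess

def read_commits(rev_range, repo="."):
    """Run git log over a range, oldest commit first, and parse the result."""
    out = subprocess.run(
        ["git", "-C", repo, "log", "--reverse", "--name-only",
         "--format=%x00%h%x09%s", rev_range],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_log(out)

def parse_log(text):
    """Split NUL-delimited records into (sha, subject, files) tuples."""
    commits = []
    for record in text.split("\x00")[1:]:
        lines = [ln for ln in record.splitlines() if ln.strip()]
        sha, _, subject = lines[0].partition("\t")
        commits.append((sha, subject, lines[1:]))
    return commits

def flag_retouches(commits):
    """Return shas of commits that change a file an earlier commit touched."""
    seen, flagged = set(), []
    for sha, _subject, files in commits:
        if seen.intersection(files):
            flagged.append(sha)
        seen.update(files)
    return flagged
```

Calling flag_retouches(read_commits("main..agent-branch")) yields candidates only. A flagged commit may be a legitimate extension, so the flagged share is an upper bound on the rework ratio until someone reads the diffs and separates extensions from corrections.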

GitClear's April 2026 code quality report measured a related signal — code churn within 72 hours of writing — and found it running 7.1% for AI-assisted projects versus 3.2% for human-only baselines. But that captures churn after the PR merges — code that ships and then gets rewritten. The intra-session churn, where the agent breaks and fixes its own code before you ever see the pull request, remains invisible to every existing measurement tool.

That's the gap. GitClear measures post-merge churn. Vendor dashboards measure activity. Nobody measures the rework happening inside the agent's own loop.

The Dashboard Lie

Follow the math for a hypothetical team. Say your agents run 50 sessions per week across 10 engineers, averaging 12 commits per session. If the typical rework ratio is 55%:

  • 50 sessions × 12 commits = 600 commits/week (what the dashboard shows)
  • 600 × 0.55 = 330 commits that produced nothing that shipped
  • 330 rework commits × ~$0.15 avg token cost = ~$50/week burned on the AI equivalent of backspacing

Scale that up. A 100-engineer org running agents aggressively burns $2,000–$5,000 monthly in tokens that generate zero net code. The dashboard labels this "AI-assisted development." The P&L labels it waste.
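The same back-of-envelope math, written out so you can plug in your own numbers. The inputs are the illustrative figures from the bullets above; the scaling to 100 engineers assumes ten identical teams and 52/12 weeks per month, which is a simplification introduced here and lands near the low end of the range quoted above.

```python
def weekly_rework_cost(sessions, commits_per_session, rework_ratio, cost_per_commit):
    """Return (total commits, rework commits, rework token cost) for one week."""
    total = sessions * commits_per_session
    rework = total * rework_ratio
    return total, rework, rework * cost_per_commit

total, rework, cost = weekly_rework_cost(50, 12, 0.55, 0.15)
print(total, round(rework), f"${cost:.2f}")   # 600 330 $49.50

# Assumption: a 100-engineer org is ten such teams, with 52/12 weeks per month.
monthly_100 = cost * 10 * 52 / 12
print(f"${monthly_100:,.0f}")                 # $2,145
```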

Multiple analyses this year have reported the same pattern: AI-generated code carries roughly 1.7× more issues per PR than human code, incidents climb in proportion to AI output, and agent reliability grows at half the rate of capability. The rework ratio explains part of the mechanism: code that survived five internal rewrites carries the architectural scars of the first four attempts. Functions get shaped by debugging history, not design intent.

What Survives After the Rework

Strip out the self-correction loops and honest productivity gains land around 1.5–2× for most teams. Larridin's Q1 2026 productivity benchmarks found AI usage across engineering teams jumped 65%, yet PR throughput grew roughly 10%. Rework explains part of that gap between adoption and output.

The hidden cost isn't only tokens. Every correction cycle adds defensive complexity to the final code. Variable names reflect debugging history rather than domain concepts. Abstractions accumulate guard clauses from prior failed attempts. The code works, but it reads like it was written by someone who kept changing their mind — because it was.

The Metric That Would Change Procurement

Ask your AI coding tool vendor one question before the next sprint planning: what percentage of agent actions in a session correct the agent's own prior output?

I checked every dashboard, every analytics page, every engineering intelligence report from the major tools shipping agents this month. Not one separates "new useful work" from "the agent arguing with itself."

The first vendor that ships this metric — honestly splitting new work from self-correction — wins enterprise deals. Not because the number will look flattering (it won't), but because it demonstrates something no vendor has yet offered: honesty about what autonomous coding actually produces.

You don't need to wait. Clone any agent-generated branch. Read the commits in order. Count the ones that fix what the agent just broke.

Your dashboard says 10×. Your git log says something else. Believe the git log.