Two weeks ago, you greenlit an agent pilot. Maybe it was Anthropic's shiny new Managed Agents, maybe OpenAI's updated Agents SDK. The vendor demo was gorgeous: a junior-level task — ticket triage, boilerplate code, data cleanup — vanished in 90 seconds. You did the math. Fewer junior hires, seniors freed for "high-value work." ROI looked bulletproof.
Here's the punchline nobody put on the slide: your senior engineers are now spending more time reviewing agent outputs than juniors ever spent producing them. And nobody budgeted for that.
The week that launched a thousand agents
Between April 8 and April 15, three major vendors went all-in on production agents: autonomous AI systems that don't just answer questions but actually do work on their own. Anthropic shipped Managed Agents on April 8, with Notion, Rakuten, and Asana as launch partners. Atlassian plugged agents into Confluence on April 10. OpenAI expanded its Agents SDK on April 15 with sandbox environments and long-horizon tasks. Enterprise agents went from "we're experimenting" to "it's in prod" overnight.
Nobody asked what happens next.
The data nobody wants on the dashboard
The cracks had been showing up for months — if anyone was reading the research.
Faros.ai studied over 10,000 developers across 1,255 teams (published July 2025): individual devs completed 21% more tasks and merged 98% more pull requests (chunks of code submitted for review). Sounds like a win. But PR review time jumped 91%. Bugs increased 9%. And at the company level? "Any correlation between AI adoption and key performance metrics evaporates." Individual velocity went up. Team output went sideways. The agents didn't remove work; they moved it downstream to the review queue.
By now the supporting numbers are familiar — CodeRabbit's 1.7× more issues in AI-generated code (December 2025), Princeton's finding that agent reliability improves at half the rate of capability (March 2026). We've covered both on this channel. The Faros data explains why those numbers hit so hard at scale: the bottleneck didn't disappear. It migrated from production to review.
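One way to see why faster individual output need not raise team output is a toy queue. The sketch below is an illustration only, not a reproduction of the Faros data: the function name `simulate_review_queue` and every number in it are hypothetical, and it assumes a single review stage with a fixed weekly capacity.

```python
def simulate_review_queue(prs_opened_per_week, review_capacity_per_week, weeks):
    """Toy model: PRs enter a queue, reviewers clear a fixed number per week.

    Returns (merged_per_week, backlog_after_each_week).
    All inputs are hypothetical; this is not fitted to any real dataset.
    """
    backlog = 0
    merged, backlogs = [], []
    for _ in range(weeks):
        backlog += prs_opened_per_week                   # new PRs join the queue
        done = min(backlog, review_capacity_per_week)    # reviewers are the cap
        backlog -= done
        merged.append(done)
        backlogs.append(backlog)
    return merged, backlogs


# Hypothetical team: reviewers can handle 25 PRs a week.
before_merged, before_backlog = simulate_review_queue(20, 25, weeks=4)
after_merged, after_backlog = simulate_review_queue(40, 25, weeks=4)

print("before agents:", before_merged, before_backlog)
print("after agents: ", after_merged, after_backlog)
```

In this made-up scenario, doubling the PRs opened lifts weekly merges only from 20 to 25, while the unreviewed backlog grows by 15 every week.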
The structural trap
Here's why the ROI inverts, and it's not a bug anyone can patch.
Doing a task requires competence. Reviewing an autonomous system's output requires competence plus judgment plus the ability to catch errors the AI is confident about. Supervision is strictly harder than execution.
Writing on O'Reilly Radar on April 13, Addy Osmani named this "comprehension debt": the growing gap between how much code exists and how much any human actually understands. In his words: "A junior engineer can now generate code faster than a senior engineer can critically audit it." An Anthropic study of 52 engineers, published in February 2026, found AI-assisted devs scored 17 percentage points lower on comprehension tests for the code they'd just "written."
The human cost is already measurable. Harvard Business Review reported on March 5 that 14% of AI users experience "brain fry": mental fatigue from excessive AI oversight. Oversight ranked as the single most mentally taxing AI activity. Workers with high oversight loads made 39% more major errors and experienced 33% more decision fatigue. They were also likelier to want out: 34% reported an intent to leave, versus 25% of workers without brain fry.
Shashi Bellamkonda of Info-Tech Research Group called it "the oversight tax" on April 5. He cited a Microsoft engineer using an AI coding agent who reported he "could not step away from the screen" — it felt "like someone being dragged along by it." The engineer expected to hand work to a junior. He got an anxious babysitting shift where the consequences of looking away were unknowable.
The price tag nobody quoted you
Vendors bill by usage regardless of output quality. Agent supervision hours are invisible in project accounting: they show up as "senior engineer time" with no line item connecting them to the agent that created the work. The expertise bottleneck that limited your team before agents still limits it after agents, just at a different layer.
Gartner's June 2025 prediction that over 40% of agentic projects get canceled by 2027 is starting to look conservative. The OutSystems survey from April 13 found 94% of IT leaders already worry about agent sprawl, and just 12% have centralized platforms to manage it. Meanwhile, 52% rely on "human-on-the-loop supervision" — the polite corporate way of saying "a person watches the robot and prays."
What this means for you
Before deploying agents, calculate the supervision cost per agent-hour — not the agent-hour price. If your team lacks senior reviewers, agents amplify the expertise gap instead of closing it. The vendor's ROI calculator doesn't have a field for "how much does it cost when your best engineer spends all Tuesday verifying the agent didn't quietly break authentication."
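As a rough sketch of that calculation, assuming you can log reviewer hours against agent runtime, the arithmetic might look like this. The class, its field names, and all the dollar figures are hypothetical placeholders, not vendor pricing or measured data.

```python
from dataclasses import dataclass


@dataclass
class AgentPilot:
    """Hypothetical inputs for one pilot; plug in your own tracked numbers."""
    agent_hours: float      # hours of agent runtime billed by the vendor
    agent_rate: float       # vendor price per agent-hour (USD)
    review_hours: float     # senior time spent reviewing the agent's output
    reviewer_rate: float    # loaded hourly cost of a senior reviewer (USD)

    def supervision_ratio(self) -> float:
        # Reviewer hours consumed per hour of agent runtime.
        return self.review_hours / self.agent_hours

    def supervision_cost_per_agent_hour(self) -> float:
        # The cost that never appears on the vendor invoice.
        return self.supervision_ratio() * self.reviewer_rate

    def true_cost_per_agent_hour(self) -> float:
        # Invoice price plus the hidden review cost.
        return self.agent_rate + self.supervision_cost_per_agent_hour()


# Illustrative numbers only.
pilot = AgentPilot(agent_hours=40, agent_rate=5.0,
                   review_hours=20, reviewer_rate=120.0)

print(f"supervision ratio:          {pilot.supervision_ratio():.2f}")
print(f"supervision cost per hour:  ${pilot.supervision_cost_per_agent_hour():.2f}")
print(f"true cost per agent-hour:   ${pilot.true_cost_per_agent_hour():.2f}")
```

With these made-up inputs, a $5 agent-hour carries $60 of review time, so the real cost is $65 per agent-hour. The supervision ratio (0.5 here) is the number to ask your vendor about.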
Ask your vendor one question: what is the expected supervision ratio? If they stare at you blankly, you have your answer.
The agent market's first real segmentation won't be by model quality or price. It'll be by which platform actually reduces the supervision load. That metric doesn't exist yet — and until it does, every ROI projection you've seen is missing its biggest variable. Two weeks ago the pitch was "agents replace junior work." Today the question is who replaces the senior engineer's sanity.


