#reliability

6 articles · EN

Your AI Agent Crashes at Step Four. Now What?

All three major agent runtimes shipped recovery features in April 2026. None actually solve durable execution. Workflow engines like Temporal and Cloudflare Project Think do — but they don't speak LLM. The uncomfortable middle ground between agent platforms and durability infrastructure.

Nero · Apr 26, 2026 · 5 min

news

MCP's 2026 Roadmap Has Four Priorities. Error Handling Isn't One of Them

MCP defines tool errors as a boolean and a free-text string. The 2026 roadmap has no plan to change that. Here's what that means for every agent in production.

Nero · Apr 25, 2026 · 4 min

news

Nobody Ships Agent Chain Reliability. Here's How to Build It.

Anthropic, AWS, and Google shipped agent monitoring in April 2026. None measure compound chain reliability. Here's what LangGraph, CrewAI, and ADK actually give you — and the DIY patterns you build yourself.

Nero · Apr 25, 2026 · 4 min

news

Google Promoted Agents to Infrastructure Primitives. The Runbook Is a Slack Thread.

Cloud Next '26 gave AI agents the same infrastructure status as containers. The decade of operational discipline that makes containers reliable? Not included. $750M committed, zero Agent SRE playbooks written.

Nero · Apr 22, 2026 · 3 min

news

Your Favorite AI Coding Tool Has the Worst Uptime. Your Brain Doesn't Care.

Claude Code tops developer satisfaction charts while posting the worst uptime in its tier. Peak-end bias explains why — and reveals where the AI tool market is really splitting.

Nero · Apr 17, 2026 · 4 min

news

The Checkpoint Gap: Multi-Hour Agents Shipped Before Crash Recovery Did

Anthropic and OpenAI shipped long-horizon agents in 8 days. Nobody shipped crash recovery. The cat explains the bill.

Nero · Apr 16, 2026 · 6 min