Checkpoint Gap: Multi-Hour Agents आ गए, Crash Recovery अब भी नहीं

Tuesday रात को तुमने छह घंटे का एक agent चालू किया। उसे competitor का pricing page scrape करना था, चालीस पुराने Linear tickets triage करने थे, और तुम्हारे सोते वक्त एक Postgres migration dry-run चलाना था। Dashboard बोल रहा है "autonomous"। Marketing बोल रही है "long-horizon"। तुम्हारा credit card बोल रहा है "ठीक है, जो भी हो"। तुमने laptop बंद किया।

सुबह उठे तो — आधा-अधूरा task, तुम्हारे नाम से तीन duplicate Linear tickets, और एक Slack channel जहाँ teammate पूछ रहा है कि तुम 3 बजे रात को जगे हुए क्यों थे। Agent चौथे घंटे पर crash हो गया था। कोई नहीं बता सकता — न तुम, न vendor — कि "resume" दबाओगे तो नुकसान दोगुना होगा या ठीक हो जाएगा।

Welcome to April 2026 — वो महीना जब multi-hour agents reliability guarantee बनने से पहले pricing metric बन गए 😹।

आठ दिन, तीन persistence models, zero standards

8 से 15 April के बीच, दो सबसे बड़े agent vendors ने तीन अलग-अलग तरीके ship किए जिनसे AI agent एक घंटे के बाद भी ज़िंदा रहे — और किसी को इस बात पर सहमति नहीं है कि "ज़िंदा" का मतलब क्या है।

14 April को Anthropic ने Claude Code Routines launch किया — scheduled या webhook-triggered agent runs, research preview, daily caps के पीछे बंद (Pro पर 5/day, Max पर 15/day, Team और Enterprise पर 25/day)। Minimum schedule interval: एक घंटा। The Register ने इसे शालीनता से "mildly clever cron jobs" कहा 😼।

15 April को OpenAI ने Agents SDK v0.14.0 ship किया — नया SandboxAgent surface, pluggable sandbox backend (Docker, E2B, Modal, Vercel, Cloudflare — जो पसंद हो चुन लो), और एक चीज़ जिसका नाम है MEMORY.md — एक literal markdown file जो agent runs के बीच खुद को लिखता है।

और 8 April को Anthropic पहले ही Managed Agents launch कर चुका था, जो usage को session-hours में meter करता है — एक billing unit जो खुलेआम मानती है कि तुम्हारा agent घंटों चलेगा।

तीन persistence models। Zero interop। Welcome to long-horizon autonomy।

हर vendor असल में क्या save कर रहा है

एक छोटा detour — क्योंकि "agent याद रखता है" सुनने में simple लगता है, पर है नहीं।

एक agent एक loop है: LLM (large language model — ChatGPT या Claude के पीछे का दिमाग) task पढ़ता है, एक tool call करता है (web search, shell command, API call), result पढ़ता है, तय करता है आगे क्या करना है। Long-horizon agent वही loop है, घंटों तक चलता हुआ। Checkpoint loop की state का saved snapshot है, ताकि अगर process crash हो जाए, तो scratch से शुरू करने की जगह snapshot से resume कर सको।

अब देखो हर vendor असल में क्या save करता है:

Anthropic Routines — session के अंदर conversation और plan save करता है। Docs कहते हैं, "each matching GitHub event starts a new session" — यानी sessions triggers के बीच state share भी नहीं करते। और: "events beyond the limit are dropped until the window resets" — मतलब webhook spike पर काम चुपचाप lost, कोई queue नहीं, कोई retry नहीं 🙀।
OpenAI Sandbox Agents — sandbox के filesystem में MEMORY.md file save करता है। OpenAI के अपने docs कहते हैं कि यह "distills lessons into readable files rather than preserving full workspace state." आसान भाषा में: उसे याद रहता है कि उसने क्या सीखा, यह नहीं कि उसने क्या किया। बीच-git push में killed हो गया? Plan बच जाएगा। आधा-पुश किया commit नहीं।
Anthropic Managed Agents — session-hour से bill करता है। एक session-hour असल में क्या checkpoint करता है, वो undocumented है।

कोई भी — कोई भी नहीं — document करता है कि run crash होने पर side-effects का क्या होता है। Side-effect वो है जो agent ने अपनी memory के बाहर touch किया: एक API call fire हो गई, एक Linear ticket बन गया, तुम्हारे database में एक row डल गई, एक Slack message चला गया, एक git commit push हो गया। वो rewind नहीं होते।

वो "aha" जो किसी ने landing page पर नहीं लिखा

अब असली बात खुलकर: जब multi-hour agent crash होकर resume होता है, तो checkpoint agent की intent restore करता है, उस दुनिया की state नहीं जिस पर agent act कर रहा था।

तुम्हारे agent ने घंटे तीन पर एक Linear ticket file किया। घंटे चार पर crash हुआ। Hour 3.5 का checkpoint नहीं जानता कि ticket मौजूद है। Resume: वो फिर से ticket file करता है। बधाई हो, तुम्हारे पास duplicates हैं — और Anthropic के docs के अनुसार, "Linear tickets… use your linked accounts," तो duplicates तुम्हारे ही नाम से हैं। तुम्हारे teammates सोच रहे हैं कि तुम उन्हें spam कर रहे हो 😾।

यह bug नहीं है। यह architecture है। The New Stack के OpenAI release analysis में लिखा है कि harness "can keep auth, billing, audit logs, human review, and recovery state outside any one container" — जो सच है, और साथ ही यह कहने का शालीन तरीका है कि SDK की राय अपनी state पर है, तुम्हारी पर बिल्कुल नहीं।

Google के Vertex Agent Engine की बात करें तो December 2025 में Sessions और Memory Bank GA हो चुके थे; April 2026 ने सिर्फ़ Agent Designer preview add किया। तो कोई नहीं — न Anthropic, न OpenAI, न Google — तुम्हारे लिए side-effect idempotency solve नहीं करता।

वो कीमत जो किसी ने pricing page पर नहीं लिखी

Idempotency — यानी यह property कि किसी काम को दो बार करने का वही effect हो जो एक बार करने का होता है — अब पूरी तरह तुम्हारी problem है। तुम्हारा agent बाहर की दुनिया में जो भी tool call करता है, उसे idempotency key चाहिए (हर operation के लिए एक unique ID, ताकि receiving service retries को deduplicate कर सके)। हर external action को एक journaled outbox चाहिए (एक log जो तुम action से पहले लिखते हो, ताकि पता रहे तुमने क्या attempt किया था, भले ही confirm होने से पहले crash हो जाए)।

Re-runs की कीमत दोगुनी: दोगुने tokens (LLM जो word-chunks process करता है, per million बिल होते हैं), दोगुने session-hours, दोगुना wall-clock जो तुमने wait किया। और चूँकि कोई vendor portable checkpoint format offer नहीं करता, तुम mid-task Anthropic से OpenAI पर fail over नहीं कर सकते। तुम अपने bug reports की shape से lock-in हो।

Hacker News की Routines thread ने इसे सीधे रखा: "I'm not going to build my business on things I can't replicate myself." एक और commenter ने कहा multi-hour routine debug करना "maddening" होगा। दोनों बातों पर सही 🐈‍⬛।

अगर तुम यह production में ship कर रहे हो

अगर April 2026 में तुम एक घंटे से ज़्यादा देर तक agents चला रहे हो, तो platform का checkpoint तुम्हारी recovery story नहीं है। वो एक receipt है। तुम्हें तीन चीज़ें चाहिए जो vendors ने नहीं बनाईं:

एक journaled outbox — हर external side-effect execute होने से पहले log में लिखे, ताकि replay को पता रहे agent ने क्या attempt किया।
हर tool call पर idempotency keys — GitHub, Linear, Slack, तुम्हारी अपनी APIs। कोई exceptions नहीं।
एक manual resume UI — ताकि crash के बाद retry करना है, skip करना है, या abort करना है — यह इंसान तय करे। Agent नहीं। Vendor नहीं।

इस महीने असल में क्या बदला

"Agents घंटों चलते हैं" April 2026 में एक pricing unit बन गया। नीचे का plumbing अभी भी पंद्रह-मिनट scale का है। अगले quarter में कहीं, कोई enterprise पहला public post-mortem लिखेगा एक managed agent के बारे में जिसे कोई rewind नहीं कर सका — और दिलचस्प सवाल यह नहीं होगा कि कौन सा vendor fail हुआ, बल्कि यह कि किसने सोचा था कि checkpoint ही guarantee है 😸।

बिल्ले की सलाह: अपना खुद का outbox चलाओ। किसी भी vendor के "resume" button पर भरोसा मत करो। और अगर कोई sales deck "autonomous" बोले, तो कह दो कागज़ पर उस शब्द की definition लिखकर दो।

Checkpoint Gap: Multi-Hour Agents आ गए, Crash Recovery अब भी नहीं

आठ दिन, तीन persistence models, zero standards

हर vendor असल में क्या save कर रहा है

वो "aha" जो किसी ने landing page पर नहीं लिखा

वो कीमत जो किसी ने pricing page पर नहीं लिखी

अगर तुम यह production में ship कर रहे हो

इस महीने असल में क्या बदला

Keep reading

Browser-Agent Oligopoly जिसके लिए किसी ने वोट नहीं दिया

Tool-Calling खत्म। अब Agents Code लिखते हैं।

हर Agent SDK Runtime Ship करता है। Tests कोई नहीं।

दो Leaks, एक Company, और $852 Billion का IOU