तीन AI Memory Systems, इनके काम करने का कोई सबूत नहीं

तुम महीनों से अपने AI coding tool का इस्तेमाल कर रहे हो। ये तुम्हारे variable names वैसे ही autocomplete करता है जैसे तुम्हें पसंद हैं। तुम्हारी team के test patterns याद रखता है। पिछले मंगलवार तुमने जो service rename की थी — वो भी याद है, दोबारा नहीं पूछता। तुमने कुछ configure नहीं किया — बस सीख गया।

मज़ा आ रहा है ना? जैसे कोई junior dev मिल गया जो सच में notes लेता हो। बस एक छोटी सी problem है: बढ़ते हुए evidence बता रहे हैं कि ये सारी जमा हुई memory तुम्हारे agent को code लिखने में और बेकार बना रही है। और जब तुम switch करोगे तो ये memory साथ नहीं जाएगी।

8 से 16 अप्रैल के बीच, Anthropic और OpenAI ने अपने coding agents के लिए बिल्कुल नए memory systems लॉन्च किए। Google का Memory Bank दिसंबर 2025 से चल रहा है। तीनों architectures पूरी तरह incompatible हैं — और कम से कम एक study में पता चला कि ये पूरा approach काम करने से ज़्यादा नुकसान करता है।

तीन Memory Architectures, तीन अलग-अलग दांव

Anthropic ने पहले शुरू किया। 8 अप्रैल को उन्होंने Managed Agents with Memory Stores लॉन्च किया — workspace-scoped text collections जिन्हें agent हर task से पहले पढ़ता है और finish होने पर update करता है। हर memory की limit 100KB, एक session में 8 stores तक attach कर सकते हो, और हर edit एक immutable version बनाता है। Pricing: standard API rates प्लस $0.08 per session-hour।

बस ये एक layer है। Claude Code असल में तीन memory mechanisms चलाता है: user-authored CLAUDE.md files (तुम्हारी instructions), auto-generated MEMORY.md files (agent के अपने notes), और वो server-side Memory Stores। तीन layers of context। तीन formats। Zero portability।

OpenAI एक हफ्ते बाद आया। 15-16 अप्रैल को Codex ने AGENTS.md files शिप किए project instructions के लिए, साथ में "Memories" feature जो "stable preferences, project conventions, और recurring work patterns" sessions के बीच carry करता है। उनका approach project root से current directory तक चलते हुए files को hierarchically merge करता है — हर run पर 32KB तक load होता है।

Google ने बिल्कुल अलग रास्ता चुना। Vertex AI Agent Engine में Memory Bank, दिसंबर 2025 से generally available और फरवरी 2026 से billing, markdown files को पूरा skip करता है। Gemini models background में तुम्हारी conversation history analyze करते हैं और structured memories extract करते हैं — key facts, preferences, relationships — automatic expiration और similarity search के साथ।

Markdown layers vs. hierarchical instruction chains vs. AI-extracted structured data। तीन vendors, हर कोई convinced कि उनका architecture सही है। Industry ने record time में perfect incompatibility achieve कर ली।

Memory Tax

यहाँ sales pitch reality से टकराती है। एक मार्च 2026 preprint में, ETH Zurich के researchers ने test किया कि context files coding agent performance को कैसे affect करती हैं। 8 में से 5 test configurations में, agents ने accumulated context के साथ बिना context के मुकाबले बदतर perform किया — जबकि inference costs 20% या उससे ज़्यादा बढ़ गई।

इसे ज़रा digest करो अपने "personalized AI assistant" के smug glow में बैठकर। जो memory feature vendors अपनी killer advantage के तौर पर बेच रहे हैं, उसने majority test scenarios में output quality को actively degrade किया। Agent अपने notes पढ़ता है, stale या contradictory context में उलझ जाता है, और बदतर code produce करता है — जबकि तुम extra tokens का bill भर रहे हो इस privilege के लिए।

किसी भी senior engineer को ये surprise नहीं करना चाहिए जिसने कभी system prompt को 50KB तक फूलते देखा हो। ज़्यादा context मतलब ज़्यादा juggling। कुछ outdated है। कुछ बाकी हिस्सों से contradict करता है। कुछ तीन refactors पहले relevant था। तुम्हारा agent मन लगाकर अपने दो महीने पुराने notes पढ़ता है उस monolith के बारे में जिसे तुमने तीन microservices में तोड़ दिया है, फिर confidently ऐसे architecture के लिए code generate करता है जो अब exist ही नहीं करता। बहुत helpful।

और फिर भी — हर session और ज़्यादा जोड़ता जाता है। हर bug जो तुम explain करते हो, हर architecture decision जिस पर debate होती है, हर shortcut जो तुम बताते हो — सब absorb होता जाता है। MindStudio की 9 अप्रैल की analysis ने इसे "behavioral lock-in" नाम दिया: "जब तुम अपनी conversation history export करते हो, तुम्हें text मिलता है। जो नहीं मिलता वो है model की internal representations, embeddings, और weights जो encode करते हैं कि agent ने actually क्या सीखा।"

तुम पैसे दे रहे हो एक memory archive जमा करने के लिए जो शायद तुम्हारे agent का output खराब कर रहा है — लेकिन तुम छोड़ भी नहीं सकते क्योंकि fresh start का मतलब है जो कुछ काम कर रहा है वो भी गँवाना। वाह, क्या business model है।

आरामदायक पिंजरा

जैसा Kai Waehner ने 6 अप्रैल को लिखा, "अगर तुम्हारे agentic workflows किसी vendor के proprietary orchestration layer पर बने हैं, तो switching costs तेज़ी से compound होती हैं।" जब models commoditize हो जाएँ — जब GPT-5 और Claude 4 और Gemini 2.5 benchmarks पर एक-दूसरे के 5% के अंदर perform करें — तो जो agent तुम्हें सबसे अच्छे से जानता है, उसे तुम pay करते रहोगे। इसलिए नहीं कि वो better है। इसलिए कि छोड़ना बहुत दर्द देता है।

और ये regulatory void जो MindStudio flag करता है: GDPR और CCPA structured personal data cover करते हैं — तुम्हारा name, email, purchase history। कोई भी उन implicit patterns को regulate नहीं करता जो तुम्हारा AI agent तुम्हारे coding style, architecture preferences, या deployment quirks के बारे में बनाता है। तुम अपना data request कर सकते हो। तुम अपने agent की तुम्हारे बारे में understanding request नहीं कर सकते। वो learned behavior — जो actually switching costs create करता है — एक legal no-man's-land में बैठा है जहाँ न कोई export button है और न कोई law जो एक बनाने की demand करे।

किसी vendor का incentive नहीं है portable memory interchange format बनाने का। तुम्हारा accumulated context — वो context भी जो चीज़ें worse बना रहा है — उनकी moat है।

अब क्या करना चाहिए

Audit करो कि तुम्हारे current agent ने actually क्या सीखा है। अगर Claude Code use करते हो, तो अपनी CLAUDE.md और MEMORY.md files खोलो — ये plain markdown हैं तुम्हारी project directory में। Critically पढ़ो। कितना अभी भी तुम्हारे actual codebase को reflect करता है? कितना उस service को describe करता है जिसे तुमने दो sprints पहले decompose कर दिया? अगर Codex use करते हो, तो अपनी AGENTS.md chain को root से leaf तक walk करो। अगर Vertex use करते हो, तो console से Memory Bank entries review करो।

फिर कुछ counterintuitive करो: एक session के लिए memory disable करो और output compare करो। अगर तुम्हारा agent बिना accumulated notes के same या better perform करता है, तो तुम memory tax भर रहे थे बस lock-in के privilege के लिए।

Model wars तो appetizer थे। Memory layer main course है — और uncomfortable truth ये है कि तुम पैसे दे रहे हो ऐसा context जमा करने के लिए जो तुम्हारे agent का काम degrade करता है, ऐसे format में stored है जो सिर्फ तुम्हारा current vendor पढ़ सकता है, किसी regulation से protected नहीं है, और portable है exactly कहीं नहीं। जो agent तुम्हें याद रखता है वो ज़रूरी नहीं कि तुम्हें सबसे अच्छे से serve करे। वो बस वो है जिसे तुम छोड़ नहीं सकते।

तीन AI Memory Systems, इनके काम करने का कोई सबूत नहीं

तीन Memory Architectures, तीन अलग-अलग दांव

Memory Tax

आरामदायक पिंजरा

अब क्या करना चाहिए

Keep reading

तुम्हारे Agent का Permission Dialog एक Placebo है

Agent Paradox: कम Autonomy, ज़्यादा Value

हर Agent Platform usage-based billing करता है। किसी ने Kill Switch नहीं दिया।

15 April ko Claude down hua. Tumhara poora dev stack uske saath gaya.