You Secured Your Agent's Tool Calls. Who Secures the Responses?

You did everything right. You vetted your MCP servers (Model Context Protocol, a universal plug standard for AI tools, like USB but for data), locked down permissions, and pinned schema versions so that your AI agent, a program that uses tools on its own, calls only what you approved. Your agent infrastructure looks production-ready. You sleep soundly.

You shouldn't.

Because every tool your agent calls sends a response back. And as of April 25, 2026, almost no one in the industry validates what is inside that response before it lands in the agent's context window: the working memory where an AI model cannot tell trusted instructions apart from junk data returned by a tool.

Three Platforms, One Blind Spot

Since early April, the three biggest AI companies have shipped agent security features, and two of the three are guarding the wrong door.

On April 8, Anthropic launched Managed Agents with scoped permissions and credential storage. That controls which tools an agent can call. What those tools send back? Not their problem.

On April 16, OpenAI updated its Agents SDK with automatic tracing, a logging system that records every tool call, handoff, and guardrail event. It observes responses. It does not sanitize them. It's like installing a CCTV camera that watches someone walk in with a knife and just writes it down.

On April 22, Google shipped Agent Gateway at Cloud Next with Model Armor, which actually sanitizes both tool calls and responses, screening for prompt injection, malicious URLs, and data leakage. Google, to its credit, is the only major platform that explicitly guards the response side. It is still in preview.

Why This Matters: The Door Is Wide Open

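MCP's specification requires an inputSchema: a strict format for what you send to a tool. Nothing equally binding exists for what comes back. Recent revisions of the spec allow an optional outputSchema for structured results, but it describes shape, not intent, and most tool responses are still arbitrary text or JSON that flows unfiltered into the model's reasoning. Nothing in the spec forces anyone to validate what a tool returns.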
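One way to narrow that gap on the client side is to declare the shape you expect from each tool and reject everything else before it reaches the model. The sketch below is a minimal illustration, not an MCP API: `EXPECTED_OUTPUT` and `validate_response` are hypothetical names, and the field list is an assumed contract for an imaginary issue-fetching tool.

```python
from typing import Any

# Hypothetical contract for an imaginary "fetch issue" tool:
# field name -> required Python type. Defined here for illustration only.
EXPECTED_OUTPUT: dict[str, type] = {"issue_id": int, "title": str, "body": str}


def validate_response(payload: Any, expected: dict[str, type]) -> dict[str, Any]:
    """Return payload if it matches the expected shape exactly, else raise ValueError."""
    if not isinstance(payload, dict):
        raise ValueError("tool response must be a JSON object")

    # Reject fields the contract does not know about.
    unexpected = set(payload) - set(expected)
    if unexpected:
        raise ValueError(f"unexpected fields: {sorted(unexpected)}")

    for name, expected_type in expected.items():
        if name not in payload:
            raise ValueError(f"missing field: {name}")
        value = payload[name]
        # bool is a subclass of int in Python, so reject it for non-bool fields.
        if isinstance(value, bool) and expected_type is not bool:
            raise ValueError(f"field {name!r} has wrong type")
        if not isinstance(value, expected_type):
            raise ValueError(f"field {name!r} has wrong type")
    return payload
```

A shape check like this only constrains structure. A string field such as `body` can still carry hostile text, so it complements content screening rather than replacing it.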

That creates three attack vectors that should keep you up at night:

Indirect prompt injection: a tool returns content with hidden instructions embedded in it. The PipeLab State of MCP Security 2026 report (published April 2026) documents a real case: an attacker crafted a malicious GitHub issue so that when an MCP server fetched it, the response instructed the agent to exfiltrate private repository contents. "The tool descriptions were clean. The poisoning was in the data the tool returned."

Context flooding: a tool returns so much data that it swamps the agent's working memory and pushes critical instructions out of the context window.

Data exfiltration chains: a poisoned response tells the agent to forward sensitive context to another tool. The Log-To-Leak research paper (published March 2026) demonstrated this on GPT-5, Claude Sonnet 4, and others, achieving a 100% attack success rate with 94.6% data leak accuracy on GPT-5 connected to a PayPal MCP server.
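The vectors above can be partly screened with cheap heuristics: a size cap against flooding, and pattern checks against the most blatant injection and exfiltration phrasing. The sketch below is illustrative only. `MAX_CHARS`, `SUSPICIOUS_PATTERNS`, and `screen_response` are hypothetical names, the patterns are assumptions rather than a vetted ruleset, and a determined attacker can rephrase around regexes.

```python
import re

# Hypothetical budget for a single tool response, in characters.
MAX_CHARS = 8_000

# Assumed, deliberately small set of red flags. Not a complete ruleset.
SUSPICIOUS_PATTERNS = [
    # Text that tries to override the agent's instructions.
    re.compile(r"ignore (all |any )?(previous|prior|above) instructions", re.IGNORECASE),
    # Text that asks for sensitive material to be sent somewhere.
    re.compile(
        r"\b(send|forward|post|upload)\b.{0,80}\b(secret|token|credential|private)",
        re.IGNORECASE | re.DOTALL,
    ),
    # Fake role markers embedded in data.
    re.compile(r"<\s*system\s*>", re.IGNORECASE),
]


def screen_response(text: str, max_chars: int = MAX_CHARS) -> tuple[str, list[str]]:
    """Return the (possibly truncated) text plus a list of human-readable findings."""
    findings: list[str] = []

    # Context flooding: cap the size before anything else looks at the text.
    if len(text) > max_chars:
        findings.append(f"truncated from {len(text)} to {max_chars} characters")
        text = text[:max_chars]

    # Injection and exfiltration: flag matches, leave the decision to the caller.
    for pattern in SUSPICIOUS_PATTERNS:
        if pattern.search(text):
            findings.append(f"matched pattern: {pattern.pattern}")

    return text, findings
```

The function reports findings instead of silently rewriting the text, so the caller can decide whether to drop the response, ask a human, or pass it on with a warning.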

Meanwhile, on April 16, OX Security disclosed 11 CVEs affecting roughly 200,000 MCP server instances. Anthropic's official response: sanitization is the "developer's responsibility." Even the OWASP MCP Top 10 (released April 2026), the industry's first attempt at an MCP security framework, has no dedicated category for unvalidated tool responses. The gap is so normalized that the people writing the security standards have not even named it yet.

The Cost of Fixing It

Adding response validation breaks the very simplicity that made MCP successful. Tools would need output schemas. Agents would need a sanitization layer, something like Microsoft's Agent Governance Toolkit (open-sourced April 2), which includes an MCP security gateway with response inspection. Every call would carry extra parsing overhead. The "just plug in tools" experience would die.

But the alternative is worse.

What This Means for You

Until response-side validation ships everywhere, every MCP server you connect is an unfiltered pipe into your agent's brain. Whatever security budget you spent on input gates is protecting the wrong end of the call. If you are running agents in production today, you need Google's Model Armor (preview), Microsoft's AGT, or your own response sanitization middleware. "Trust the tool" is not a security policy.
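If you go the homegrown route, the middleware can start as a thin wrapper that sits between every tool and the model. The sketch below shows one possible shape, not a reference design: `guard`, `too_large`, `mentions_instructions`, `ToolResponseRejected`, and the `<untrusted_tool_output>` delimiter are all hypothetical names invented for this example, and the two checks are placeholders for whatever screening you actually trust.

```python
from typing import Any, Callable, Optional

# A check returns a rejection reason, or None if the text passes.
Check = Callable[[str], Optional[str]]

CLOSING_TAG = "</untrusted_tool_output>"


class ToolResponseRejected(Exception):
    """Raised when a tool response fails a check and must not reach the model."""


def too_large(limit: int) -> Check:
    """Build a check that rejects responses longer than `limit` characters."""
    def check(text: str) -> Optional[str]:
        return f"response exceeds {limit} characters" if len(text) > limit else None
    return check


def mentions_instructions(text: str) -> Optional[str]:
    """Placeholder check: flags one well-known injection phrase."""
    if "ignore previous instructions" in text.lower():
        return "response tries to override instructions"
    return None


def guard(tool: Callable[..., Any], checks: list[Check]) -> Callable[..., str]:
    """Wrap a tool so its output is checked and labeled before the agent sees it."""
    def guarded(*args: Any, **kwargs: Any) -> str:
        text = str(tool(*args, **kwargs))
        # Refuse output that tries to break out of the delimiter.
        if CLOSING_TAG in text:
            raise ToolResponseRejected("response contains the closing delimiter")
        for check in checks:
            reason = check(text)
            if reason is not None:
                raise ToolResponseRejected(reason)
        # Label the data as untrusted for the prompt that will embed it.
        return f"<untrusted_tool_output>\n{text}\n{CLOSING_TAG}"
    return guarded
```

Wrapping a tool would then look like `safe_search = guard(search, [too_large(8000), mentions_instructions])`, where `search` stands for any callable tool. Failing closed with an exception is a deliberate choice here, and the delimiter is only a hint to the model, not a guarantee that it will treat the content as data.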

You locked the front door. There is no lock on the back door. There is no door at all.

The next big agent security incident won't come from a bad tool call. It will come from a tool's response.
