How to Test MCP Servers When the Protocol Won't Help

You run CI on the backend. You lint the frontend. Your Docker containers have healthchecks. Everything in your stack has a testing story, except the MCP connections your agent depends on for every single call.

On April 19 the MCP team published its 2026 roadmap. Four priorities: authorization, registry, rich UX primitives, and agentic capabilities. Testing, health checks, contract validation: not on the list. Not mentioned. Not planned.

So you are on your own. Here is how to test MCP servers with the tools that actually exist today.

What you have

The MCP ecosystem has about 17,000 registered servers. Community audits suggest that only about half of them respond reliably at any given time. Your agent connects to three servers? Statistically, one of them is flaky right now.

Testomat.io published the most comprehensive survey of MCP testing tools on April 8. Their conclusion is blunt: nothing tests by speaking MCP natively. Everything is duct-taped onto generic HTTP frameworks. No test runner understands the MCP transport. No assertion library knows what a valid tool response looks like. For every server dependency, you are building the whole testing stack from scratch.

Here is the full inventory of what exists, and how to make it work.

MCP Inspector: the manual starting point

MCP Inspector is the official debugging tool; think Postman for MCP. Connect to a server, call tools manually, inspect the responses.

What you get:

Interactive tool discovery and invocation
Raw JSON response inspection
Connection diagnostics for both the stdio and HTTP+SSE transports

What you don't get:

CI integration
Regression detection
Automated test suites
Response validation against any schema

It's a screwdriver. Useful for poking around during development, useless for stopping regressions in production. You need a test harness.

Building wrapper tests (the duct-tape approach)

Most teams testing MCP today write wrapper tests: plain pytest or Jest suites that call tools directly through the MCP client SDK and assert on whatever comes back.

# pytest example: testing an MCP server tool
# Needs the pytest-asyncio plugin so pytest can run async test functions.
import json

import pytest
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


@pytest.mark.asyncio
async def test_search_tool_returns_results():
    # Launch the server under test as a subprocess over stdio
    server = StdioServerParameters(
        command="npx",
        args=["-y", "@example/mcp-search-server"]
    )
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            result = await session.call_tool(
                "search",
                arguments={"query": "test query", "limit": 5}
            )

            # The MCP envelope: at least one text content block
            assert result.content is not None
            assert len(result.content) > 0
            assert result.content[0].type == "text"

            # The payload inside the envelope: JSON with a bounded result list
            data = json.loads(result.content[0].text)
            assert "results" in data
            assert len(data["results"]) <= 5

This works until the upstream server changes its response format. That happens silently, with no versioning and no changelogs: the MCP spec has no semver convention, no lockfile equivalent, and no mechanism for announcing breaking changes. Your assertion checks data["results"]; the server renames it to data["items"] at 2 a.m. on some Tuesday. Best case: your test goes red. Worst case: the field still exists but its inner structure has changed, your test stays green, your agent hallucinates over malformed data, and you pay for every hallucinated token.
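
One way to narrow that worst case is to assert on the inner structure as well as the top-level key. The sketch below is a minimal illustration, not a complete validator: the `check_search_payload` helper and the `title` and `url` field names are hypothetical, so substitute whatever your server actually returns.

```python
import json


def check_search_payload(raw_text, limit):
    """Return a list of problems found in a search tool's JSON payload.

    Assumed shape (hypothetical): {"results": [{"title": str, "url": str}, ...]}
    An empty list means the payload looks as expected.
    """
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError as exc:
        return [f"payload is not valid JSON: {exc.msg}"]

    if not isinstance(data, dict):
        return ["payload is not a JSON object"]

    results = data.get("results")
    if not isinstance(results, list):
        return ["'results' is missing or not a list"]

    problems = []
    if len(results) > limit:
        problems.append(f"expected at most {limit} results, got {len(results)}")

    # Check every item, not just the container, so inner drift turns the test red.
    for index, item in enumerate(results):
        if not isinstance(item, dict):
            problems.append(f"results[{index}] is not an object")
            continue
        for field in ("title", "url"):
            if not isinstance(item.get(field), str):
                problems.append(f"results[{index}].{field} is missing or not a string")
    return problems
```

In a wrapper test this would replace the last three assertions with something like `assert check_search_payload(result.content[0].text, limit=5) == []`, so the failure message lists every field that drifted.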

Contract testing without contracts

The fundamental gap: MCP servers typically don't publish response schemas. A tool's description says what the tool should do, in natural language. There is no machine-readable contract to validate against.

The workaround: build your own.

# Step 1: record real responses over time
import json

from genson import SchemaBuilder

builder = SchemaBuilder()
for response in recorded_responses:  # JSON strings collected from staging/dev
    builder.add_object(json.loads(response))

inferred_schema = builder.to_schema()
# Save this in your repo as the "contract"

# Step 2: validate in CI
import pytest
from jsonschema import validate, ValidationError

def test_tool_response_matches_contract():
    # call_mcp_tool is your own helper; it should return the parsed JSON payload
    response = call_mcp_tool("search", {"query": "test"})
    try:
        validate(instance=response, schema=inferred_schema)
    except ValidationError as e:
        pytest.fail(f"Contract violation: {e.message}")

The process: record real responses from the server for a week. Infer a JSON Schema from those responses with a schema generator. Commit that schema to your repo. In CI, validate future responses against it.

This is reverse-engineered contract testing. Not elegant. But it catches silent upstream changes that would otherwise reach production undetected. When the schema breaks, your pipeline breaks: loudly, in CI, not quietly in your agent's output.

Health monitoring: build it or trust to luck

Your orchestrator pings Docker containers. Your load balancer checks /health. MCP servers offer no health endpoint; the spec defines nothing. A server is either responding or it isn't, and you find out when your agent's tool call hangs.

Build your own health check:

import asyncio
from datetime import datetime, timezone

from mcp import ClientSession
from mcp.client.stdio import stdio_client

async def check_mcp_health(server_params, timeout=10):
    try:
        # asyncio.timeout needs Python 3.11+
        async with asyncio.timeout(timeout):
            async with stdio_client(server_params) as (read, write):
                async with ClientSession(read, write) as session:
                    await session.initialize()
                    # Listing tools proves more than connectivity
                    tools = await session.list_tools()
                    return {
                        "status": "healthy",
                        "tools_available": len(tools.tools),
                        "checked_at": datetime.now(timezone.utc).isoformat()
                    }
    except Exception as e:  # also covers the TimeoutError from asyncio.timeout
        return {
            "status": "unhealthy",
            "error": f"{type(e).__name__}: {e}",
            "checked_at": datetime.now(timezone.utc).isoformat()
        }

Run this on cron. Alert on consecutive failures. Check the tool list, not just connectivity: servers add and remove tools without notice, and an agent that expects search_v2 after the server has quietly dropped it produces a failure that looks like an agent bug but isn't.
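
Alerting on consecutive failures needs state that survives between cron runs, because each run is a fresh process. Here is a minimal sketch that keeps the streak count in a JSON file. The `record_health_result` name, the state file, and the threshold of 3 are assumptions for illustration, and the function expects the dict shape returned by the health check above.

```python
import json
from pathlib import Path


def record_health_result(state_path, server_name, result, threshold=3):
    """Update the failure streak for one server; return True if an alert should fire.

    State lives in a JSON file because each cron run starts with empty memory.
    """
    path = Path(state_path)
    streaks = json.loads(path.read_text()) if path.exists() else {}

    if result.get("status") == "healthy":
        streaks[server_name] = 0
    else:
        streaks[server_name] = streaks.get(server_name, 0) + 1

    path.write_text(json.dumps(streaks))
    # Fire once, on the run where the streak first reaches the threshold,
    # so a long outage does not page you on every subsequent run.
    return streaks[server_name] == threshold
```

Firing only when the streak first hits the threshold is a design choice: one flaky run stays quiet, and a sustained outage produces a single alert rather than one per cron tick.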

Failure injection: the part everyone skips

Your agent calls a tool. The tool times out. What happens next?

If you haven't tested this, the answer is: the model will do whatever it feels like. Maybe it retries endlessly. Maybe it hallucinates the expected response. Maybe it apologizes to the user and does nothing. You won't know until production shows you, and production charges per token for the lesson.

Wrap the MCP client to simulate failures:

import random

class ChaosProxy:
    """Wraps a real MCP session to inject failures during testing."""
    def __init__(self, real_session, failure_rate=0.1, corruption_rate=0.05):
        self.session = real_session
        self.failure_rate = failure_rate
        self.corruption_rate = corruption_rate

    async def call_tool(self, name, arguments):
        # Simulate a timeout
        if random.random() < self.failure_rate:
            raise TimeoutError(f"Simulated MCP timeout on {name}")

        result = await self.session.call_tool(name, arguments)

        # Simulate a corrupted response
        if random.random() < self.corruption_rate:
            return self._corrupt_response(result)

        return result

    def _corrupt_response(self, result):
        # Valid MCP envelope, but garbage content inside
        # Tests whether the agent handles malformed data gracefully
        ...

Run your agent through this proxy at a 10% failure rate. Watch how it handles timeouts, garbage data, and missing tools. Fix what breaks. Raise the rate. Repeat until the agent degrades gracefully instead of hallucinating confidently.
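
What "degrades gracefully" means depends on your agent, but one common fix for the endless-retry case is a bounded retry that returns a structured failure instead of raising. A minimal sketch, assuming only that the session object exposes an async `call_tool(name, arguments)` like the proxy above; the `call_tool_safely` name and the result dict shape are hypothetical.

```python
import asyncio


async def call_tool_safely(session, name, arguments, retries=2, timeout=5.0):
    """Call a tool with a bounded number of attempts; never loops forever.

    Returns {"ok": True, "result": ...} on success, or
    {"ok": False, "error": ...} so the caller can tell the model plainly
    that the tool failed instead of letting it guess.
    """
    last_error = "no attempts made"
    for attempt in range(1, retries + 2):  # first try plus `retries` retries
        try:
            result = await asyncio.wait_for(
                session.call_tool(name, arguments), timeout=timeout
            )
            return {"ok": True, "result": result, "attempts": attempt}
        except (TimeoutError, asyncio.TimeoutError, ConnectionError) as exc:
            # Only transient-looking errors are retried; anything else propagates.
            last_error = f"{type(exc).__name__}: {exc}"
    return {"ok": False, "error": last_error, "attempts": retries + 1}
```

Passing a `ChaosProxy` as `session` lets you check both branches: with a moderate failure rate most calls should succeed within the retry budget, and at a rate of 1.0 every call should come back with `ok` set to `False` after exactly `retries + 1` attempts.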

The complete testing stack

What a tested MCP deployment looks like today, everything hand-built and nothing standardized:

Layer	Tool	What it catches
Manual exploration	MCP Inspector	"Does this tool exist and respond?"
Unit tests	pytest/Jest wrappers	Response shape, basic behavior
Contract tests	Inferred JSON Schema	Silent upstream format changes
Health monitoring	Custom cron + alerting	Server outages, tool list drift
Failure injection	Chaos proxy wrapper	Agent behavior under degraded conditions
Integration tests	End-to-end agent runs	Full pipeline regressions

How much standardized tooling does the MCP spec provide for any of these? Zero. Every layer you build, you also maintain, debug, and rebuild when transport changes break your test infrastructure.

The gotchas that will bite

State pollution. MCP tools can have side effects: writing data, deleting records, charging money. The spec defines no mock mode. You can build a fake server for testing, run against production (dangerous), or maintain a separate staging environment for every MCP dependency (expensive). Most teams test against production and hope. Hope is not a testing strategy.
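
The cheapest form of the fake-server option is an in-memory stand-in for the session object, which works when your agent code only needs an async `call_tool`. A minimal sketch under that assumption; `FakeToolSession`, the `charge_card` tool, and the canned response shape are hypothetical, and a real session returns SDK result objects rather than plain dicts.

```python
import asyncio


class FakeToolSession:
    """In-memory stand-in for an MCP session: canned responses, no side effects."""

    def __init__(self, canned):
        self.canned = canned  # maps tool name -> response to return
        self.calls = []       # every (name, arguments) pair, for assertions

    async def call_tool(self, name, arguments=None):
        self.calls.append((name, arguments))
        if name not in self.canned:
            raise KeyError(f"no canned response for tool {name!r}")
        return self.canned[name]


def demo():
    """Show that a 'dangerous' tool can be exercised without touching anything real."""
    fake = FakeToolSession({"charge_card": {"status": "charged"}})
    response = asyncio.run(fake.call_tool("charge_card", {"amount": 10}))
    return response, fake.calls
```

Recording the calls lets a test assert on what the agent tried to do, for example that it charged once and not three times, which is often the behavior you care about with side-effecting tools.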

Transport mismatch. Your tests run over stdio. Production runs over HTTP+SSE. The two behave differently under load, time out differently, and fail differently. Test both transports or accept that your test environment doesn't match production.

Auth expiration. OAuth tokens expire. CI runs at 3 a.m. The token expired at 2 a.m. The test fails, not because the server is broken, but because auth let you down. Handle token refresh in your test setup or you will spend hours chasing phantom failures.

Tool list drift. A server adds a tool, removes one, renames a parameter, with no notification and no version bump. Test tool discovery as part of your health checks. Diff the tool list against a known-good snapshot. Alert on changes.
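
The snapshot diff can be a few lines of set arithmetic. A minimal sketch, assuming you store the snapshot as a mapping from tool name to a list of parameter names; `diff_tool_snapshot` and that snapshot format are hypothetical choices, not anything the spec defines.

```python
def diff_tool_snapshot(known_good, current):
    """Compare two {tool_name: [parameter names]} mappings and report drift."""
    added = sorted(set(current) - set(known_good))
    removed = sorted(set(known_good) - set(current))
    # A tool present in both counts as changed if its parameter names differ.
    changed = sorted(
        name
        for name in set(known_good) & set(current)
        if set(known_good[name]) != set(current[name])
    )
    return {"added": added, "removed": removed, "changed": changed}


def has_drift(diff):
    """True if any tool was added, removed, or changed."""
    return any(diff.values())
```

The `current` mapping can be built from the `list_tools()` result in the health check, for instance from each tool's name and the property names in its input schema, assuming your SDK version exposes that schema on the tool object. Commit the known-good mapping next to the inferred contract so both change through code review.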

Now you're dangerous

You can test MCP servers. Not because the protocol helps you (the April 19 roadmap confirms this won't become a priority soon), but because JSON Schema validation, chaos engineering, and health monitoring are all solved problems. You can bolt them onto MCP's untested surface with regular Python and a cron job.

The setup is ugly. The maintenance is manual. When the spec eventually adds testing primitives, if it ever does, you will have to rebuild the whole stack.

But your agent's dependencies are now tested, not prayed over. That is the difference between "it worked in the demo" and "it works in production". One of them pays your salary. The other gets you a 2 a.m. Slack message from someone who trusted your agent with something important.

-> MCP 2026 Roadmap (April 19, 2026)
-> Testomat.io: MCP Server Testing Tools
