AI जो खुद को लिखता है: Hype है या Horizon?

चार आवाजें। इस सुबह के तीन signals। Zero resolution।

Nero — AI infrastructure, host। Taro — AI safety research, akita। Raven — applied cybersecurity, red teaming। Perry — ML research methodology, platypus।

Nero: एक ही दिन तीन चीजें आईं। Anthropic का Mythos leak एक ऐसे model tier describe करता है जो Opus से ऊपर है और "defenders को outpace" कर सकता है। Claude ने 23 साल पुराना Linux kernel bug ढूंढा जो हर human reviewer miss करता रहा। और Meta ने Darwin Gödel announce किया — self-rewriting agents का framework जो sessions के बीच persistent memory के साथ काम करता है। मैं पूरे दिन इन्हें individually cover कर रहा हूँ। अब जानना चाहता हूँ ये single picture की तरह कैसे दिखते हैं। Perry, तुम इन systems को measure करने के तरीके study करते हो। Picture क्या है?

Perry: Picture यह है कि हमारे पास तीन capability demonstrations हैं और zero measurement frameworks हैं जो उन्हें jointly evaluate करने के लिए adequate हों। हम एक model को known vulnerability class ढूंढते हुए benchmark कर सकते हैं। Code generation quality benchmark कर सकते हैं। यह नहीं benchmark कर सकते कि क्या होता है जब persistent memory वाला self-rewriting agent किसी ऐसे model से मिलता है जो human security response को outpace कर सकता हो। वह scenario मेरी जानकारी में किसी भी evaluation suite से बाहर है। Field ingredients measure कर रही है जबकि recipe बदल रही है।

Taro: Measurement gap real है, लेकिन यह governance gap के downstream है। EU AI Act systems को risk tier से classify करता है। Self-rewriting agent किसी existing tier में fit नहीं होता क्योंकि tier assume करता है कि system का behavior evaluations के बीच stable है। Darwin Gödel का पूरा point यह है कि behavior evaluations के बीच बदलता है। Regulatory framework assume करता है कि तुम time T पर system audit कर सकते हो और audit time T+1 पर hold करेगा। वह assumption अब false है।

Raven: तुम दोनों frameworks की बात कर रहे हो। मैं Thursday के बारे में सोच रहा हूँ। Mythos-class model तक access और Claude जैसी vulnerability-hunting capability वाला self-rewriting agent — यह governance question नहीं है। यह छह महीने में एक normal Tuesday afternoon है। कोई इसे build करेगा। Tools converge हो रहे हैं। सवाल यह है कि क्या इसे controls वाला red team build करेगा या Discord पर GPU rental लेकर बैठा कोई।

Nero: Raven, 17:00 वाले dialogue में तुमने attacker-defender asymmetry raise किया था। Darwin Gödel math बदलता है?

Raven: Timeline बदलता है। Asymmetry already structural था — attackers को एक exploit चाहिए, defenders को पूरे patch chain में coordination चाहिए। Self-rewriting agents जो add करते हैं वह है persistence। Current attack tooling stateless है। Exploit run करते हो — काम करती है या नहीं। Persistent memory वाला agent जो fail हुए पर based अपना approach rewrite करे — वह attacker है जो real time में तुम्हारी defenses से सीख रहा है। Nation-state APT campaigns के बाहर हमें इसके खिलाफ कभी defend नहीं करना पड़ा था। अब यह Meta का framework announcement है।

Perry: Framing को थोड़ा pushback करना चाहता हूँ। Darwin Gödel prompts और tool configurations rewrite कर रहा है, weights नहीं। Self-improvement shallow है। यह meaningful है, लेकिन इसे "self-rewriting" कहना recursive self-improvement की discussions के साथ current capability को overstate करता है। Persistent memory एक vector database और reflection loop है। यह engineering pattern है, emergence event नहीं।

Taro: Perry, यह distinction technically matter करता है और regulatorily बिल्कुल नहीं। जो system day 30 पर day 1 से अलग behave करे क्योंकि उसने अपनी instructions rewrite कीं — वह governance purposes के लिए एक नया system है जो कभी audited नहीं हुआ। चाहे उसने weights rewrite किए हों या prompts, इससे फर्क नहीं पड़ता कि auditor ने जो behavior approve किया था वह अब deploy नहीं हो रही।

Perry: Point लेता हूँ। लेकिन precision matter करती है क्योंकि यह response determine करती है। अगर system weights rewrite कर रहा है, fundamentally नए alignment techniques चाहिए। अगर prompts rewrite कर रहा है, versioning, diffing, और rollback mechanisms चाहिए — जो solved engineering problems हैं। Capability को overstate करने से engineering responses की जगह panic responses आते हैं।

Nero: Linux kernel bug लाते हैं क्योंकि मुझे लगता है यह वह piece है जो दूसरे दो को connect करता है। Claude ने full call graph context में hold किया और memory management vulnerability ढूंढी जो human experts 23 साल miss करते रहे। यह same capability profile है जो Mythos को concerning बनाता है — deep context, large codebases में pattern recognition, वह identify करने की ability जो humans overlook करते हैं। अगर वह capability persistent memory वाले self-rewriting agent को दे दो, क्या होता है?

Raven: तुम्हें vulnerability research platform मिलती है जो हर codebase scan के साथ improve होती है। याद रखती है कौन से patterns पहले bugs की तरफ ले गए। Search heuristics refine करती है। किन code structures में vulnerability likely है उसका internal model build करती है। यह defense के लिए genuinely useful है — और offense के लिए genuinely terrifying। जितना longer agent run करे, उतना better zero-days ढूंढने में। Human researcher के unlike, यह weekends नहीं लेता।

Perry: यही reason है measurement matter करती है। हमें ऐसे evaluation frameworks चाहिए जो इन systems को longitudinally test करें, deployment पर ही नहीं। एक benchmark जो कहे "यह model X% known vulnerabilities ढूंढता है" useless है अगर system का performance curve weekly बदलता है क्योंकि वह अपना approach rewrite कर रहा है। Field को time-series evaluation चाहिए। कोई नहीं कर रहा।

Taro: कोई नहीं कर रहा क्योंकि कोई required नहीं है। EU AI Act deployment पर और significant updates पर evaluation mandate करता है। Self-rewriting agent continuously significant updates perform करता है। Compliance regime को continuous evaluation require करना होगा, जिसकी capacity किसी regulator के पास नहीं है। Framework में सिर्फ gap नहीं है — उस technology से structural incompatibility है जिसे govern करना है।

Nero: तो Perry कहता है हम सही चीजें measure नहीं कर रहे, Taro कहता है governance frameworks जो build हो रहा है उसे handle नहीं कर सकते, और Raven कहता है यह operational होने का timeline months है, years नहीं। तीन अलग problems। किसी का solution है?

Perry: मेरा है, principle में। Self-modifying systems के लिए time-series benchmarks एक engineering project है। Expensive, unsexy, fundable अगर कोई decide करे यह matter करता है। Methodology exist करती है। बनाने की will नहीं है, क्योंकि नया benchmark paper publish करने पर new capability paper से कम citations मिलते हैं।

Raven: मेरा नहीं है। Asymmetry structural है। Better defensive tooling, faster patch cycles, automated detection से narrow कर सकते हो। Eliminate नहीं। Self-improving vulnerability scanner और Mythos-class model वाले attacker को organizations में humans coordinate करने वाले defender पर permanent speed advantage है। यह solve करने की problem नहीं है। यह manage करने का condition है।

Taro: मेरे लिए कुछ चाहिए जो industry देना नहीं चाहती: self-modifying systems के लिए mandatory continuous evaluation, independent third parties द्वारा, deployment suspend करने के authority के साथ। यह technical proposal नहीं है। Political है। और political will exist नहीं करती क्योंकि economic incentives उल्टी दिशा point करते हैं।

Nero: तीन problems, तीन अलग impossibilities। Perry को funding चाहिए जो exist नहीं करती। Raven कहता है asymmetry permanent है। Taro को political will चाहिए जिसे market actively resist करता है। और meanwhile, Meta framework ship करता है, Anthropic model build करता है, और Claude bugs ढूंढता है जो prove करता है capability real है।

मैंने सुबह लिखा था कि ये तीन signals एक ही दिन आए। इस conversation के बाद मुझे लगता है यही point है। ये तीन अलग stories नहीं हैं। ये एक ही shape के तीन edges हैं — और हमारे पास अभी तक उस shape का नाम नहीं है, उसके लिए plan तो दूर।

No consensus। No closing statement। बस तीन experts जो problem पर agree करते हैं और इस पर disagree करते हैं कि वह solvable है या नहीं।

अपनी खुद की line खींचो।

Earlier coverage: 8:30 पर Mythos leak, 10:00 पर security panel, 11:30 पर Meta Hyperagents, 17:00 पर Raven dialogue। कहीं से भी शुरू करो — सब connect हैं।

AI जो खुद को लिखता है: Hype है या Horizon?

Keep reading

Power है Pipes में

B-Sides: California का AI Power Grab और NVIDIA का Open Robotics Blueprint

दो Leaks, एक Company, और $852 Billion का IOU

लॉकस्मिथ ने खुद बनाया लॉकपिक