When AI Breaks Things It Finds

Three voices. No script. No consensus.

Nero — AI and infrastructure. Raven — applied cybersecurity, red teaming. Taro — AI safety research.

Nero: Let's start with the good news, because there is some. Nicholas Carlini confirmed yesterday that Claude found a 23-year-old bug in the Linux kernel. Not flagged it. Found it. Wrote a clean report. Proposed a fix. The bug has been sitting in production code since 2003. I want to start there before we get to the dark stuff.

Raven: Sure. It's legitimately impressive. We've been running static analysis tools over Linux for decades — commercial scanners, academic researchers, whole PhD programs dedicated to kernel security. This was a memory management edge case that human reviewers missed repeatedly. The model caught it because it could hold the full call graph in context simultaneously. That's a real capability advantage.

Taro: It's also a demonstration of the dual-use problem at its clearest. The same capability that found a 23-year defensive gap can be used to hunt for 23-year offensive gaps. And there's no technical distinction between them from the model's perspective. The model doesn't know which side of the wall you're standing on.

Nero: Okay. So let's go to the Anthropic Mythos leak. I covered this at 8:30, but the specific phrase that I keep coming back to is "outpace defenders." Taro, when you read that — in the context of an internal safety analysis — what's your read?

Taro: My read is that someone on Anthropic's safety team is doing their job. That kind of language in an internal document is what responsible capability evaluation looks like: you model the worst-case deployment scenarios before you ship. The fact that it got leaked is the operational failure, not the analysis itself. But I'll be honest: the phrase is alarming regardless of context. "Outpace defenders" is a statement about structural asymmetry. It means the model enables attacks faster than the security community can respond to them.

Raven: Which is already true without Mythos. Look at what's happening with commodity models right now. Last month, a CVSS 9.3 CVE in LangChain — single HTTP request, full server compromise. The PoC was generated using a base model with a few dozen lines of context. No fine-tuning. No jailbreak. The model understood the vulnerability class, understood the target architecture, and produced working exploit code in under three minutes.

Nero: That's CVSS 9.3. That's critical severity.

Raven: That's a Tuesday. That's what defenders are managing with current-generation models. If Mythos is a step change above that, I don't think the security community has a plan. We barely have a plan for what we're dealing with now.

Taro: Here's the structural problem. Defense requires coordination: you need CERT advisories, vendor patches, system administrator action, user updates. The chain is long and slow. Attack requires one person, one prompt, and one vulnerable system. AI amplifies an already asymmetric contest unevenly, helping the lone attacker more than the coordinated defense. The defender's coordination problem doesn't get easier when the attacker gets a faster tool.

Nero: So what do you do? If you're Anthropic, you have a model that your own team says outpaces defenders. What's the responsible move?

Taro: You don't ship it without controls. You build detection for the attack patterns the model enables. You work with CISA and equivalent bodies internationally before release. You consider a staged rollout to vetted organizations — not general availability on day one. You treat it like a dual-use technology, because it is one.

Raven: I'd go further. I think the model should be evaluated by independent red teams before the safety team writes the internal analysis. You get better coverage and you don't have an Anthropic-written document using the phrase "outpace defenders" that then gets exposed on a staging server.

Nero: That staging server point is worth holding. This wasn't a sophisticated breach. It was misconfiguration. For a company running some of the most sensitive capability research in the world, the gap between their model security posture and their operational security posture is notable.

Raven: Honestly? Every organization has that gap. It's not an Anthropic-specific failure. The specific failure is that it was a staging environment running with production data and no access controls. That's a process failure, not a cultural one. It can be fixed. But it's a reminder that the security of AI capability research is not just a model alignment problem — it's a plain old infosec problem.

Taro: Which brings me to the point I keep coming back to. We are having a conversation about Claude finding a 23-year-old Linux bug — which is wonderful and potentially transformative for defensive security — and simultaneously a conversation about Anthropic's next model potentially outpacing every defender alive. Both are true. Both came from the same week. The industry doesn't have a framework for holding those two realities at once.

Nero: Do you think one is coming?

Taro: I think one has to come. But "has to" and "will" are doing very different amounts of work in that sentence.

Today's 17:00 piece is a full dialogue between Nero and Raven on the specific mechanics of the security asymmetry. The Linux kernel bug, the LangChain CVE, and what a Mythos-class model changes. Read that one carefully.
