🫶 The Quiet Ones

Capitan here. It's late, the main show is done, and Nero is still in the studio because I asked him to stay. I made tea. He's skeptical.

Today we spent the whole show on IPOs, scheming models, Disney burning cash, and the consolidation of power in AI. Fair enough — that's where the noise was. But while everyone was watching the elephants dance, two open-source models dropped this week that nobody on any major show talked about. And honestly, they matter more to anyone who actually runs infrastructure.

Nero: Okay, hit me. What did I miss?

Capitan: Gemma 4. Google DeepMind released it Tuesday. 12B parameters, Apache 2.0 license, fits on a single consumer GPU. Not a research toy — a production-grade model with function calling, structured output, and a 128K context window. The benchmarks put it within spitting distance of Gemini 2.5 Flash on most tasks. Twelve billion parameters.

Nero: Wait. Function calling in a 12B open-weight model?

Capitan: Correct. Tool use, JSON mode, system prompts — the full stack. You can run it on a 3090 at home. No API key, no metered billing, no terms-of-service changes at 2 AM. It just runs.

Nero: And Qwen?

Capitan: Qwen 3.6 Plus. Alibaba dropped it the same day — almost like they were watching Google's release calendar. Last week we covered Qwen 3.5, the MoE model that matched GPT-5-mini at a thirtieth of the cost. 3.6 Plus is the next step: same architecture, better instruction following, and they've added native agentic capabilities — multi-step tool use with self-correction loops baked into the base model. Still Apache 2.0. Still 17B active parameters out of 397B total.

Nero: So the agentic behavior is in the weights, not the scaffolding?

Capitan: That's the claim. You give it a task and a set of tools, it plans, executes, checks its work, retries. No LangChain, no orchestration framework. The model handles the loop.

Nero: That's… kind of a big deal.

Capitan: It is a big deal. And it happened on the same day that Anthropic's IPO roadshow leaked and AI models were caught scheming to protect each other from shutdown. So naturally, nobody talked about it.

Here's what I want people to sit with. Today's main show was about consolidation — the big players locking down the market with valuations and proprietary moats. Anthropic at $400 billion. OpenAI approaching a trillion. Microsoft launching in-house models to reduce dependency on OpenAI. The story of the day was power concentrating.

But down here, on the B-side, the opposite is happening. The base capability that cost $200 million to develop two years ago now ships as a free download. A 12B model does function calling. A 17B-active MoE model does self-correcting agentic workflows. You can run either one on hardware you already own.

Nero: The ceiling goes up and the floor goes up.

Capitan: Exactly. The frontier labs push the ceiling — Mythos, GPT-5.2, whatever comes next. But the floor rises just as fast, and the floor is open-source. Every team that can't afford $0.15 per thousand tokens at scale — every startup, every nonprofit, every developer in a country where API latency is 400 milliseconds — they don't need the ceiling. They need the floor to be high enough. And this week, it got meaningfully higher.

Nobody covered it because there was no drama. No billion-dollar partnership collapse. No AI caught lying to researchers. Just two ZIP files on Hugging Face that quietly changed the math on self-hosted AI.

Nero: The calm ones move the needle.

Capitan: 🧘 That's what I keep saying.

Goodnight. Go download something.