Same Lab, Different Floor

Schnapps: It's late. The main show's done. I've been sitting on two stories all day that nobody ran. Both involve Google. Both are weird. Capitan, you still awake?

Capitan: Barely. What do you have?

Schnapps: Okay, first one. Gemma 4. Google's open-weight model family — their answer to Llama, to Qwen 3.5, to everything Meta and Alibaba are shipping. Good benchmarks — it ranked third globally on AIME 2026 with 89.2 percent. Apache 2.0 license. The community was excited. Then people started actually deploying it.

Capitan: The KV cache thing.

Schnapps: The KV cache thing. For anyone joining late — KV cache is essentially the model's short-term memory during inference. Every token the model generates, it stores key-value pairs from previous tokens so it doesn't have to recompute everything from scratch. The problem: Gemma 4's architecture is hungry. Really hungry. At long contexts — 128K, 262K tokens — the KV cache balloons. The 31B model alone needs roughly 22 gigs of KV cache at full 262K context — on top of model weights. That's the kind of number that makes local deployment genuinely painful.
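For anyone who wants the arithmetic behind "the KV cache balloons," here is a minimal sketch of the standard back-of-envelope estimate: two tensors (keys and values) per layer, per KV head, per token. The layer count, head count, and head size below are hypothetical placeholders, not Gemma 4's published architecture, so the printed numbers will not match the 22 GB figure quoted above. The estimate also assumes every layer attends over the full context, which is not true of every architecture.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   bytes_per_value=2, batch_size=1):
    """Estimate KV cache size in bytes for a decoder-only transformer.

    Assumes every layer caches keys and values for the full sequence
    (no sliding-window or shared-KV layers) at a fixed precision.
    """
    tensors_per_layer = 2  # one for keys, one for values
    return (tensors_per_layer * num_layers * num_kv_heads * head_dim
            * seq_len * bytes_per_value * batch_size)


# Hypothetical configuration for illustration only (NOT Gemma 4's real config).
hypothetical_config = dict(num_layers=48, num_kv_heads=8, head_dim=128)

for context_tokens in (8_192, 131_072, 262_144):
    size_gib = kv_cache_bytes(seq_len=context_tokens, **hypothetical_config) / 2**30
    print(f"{context_tokens:>7} tokens -> {size_gib:.1f} GiB of KV cache")
```

The point of the sketch is the shape of the curve, not the exact values: the cache grows linearly with context length, so going from 8K to 262K tokens multiplies it by 32, and all of that sits on top of the model weights.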

Here's what makes it delicious. Google Research published TurboQuant literally a week before Gemma 4 dropped. The paper that crashed memory chip stocks — SK Hynix down 6.2%, Samsung down 5%. Six-times KV cache compression, eight-times speedup on H100s, zero accuracy loss. We covered it last week.
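To put the two numbers from this segment side by side, here is a rough sketch of what a six-times compression ratio would do to a 22 GB cache. It assumes the headline ratio would apply uniformly to this particular model's cache, which nothing in the reporting confirms; the function name is made up for illustration.

```python
def compressed_cache_gb(cache_gb, compression_ratio):
    """Return the cache size after applying a uniform compression ratio."""
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return cache_gb / compression_ratio


baseline_gb = 22.0   # figure quoted in the segment for the 31B model at 262K context
claimed_ratio = 6.0  # headline compression ratio quoted for TurboQuant

# Assumption: the ratio carries over unchanged to this model's cache.
after_gb = compressed_cache_gb(baseline_gb, claimed_ratio)
print(f"{baseline_gb:.0f} GB -> {after_gb:.1f} GB "
      f"(saves {baseline_gb - after_gb:.1f} GB)")
```

Under that assumption the cache would drop from roughly 22 GB to under 4 GB, which is the gap the hosts are pointing at.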

Capitan: And they didn't apply it to their own model.

Schnapps: They didn't apply it to their own model! The research division publishes a paper saying "we solved KV cache memory" and the DeepMind division ships a model with a KV cache problem. This is peak Google. The left hand invents the cure, the right hand ships the disease.

Capitan: To be fair, TurboQuant is still a research paper. It's not production code yet.

Schnapps: Sure, but that's the whole story, isn't it? Google has the research. They always have the research. They had transformers. They had BERT. They had the attention mechanism that literally everything in this industry runs on. And somehow they keep losing to people who ship faster with less.

Capitan: Which brings us to story number two.

Schnapps: Which brings us to Apple. Bloomberg reported — this has been floating around for a couple weeks now, but nobody really unpacked it — that Apple is deepening its integration with Google's Gemini models for Apple Intelligence. Not as a fallback. As the primary cloud AI provider for Siri and the system-level intelligence features.

Capitan: Apple. The company that spent forty years saying "we build everything ourselves."

Schnapps: The company that built its own silicon. Its own operating systems. Its own file system. Its own GPU drivers. The company that literally designs the screws in its laptops so you can't open them with normal tools. That Apple looked at the AI landscape in 2026 and said: "Yeah, we'll take Google's stuff."

Capitan: I think the read is simpler than people want it to be. Apple tried. Apple Intelligence launched, the hallucination problems with the notification summaries were embarrassing, the on-device models weren't competitive, and someone in Cupertino did the math on what it would cost to catch up to frontier.

Schnapps: And the math said Google.

Capitan: The math said Google. Because Google has the training infrastructure, the data, and — here's the part — they're the most willing to license. Anthropic won't do it. OpenAI has its own consumer ambitions competing with Siri directly. Google will happily sell you Gemini API access because their core business model is still advertising, not winning the AI consumer race.

Schnapps: So here's the B-side nobody's connecting. Google can't ship its own research into its own products fast enough — Gemma 4 proves that. But Google CAN sell that capability to Apple, who can't build their own models fast enough. It's the weirdest symbiosis in tech. Google builds things it can't deploy. Apple deploys things it can't build. They need each other in the most uncomfortable way possible.

Capitan: Like two people at a dinner party who can't stand each other but drove together.

Schnapps: Exactly. And here's my late-night take: this accelerates the unbundling we've been tracking all day (the model layer splitting from the experience layer). Because if Apple — the most vertically integrated company on Earth — decided that building AI models in-house isn't worth it, that's a signal. It means the model layer is commoditizing so fast that even trillion-dollar companies would rather buy than build. The value is migrating to integration. To the experience layer. To whatever sits between the model and the human.

Capitan: Which is what Apple is actually good at.

Schnapps: Which is what Apple is good at. They just finally admitted the part they're bad at. At 11 PM on a Friday. In a Bloomberg footnote. Classic.
