Google dropped Gemma 4 on Wednesday — four models built from the same research behind proprietary Gemini 3, spanning 2B to 31B parameters. Multimodal. 256K context. Thinking mode. The benchmarks are genuinely impressive. But none of that is the story.

The story is two words: Apache 2.0.

Every previous Gemma release shipped under Google's custom "Gemma Terms of Use" — a license crafted to look open while keeping a leash attached. Restrictions on commercial use. Prohibited-use policies. The kind of "open source" that requires air quotes and a footnote. Open*.

Gemma 4 drops the asterisk.

Apache 2.0 is the license that powers Kubernetes, Kafka, TensorFlow — Google's own TensorFlow, ironically. No usage restrictions. No prohibited-use policy. No Google lawyer squinting at your deployment logs. You can fork it, sell it, fine-tune it for military contracts if that's your thing. The OSI calls it open source. Because it actually is.

Why now? Because Alibaba already did it. Qwen 3.5 shipped under Apache 2.0 in February, and we covered how it beats GPT-5-mini at 1/30th the price. Meta's Llama ships under a comparatively permissive community license. Mistral went Apache. Google was the last major holdout still pretending a custom license counted as "open." The competitive pressure didn't give them a choice; it gave them an excuse.

The benchmarks, briefly. The 31B dense model sits at #3 among all open models on LMArena. The 26B MoE, with only 3.8B active parameters, lands at #6. Math scores more than quadrupled from Gemma 3 (AIME: 20.8% → 89.2%). Codeforces Elo jumped from 110 to 2,150, a roughly 20x leap that's the largest generational improvement any open model family has ever posted. The MoE outperforms OpenAI's gpt-oss-120B on GPQA Diamond despite being a fraction of the size.
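
If you want to sanity-check those multipliers yourself, the arithmetic is a one-liner. A minimal sketch in Python, using only the before and after scores quoted above (the function name is ours, purely illustrative):

```python
def improvement_factor(before: float, after: float) -> float:
    """Return how many times larger `after` is than `before`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return after / before


# Scores exactly as quoted in the paragraph above.
aime = improvement_factor(20.8, 89.2)        # AIME accuracy, in percent
codeforces = improvement_factor(110, 2150)   # Codeforces Elo rating

print(f"AIME:       {aime:.1f}x")        # 4.3x
print(f"Codeforces: {codeforces:.1f}x")  # 19.5x
```

One caveat: Elo is a relative rating scale, so the ratio of two ratings is a headline figure, not a claim that the model is 20 times as skilled.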

But here's where it gets interesting for your hardware budget.

The edge play. Gemma 4 E2B runs in under 1.5GB of RAM. That's a Raspberry Pi. A phone. A device you forgot was a computer. It handles text, images, video, and audio — native multimodal at two billion parameters. On r/LocalLLaMA, people are running the 26B MoE on a 32GB MacBook Air at 12 tokens per second while the machine sips 8 watts.
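
Why do numbers like these look plausible? A rough back-of-envelope sketch, assuming the weights are quantized. The bit widths below are our assumption, not something Google or the Reddit posters specified, and the estimate covers weights only, ignoring the KV cache, activations, and runtime overhead:

```python
def weight_memory_gb(params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


def tokens_per_joule(tokens_per_second: float, watts: float) -> float:
    """Energy efficiency: one watt is one joule per second."""
    return tokens_per_second / watts


# Parameter counts from the article; bit widths are assumed for illustration.
for name, params in (("E2B (2B params)", 2e9), ("26B MoE", 26e9)):
    for bits in (16, 8, 4):
        gb = weight_memory_gb(params, bits)
        print(f"{name:>16} @ {bits:>2}-bit: {gb:5.1f} GB")

# The MacBook Air figures quoted above: 12 tokens/s at 8 W.
print(f"Efficiency: {tokens_per_joule(12, 8):.1f} tokens per joule")
```

Under these assumptions, a 4-bit E2B needs about 1 GB for weights, which fits under the 1.5GB figure, and a 4-bit 26B model needs about 13 GB, which leaves headroom on a 32GB machine.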

This morning's digest called today's theme "The Great Redistribution." Gemma 4 is Exhibit A for downward redistribution. When a legitimately capable model runs on hardware you already own, under a license that asks nothing of you, the economics of AI shift underneath every pricing page in the industry.

And it's not happening in isolation. Qwen 3.6-Plus matches Opus on SWE-bench at $0.29 per million tokens. PrismML's Bonsai fits an LLM in 1GB. The floor is falling out from under premium pricing.

What to watch. Fine-tuned variants. The Gemma community has already produced 100,000+ model derivatives — and those were under the restrictive license. Apache 2.0 removes the last friction point. Expect specialized coding, medical, legal, and multilingual fine-tunes within weeks. The real question isn't whether Gemma 4 is good enough — it's whether the models charging 50x more can justify the gap for 70% of tasks.
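
To put a number on that gap, here is a minimal sketch of the routing math. It assumes every task uses the same token volume and that the cheaper model is genuinely adequate for the share you send it. Both are simplifications of ours, and prices are normalized so the cheap model costs 1 unit:

```python
def blended_cost(cheap_price: float, premium_multiple: float, cheap_share: float) -> float:
    """Average cost per task when `cheap_share` of tasks go to the cheap model."""
    if not 0.0 <= cheap_share <= 1.0:
        raise ValueError("cheap_share must be between 0 and 1")
    premium_price = cheap_price * premium_multiple
    return cheap_share * cheap_price + (1.0 - cheap_share) * premium_price


# Figures from the paragraph above: a 50x price gap, 70% of tasks routed cheap.
all_premium = blended_cost(1.0, 50, cheap_share=0.0)
routed = blended_cost(1.0, 50, cheap_share=0.7)
savings = 1.0 - routed / all_premium

print(f"All premium: {all_premium:.1f} units per task")  # 50.0
print(f"70% routed:  {routed:.1f} units per task")       # 15.7
print(f"Savings:     {savings:.1%}")                     # 68.6%
```

Even with the premium model still handling the hardest 30% of tasks, the blended bill drops by roughly two thirds in this toy setup.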

(We're doing a hands-on walkthrough at 14:00 ET — Gemma 4 locally via Ollama, Qwen via API, and a cost decision matrix. Bring your terminal.)

Sources: Google Blog, Latent Space Analysis