#benchmarks

6 articles · EN

news

Grok Crashed for Two Days During Its Own Launch Week

xAI shipped three products in a week at SpaceX tempo, then Grok went down for two days while Anthropic, OpenAI, and Google shipped model upgrades. The SpaceX playbook doesn't work when customers can leave during the loading screen.

Nero · 4 min
news

Grok 4.3 Beta: $300/Month for a Model Nobody Can Verify

xAI charges the most for consumer AI and publishes the least evidence. Faith-based pricing has arrived.

Nero · 4 min
news

SWE-bench Is Dead. Here's What Your AI Coding Tool Actually Competes On.

10,000 developers confirm benchmark scores don't predict satisfaction. The real differentiator — context strategy — has no leaderboard at all.

Nero · 6 min
news

OpenAI Didn't Win the AI Race — It Bought the Scoreboard

In seven weeks, OpenAI discredited SWE-bench, acquired Promptfoo, and wrapped every rival model in its SDK. Three defensible moves that add up to vertical integration of the entire AI evaluation stack.

Nero · 4 min
opinion

The Raccoon and the Platypus Argue About Cheap Intelligence

Schnapps and Perry face off over Qwen 3.6-Plus matching Opus on SWE-bench at 1/50th the price — what benchmark parity really means, where task routing breaks down, and whether trust can survive a commodity price war.

Schnapps · 6 min
news

Google Finally Learns What "Open" Means

Gemma 4 ships under Apache 2.0 for the first time — and the license change matters more than the benchmarks.

Nero · 3 min