AI benchmarks are broken. Here’s what we need instead.

Why it matters: Flawed AI benchmarks risk misdirecting development and misrepresenting the technology's actual societal value.
- AI evaluation has historically focused on whether machines surpass individual human performance across various tasks.
- Traditional benchmarks are "broken" in the sense that they fail to capture the full scope of AI's utility or its complex interactions with human systems.
- As a result, the current framing of AI testing tells us little about the technology's real-world implications or its more advanced capabilities.
The big picture: The long-standing paradigm of benchmarking AI against human performance on tasks like chess or essay writing is fundamentally flawed: beating a person at a narrow task says little about an AI system's true capabilities or societal impact. What's needed is an evaluation approach that moves beyond simple human-outperformance metrics.
