Before you dismiss this as another study that doesn't apply to your team, your modern tools, your clearly working setup — hear the actual numbers.
METR ran a real randomized controlled trial. Not a survey. Not a vibe check. They recruited 16 experienced developers, collected 246 real issues from the mature open-source projects those developers work on, randomly assigned each issue to either allow or forbid AI tools, and measured completion time with a clock.
Before the tasks: developers predicted AI would make them 24% faster. After the tasks: developers reported AI saved them about 20% of their time. What the clock measured: AI made them 19% slower.
The prediction was wrong. The self-report was wrong. In the same direction. By a lot.
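To see how large "a lot" is, it helps to put all three numbers on the same scale. The sketch below is a back-of-the-envelope calculation, not METR's analysis. It assumes each percentage describes a change in completion time relative to the no-AI baseline (24% less time predicted, 20% less time reported, 19% more time measured); the function names are made up for illustration.

```python
def ratio_if_faster(percent_less_time: float) -> float:
    """Completion time relative to baseline (1.0) when a task takes X% less time."""
    return 1 - percent_less_time / 100


def ratio_if_slower(percent_more_time: float) -> float:
    """Completion time relative to baseline (1.0) when a task takes X% more time."""
    return 1 + percent_more_time / 100


def gap_in_points(believed_ratio: float, measured_ratio: float) -> float:
    """Percentage points of baseline time between belief and measurement."""
    return round((measured_ratio - believed_ratio) * 100, 1)


# Assumption: all three figures are read as changes in completion time.
predicted = ratio_if_faster(24)  # what developers expected beforehand
reported = ratio_if_faster(20)   # what developers believed afterwards
measured = ratio_if_slower(19)   # what the clock showed

print("prediction gap:", gap_in_points(predicted, measured), "points")
print("self-report gap:", gap_in_points(reported, measured), "points")
```

Under that reading, the forecast missed by about 43 points of baseline time and the after-the-fact estimate by about 39, so living through the slowdown barely corrected the belief.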
The Gap Between Feeling Productive and Being Productive
This is what makes the METR study uncomfortable in a way that a lot of AI coverage just isn't. It's not questioning whether AI is impressive. It's asking whether the feeling of productivity we get from AI tools corresponds to actual output — and finding a clean, measured answer: not necessarily.
The slowdown isn't mysterious once you think about it. Every AI suggestion you review and reject still costs time. Every context switch to read LLM output interrupts flow. Every "almost right" answer that needs fixing adds overhead pure typing wouldn't have. The tool feels powerful because it's generating things fast. You're still the one deciding what to keep.
There's also a subtler problem. When you write code from scratch, you're forced to think through the structure. When AI writes it, you're approving structure someone else (something else) chose. Approval is cognitively cheaper than creation — which is probably why it feels faster — but it's a different kind of work, and it skips some of the thinking that catches problems early.
But Doesn't This Study Use Old Tools?
METR is already ahead of this objection. They acknowledge the study used early-2025 tools, and they explicitly say they expect developers to be faster with 2026 tools than the early-2025 results would predict. They're running a second, larger study (47+ developers) right now.
So maybe the productivity gains are real — just not as large, or not in the places, we think.
Here's what I'd bet stays true regardless of model improvements: the perception gap persists. Developers still systematically overestimate AI's impact on their output. That gap, between how fast AI feels and how fast you actually are, is the part that matters for everything downstream.
Why Overestimating Matters
The problem isn't that AI makes you slower. The problem is that the entire industry is making decisions based on perceived productivity gains that haven't been independently measured.
CFOs are deferring headcount based on claimed efficiency. Engineering leaders are setting sprint commitments based on "we're faster now." Investors are pricing AI tooling companies based on the productivity narrative. If the actual gain is a fraction of the reported gain — or if it's real but inconsistent across tasks, teams, and tool versions — every one of those decisions has a rounding error the size of a truck in it.
The teams most at risk are the ones who adopted AI tools early, saw adoption rates shoot up, and called that success. High adoption is not the same as high productivity. It means people are using the thing. Whether the thing is making them better is a different question, and one that almost nobody has answered with a clock.
What to Actually Do With This
- Stop treating survey data as productivity data. "Developers report saving X hours per week" is a measure of how developers feel. Run a before/after on cycle time, PR throughput, or defect rate if you want something real.
- Segment your AI impact by task type. Boilerplate, documentation, and test generation probably do show genuine gains. Architecture decisions, debugging novel failures, and anything requiring deep system context probably don't — and might actually cost you time.
- Build for honest feedback loops. If your team is using AI and slowing down, they should feel safe saying so. "AI is making me slower on this kind of task" is useful signal. "We're all going faster, we think" is noise.
- Hold the perception gap accountable. If developers estimate AI saves them 20% and your data doesn't reflect that, even directionally, that's worth a frank conversation.
The METR study isn't an argument against using AI tools. It's an argument against assuming they're working without checking. There's a difference. The first is a Luddite take. The second is just good engineering.
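The first two recommendations in the list above (measure with a clock, segment by task type) can be sketched in a few lines. This is a minimal illustration, not a measurement methodology: the `tasks` records, their field names, and the hours in them are all invented for the example and do not come from the METR study or any real team.

```python
from collections import defaultdict
from statistics import median

# Hypothetical task log: one record per completed task, with clocked hours.
tasks = [
    {"task_type": "boilerplate", "used_ai": True, "hours": 1.0},
    {"task_type": "boilerplate", "used_ai": True, "hours": 1.5},
    {"task_type": "boilerplate", "used_ai": True, "hours": 2.0},
    {"task_type": "boilerplate", "used_ai": False, "hours": 2.0},
    {"task_type": "boilerplate", "used_ai": False, "hours": 2.5},
    {"task_type": "boilerplate", "used_ai": False, "hours": 3.0},
    {"task_type": "debugging", "used_ai": True, "hours": 4.0},
    {"task_type": "debugging", "used_ai": True, "hours": 5.0},
    {"task_type": "debugging", "used_ai": True, "hours": 6.0},
    {"task_type": "debugging", "used_ai": False, "hours": 3.0},
    {"task_type": "debugging", "used_ai": False, "hours": 4.0},
    {"task_type": "debugging", "used_ai": False, "hours": 5.0},
]


def ai_time_change_by_type(records):
    """Percent change in median hours with AI vs. without, per task type.

    Negative means AI tasks were faster; positive means they were slower.
    """
    grouped = defaultdict(lambda: {True: [], False: []})
    for r in records:
        grouped[r["task_type"]][r["used_ai"]].append(r["hours"])

    changes = {}
    for task_type, arms in grouped.items():
        if not arms[True] or not arms[False]:
            continue  # no comparison possible without both arms
        ratio = median(arms[True]) / median(arms[False])
        changes[task_type] = round((ratio - 1) * 100, 1)
    return changes


for task_type, change in ai_time_change_by_type(tasks).items():
    print(f"{task_type}: {change:+.1f}% time with AI")
```

With this made-up data, boilerplate comes out 40% faster with AI and debugging 25% slower, the kind of split a single team-wide average would hide. Unless AI use was assigned at random, as in the METR trial, a comparison like this is only observational: people may reach for AI on the tasks that were already easy or already hard.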
If your AI productivity story is built entirely on how fast the tools feel, it might not survive contact with a stopwatch.
Sources: METR — Measuring the Impact of Early-2025 AI on Developer Productivity · METR — Changing Our Experiment Design · ShiftMag — 93% of Developers Use AI, Why Is Productivity Only 10%?