AI Coding Tools Won't Fix Your Bottleneck | JAM Creative

By Jordan Hauge — Published March 5, 2026 — Category: AI Productivity, AI ROI

AI tools doubled PR volume across thousands of engineering teams. Delivery metrics didn't move. The problem isn't the tooling. Most companies accelerated the part of software development that was never actually slow.

A few months ago I published a piece on what happens when AI users merge 98% more PRs but company metrics stay flat. It got more attention than almost anything on this blog. Founders shared it. A few CTOs reached out. Some people pushed back with "the data is flawed" or "the tools keep improving." Maybe. But nobody disputed the core result. Individual output went up. Organizational outcomes didn't move.

If you dig into why, the answer is uncomfortable: you gave your engineers a faster way to do the thing that wasn't the problem.

Writing code was the cheapest part

Think about where time actually goes in a software project. It's not in writing functions. It's the three weeks of back-and-forth before anyone has a clear spec. It's the PR sitting open for eight days because the one senior engineer who can approve it is blocked on four other things. It's the deployment that requires manual sign-off from someone who's perpetually in meetings. It's the feature that shipped, got used by real customers, and got re-specced six weeks later because the original brief was written without talking to anyone in engineering.

AI coding tools don't touch any of that. They make the middle part faster while everything upstream and downstream stays exactly where it was.

Faros AI ran telemetry on over 10,000 developers and found that AI coding assistants produced a 98% increase in merged pull requests. PR review time went up 91% in the same period. The bottleneck moved downstream. It didn't disappear.

The METR study people are still arguing about

In July 2025, METR published a randomized controlled trial with 16 experienced developers working on real open-source projects. The headline finding: developers using AI took 19% longer to complete tasks than developers working without it (METR, 2025). The tools were Cursor Pro with Claude 3.5/3.7 Sonnet. Not outdated models. The actual frontier stack.

A lot of people are still arguing about the sample size.
That argument misses the more interesting part. Before the tasks started, developers estimated AI would speed them up by 24%. After finishing, they still believed they'd been 20% faster. The actual result was 19% slower. They couldn't measure their own productivity while using the tools.

AI reduces the friction of staring at a blank file. You get into flow faster. The cognitive load drops. But flow isn't throughput.

METR published a follow-up in February 2026 noting their second study had a design problem: developers were refusing to participate in no-AI conditions because they simply didn't want to work without the tools. That's its own finding. Engineers can't accurately assess what AI is doing to their output, and a lot of them don't want to know. That should concern any engineering leader deploying these tools without measuring what happens after.

Cursor just paid $290M to tell you where the problem went

In December 2025, Cursor acquired Graphite, a code review startup, for a price reported "well over" its $290 million valuation. CEO Michael Truell was direct about why:

"For most engineering teams, reviewing code looks the same as it did three years ago. It's becoming a larger portion of people's time as the time to write code shrinks."

Read that carefully. The company that sold you faster code generation is now selling you a tool to handle the review backlog that faster code generation created.

That's not a knock on Cursor. It's an honest read of where the constraint moved. But the solution is still about throughput: review faster, merge faster, ship faster. Whether the right things are being built doesn't show up in that acquisition thesis at all.

Review is also harder now, not just more frequent. CodeRabbit found that AI-written code surfaces 1.7 times more issues than human-written code. Senior engineers now spend 4.3 minutes reviewing AI-generated suggestions versus 1.2 minutes for human-written code.

You're not just reviewing more PRs.
You're reviewing harder ones, for longer. And 76% of developers think AI-generated code needs refactoring before it's production-ready.

More code. Harder review. More rework. Flat delivery metrics. Not a coincidence.

What I saw before any of these studies existed

I led product strategy and implementation for the Six Flags mobile app and website redesigns. Combined, those platforms processed over a billion dollars in transactions annually across 30 million users. Before that, I led an 11-month platform transformation at Albertsons across 1,720 pharmacy locations. Neither was a greenfield project with clean requirements and patient stakeholders.

In both cases, the work that nearly derailed those engagements had nothing to do with how fast code got written:

- Misaligned specs that didn't surface until code was already in review.
- Senior engineers who were the single point of failure for every architecture decision, creating a queue that no amount of code generation could clear.
- Deployment processes requiring sign-off at stages where the risk had already been managed weeks earlier.

The teams that shipped reliably were the ones who killed work early. The most valuable thing a product team can do is decide not to build something before anyone writes a line of code. AI has zero impact on that call. It takes someone with enough context to say "not yet" or "not this way" and have the standing to make it stick. That judgment is getting more valuable, not less.

The DORA 2025 report confirmed what I'd seen firsthand: organizations with mature engineering processes saw AI amplify their performance. Organizations without them saw metrics stay flat. AI is a multiplier.
It makes strong processes faster and broken ones more expensive.

So where do you actually look?

AWS CTO Werner Vogels used his final re:Invent keynote in December 2025, after 14 years on that stage, to say something worth paying attention to: developers will spend less time writing code and more time reviewing it. On code review specifically: "The review becomes the control point to restore balance. It is where we bring human judgment back into the loop."

The constraint isn't disappearing. It's shifting toward comprehension and judgment and away from execution. For operators who've built that muscle, that's good news. The craft has always been in the judgment.

But most companies haven't adjusted how they measure anything. They handed engineers Copilot or Cursor, watched PR volume climb, and called it progress. No baseline for lead time. No tracking of change failure rate. No answer to whether any of it changed how fast working software actually reaches users.

Before adding another tool, get honest about where work stalls. If it stalls in unclear specs before dev starts, AI genuinely helps with discovery and documentation. If it stalls in review because one senior IC is the bottleneck on every PR, the fix is smaller PRs and a better review process, not faster generation. If it stalls in deployment approvals at stages where risk was already managed upstream, that's a process problem. No code generation tool touches it.

The teams that look back at 2025 and 2026 as the years they actually broke through won't be the ones who gave every engineer a $20/month Copilot subscription. They'll be the ones who figured out where work was stuck and used AI on that specific thing.

98% more PRs is not the outcome. Shipping better product faster with fewer production incidents is. Right now most companies are optimizing hard for the first one and wondering why the second one isn't moving.
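A postscript for anyone who wants that baseline before buying another tool: two of the delivery metrics discussed above, lead time and change failure rate, take about twenty lines to compute once you export deploy records. This is a minimal sketch, not a product; the `Deploy` record and its field names are hypothetical, and you'd populate them from whatever your Git host and deploy pipeline actually emit.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median

# Hypothetical deploy record -- field names are illustrative, not from any
# specific tool. Fill these from your Git host and deploy pipeline exports.
@dataclass
class Deploy:
    first_commit_at: datetime   # earliest commit included in the change
    deployed_at: datetime       # when the change reached production
    caused_incident: bool       # rollback/hotfix/incident within your window

def lead_time_p50(deploys: list[Deploy]) -> timedelta:
    """Median time from first commit to production deploy."""
    return median(d.deployed_at - d.first_commit_at for d in deploys)

def change_failure_rate(deploys: list[Deploy]) -> float:
    """Fraction of deploys that triggered an incident or rollback."""
    return sum(d.caused_incident for d in deploys) / len(deploys)
```

Run it against a quarter of history before rolling out an AI tool and again a quarter after. If PR volume doubles while these two numbers sit still, you've measured the thesis of this post in your own data.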