This is Part 2 of the Velocity vs Value series.
In the pillar post I introduced the tension between velocity mandates and value delivery. This post goes deeper into why velocity as a target fails — and what actually works.
The Speedometer Problem
Velocity was designed as a planning tool. It helps teams estimate capacity. It’s a speedometer, not a gas pedal. You don’t make a car faster by taping over the speedometer and writing a bigger number.
But the moment someone puts velocity on a dashboard and reviews it quarterly, the game changes. Teams optimize for the metric, not the outcome:
- Break one meaningful feature into five trivial tickets
- Defer complex, high-value work that carries risk
- Skip tests, cut code review short, ignore edge cases
- Inflate estimates (“this is definitely an 8, not a 3”)
- Avoid refactoring — it doesn’t produce features, so it doesn’t count
Goodhart’s Law in action: “When a measure becomes a target, it ceases to be a good measure.” The number goes up. The product stagnates.
The Feature Factory

John Cutler’s “feature factory” concept has only gotten sharper with time. A feature factory:
- Measures success by features shipped, not outcomes achieved
- Never circles back to ask “did this feature actually help anyone?”
- Confuses output with outcome
AI coding tools have given the feature factory a turbocharger. You can now ship low-value features at unprecedented speed — if you’re not careful about what you build.
My team builds internal tools for enterprise operations. If I optimized purely for velocity, I could ship a new button every day. A settings page nobody asked for. A report nobody reads. Feature per day? Done. But our users need the right tools, working reliably. That’s not a velocity problem — that’s a value problem.
What the Research Actually Says (And Doesn’t)
The data around AI-assisted development is nuanced — more nuanced than most articles acknowledge.
DORA’s “Change Failure Rate” needs context. The 2025 DORA Report finds that teams focusing exclusively on throughput metrics show higher change failure rates. But what counts as “failure”? This depends entirely on your environment, risk tolerance, and cost structure.
In some contexts — rapid prototyping, feature experimentation, A/B testing — you want frequent, cheap failures. They serve as triage for ideas. The ones that survive are battle-tested. Think of X (formerly Twitter) deliberately shedding years of unnecessary code. They broke things intentionally to arrive at a leaner system. That’s not reckless — that’s strategic.
The critical question isn’t “how often do deployments fail?” but “what does failure cost, and can we afford it?” In a multi-tenant enterprise platform with data sovereignty requirements, failure is expensive. In a prototype environment, failure is the learning mechanism. DORA metrics don’t make that distinction for you — your engineering judgment does.
The 470 GitHub PRs study deserves a second look. The widely cited finding that AI-generated PRs contain 1.7x more issues is real — but the why matters. Consider two underexplored factors:
First, democratization of contribution. AI tools lower the barrier to writing code. People who previously wouldn’t have attempted a pull request — junior developers, designers, product managers — are now submitting code. Some of that 1.7x increase isn’t “AI writes bad code” but “more inexperienced contributors are writing code with AI assistance.” The tool didn’t lower quality — it expanded who participates.
Second, the Sycophancy Trap. AI coding assistants are trained to be helpful — which means they tend to generate what you ask for, even when what you’re asking for is wrong. If a developer writes a vague or incorrect spec, the AI will cheerfully produce confident-looking code that implements the wrong thing. It won’t push back. It won’t say “are you sure about this architecture?” That agreeable behavior inflates defect rates not because the AI can’t code, but because it can’t argue.
Both factors suggest the problem isn’t the tool — it’s how organizations deploy it.
The Rework Tax — And When to Pay It
Rework rate — the percentage of deployments that are unplanned fixes — is DORA’s newest metric, and it’s the most revealing one for the AI era.
Track it weekly. If it trends up, one of two things is happening:
- Your current development practices are generating debt. Velocity pressure, insufficient review, inadequate testing — the usual suspects. This is the rework tax, and you want to reduce it.
- You’re carrying legacy burden that should be shed. Not all rework is bad practice. Sometimes a high rework rate signals that your codebase needs strategic pruning, not more patches. If you’re spending 30% of capacity maintaining code that serves 5% of users, the answer isn’t “fix it faster” — it’s “deprecate it and invest elsewhere.”
The distinction matters. Sometimes the right move is to ship fast, break things, abandon what doesn’t work, and invest in better alternatives. X did it. Companies that shed legacy code strategically often see their rework rates spike temporarily and then collapse as the simplified codebase becomes maintainable again.
Real velocity = Gross velocity − Rework on things worth keeping
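To make the formula concrete, here is a back-of-the-envelope calculation. All story-point numbers are invented for illustration:

```python
# Hypothetical sprint numbers, for illustration only.
gross_velocity = 40       # story points completed this sprint
rework_points = 12        # points spent on unplanned fixes
legacy_rework_points = 5  # rework on code already slated for deprecation

# Only rework on code worth keeping counts against real velocity;
# fixes to code you intend to retire are legacy tax, not quality debt.
rework_worth_keeping = rework_points - legacy_rework_points
real_velocity = gross_velocity - rework_worth_keeping
print(real_velocity)  # 33
```

The split between quality debt and legacy tax is exactly the diagnosis the formula forces you to make.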
I aspire to have this conversation with stakeholders: “Right now 30% of our capacity goes to rework. Some of it is quality debt from shipping too fast. Some of it is legacy tax from code we should retire. Here’s a plan to address both.” The velocity pressure is real and intense — but the data gives you leverage to have that conversation with specifics, not just complaints.
What to Track Instead
1. Value Delivery Rate
Not “features shipped” but “features that users actually use.” Track adoption within the first two weeks. If nobody touches it, it wasn’t valuable — no matter how fast you shipped it.
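One way to operationalize the two-week window: compute the share of eligible users whose first use of the feature falls inside it. A minimal Python sketch; the data shapes (`first_use_by_user`, `eligible_users`) are assumptions, not a real analytics API:

```python
from datetime import datetime, timedelta

def adoption_rate(release_date, first_use_by_user, eligible_users, window_days=14):
    """Share of eligible users who first used the feature within the window.

    first_use_by_user: {user_id: datetime of first use} -- hypothetical shape,
    adapt to whatever your analytics pipeline actually emits.
    """
    cutoff = release_date + timedelta(days=window_days)
    adopters = sum(
        1 for user in eligible_users
        if user in first_use_by_user and first_use_by_user[user] <= cutoff
    )
    return adopters / len(eligible_users) if eligible_users else 0.0

release = datetime(2025, 3, 1)
first_use = {"ana": datetime(2025, 3, 3), "bo": datetime(2025, 4, 2)}
print(adoption_rate(release, first_use, ["ana", "bo", "cy"]))  # 1 of 3 adopted in time
```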
2. Rework Rate (DORA)
Percentage of deployments that are unplanned fixes. Track it weekly. If it trends up, investigate: is this quality debt from moving too fast, or legacy burden that needs strategic deprecation? The treatment depends on the diagnosis.
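The weekly rollup is simple to compute. A sketch, assuming each deployment record carries a date and an is-unplanned-fix flag (a hypothetical shape, adapt to your deploy log):

```python
from collections import defaultdict
from datetime import date

def weekly_rework_rate(deployments):
    """deployments: list of (deploy_date, is_unplanned_fix) tuples.
    Returns {(iso_year, iso_week): rework_rate} for trend tracking."""
    totals, fixes = defaultdict(int), defaultdict(int)
    for day, is_fix in deployments:
        week = day.isocalendar()[:2]  # (ISO year, ISO week number)
        totals[week] += 1
        fixes[week] += int(is_fix)
    return {week: fixes[week] / totals[week] for week in totals}

deploys = [
    (date(2025, 3, 3), False),
    (date(2025, 3, 4), True),   # unplanned hotfix
    (date(2025, 3, 5), False),
]
print(weekly_rework_rate(deploys))  # one week at a 1/3 rework rate
```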
3. Lead Time for Changes
From first commit to production deployment. This measures pipeline health, not developer speed. Target hours, not days — a 4-hour lead time with low rework rate means your team is genuinely fast and stable. A 4-hour lead time with 40% rework rate means you’re just churning.
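A minimal sketch of the measurement, assuming you can pair each change’s first-commit timestamp with its deploy timestamp (the tuple shape is an assumption; your CI system’s API will differ):

```python
from datetime import datetime
from statistics import median

def lead_time_hours(changes):
    """changes: list of (first_commit_at, deployed_at) datetime pairs.
    Returns the median lead time in hours; median resists outlier PRs."""
    hours = [(deployed - committed).total_seconds() / 3600
             for committed, deployed in changes]
    return median(hours)

changes = [
    (datetime(2025, 3, 3, 9),  datetime(2025, 3, 3, 12)),  # 3 h
    (datetime(2025, 3, 3, 10), datetime(2025, 3, 3, 15)),  # 5 h
    (datetime(2025, 3, 4, 8),  datetime(2025, 3, 4, 12)),  # 4 h
]
print(lead_time_hours(changes))  # 4.0
```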
4. User Time-to-Value
How long does a user spend interacting with your tool to accomplish their goal? Track the trend over time. You want this number decreasing. If you shipped 50 features but the average task still takes 45 minutes instead of 10, you haven’t created value — you’ve created complexity. Every release should make your users faster, not give them more buttons to click.
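A crude way to check the trend: average the week-over-week change in minutes-to-goal. The numbers below are invented; this is a slope proxy, not a statistical test:

```python
def time_to_value_trend(weekly_minutes):
    """weekly_minutes: average minutes to accomplish the goal, per week,
    oldest first. Negative result means users are getting faster."""
    deltas = [b - a for a, b in zip(weekly_minutes, weekly_minutes[1:])]
    return sum(deltas) / len(deltas)

print(time_to_value_trend([45, 42, 40, 36]))  # -3.0: tasks getting faster
```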
5. Revenue or Outcome Correlation
If you can tie tool usage to business outcomes — contracts closed, time saved, incidents resolved — that’s the ultimate metric. It’s harder to measure, but it’s the only one that answers “did we create value?”
The Conversation That Matters
When your boss says “increase velocity,” they usually mean one of three things:
- “We’re not shipping fast enough” → Show them lead time and deployment frequency. Those are real speed metrics. If your lead time is 4 hours and your deployment frequency is daily, you’re fast. If not, the bottleneck is probably in your pipeline or review process, not in developer typing speed.
- “I can’t see what the team is doing” → That’s a visibility problem, not a velocity problem. Shorten the reporting interval — weekly progress updates, a Success Stream of achievements, live dashboards showing work in flight. When stakeholders have continuous visibility, the pressure to prove speed through velocity numbers drops dramatically. Don’t wait for sprint reviews. Show progress as it happens.
- “I need to justify the team’s value to leadership” → That’s an alignment opportunity. Show value delivered — contracts enabled, users served, user time-to-value trending down — not just tickets closed. Help your leadership tell a compelling story about impact, not activity.
The velocity trap is seductive because it gives everyone a simple number to point at. But your job as an engineering leader isn’t to maximize a number — it’s to promote the value your team creates and make that value impossible to ignore.
Next: AI Coding Tools: The Greenfield Fantasy vs Brownfield Reality → — real data on where AI accelerates work, where it creates hidden costs, and how to set realistic expectations.