Velocity vs Value: How to Measure Success in the Age of AI

There’s a moment every engineering manager dreads. You’re in a steering meeting, your platform just survived a massive overhaul, your small-but-senior team is finally shipping on a solid foundation — and someone says:

“We need a feature per day. Just use the AI.”

I want to talk about what happens next — not the political maneuvering, but the measurement problem underneath it. Because the question isn’t whether AI makes us faster. The question is: faster at what? And how do we prove it?

The Velocity Mandate

Let me paint the picture.

One of my teams builds and maintains internal tooling for a large enterprise. Our tools help field engineers do their jobs: network security assessments, configuration migrations, and infrastructure visualization. Real work that ties directly to real contracts and revenue streams.

When the velocity mandate arrived, it came with AI-assisted development workflows — tools designed to accelerate the entire software development lifecycle. Feed in a specification, get working code out. The promise: dramatically faster feature delivery.

And honestly? For new, greenfield features, these tools deliver. They’re genuinely impressive at scaffolding, boilerplate, and rapid prototyping. But on a mature, domain-specific codebase with years of accumulated business logic? The results are mixed. Some features land cleanly. Others require significant rework. The gains are real but uneven.

That unevenness is the real story. Because even when AI tools work perfectly, velocity alone tells you nothing about value. And that’s a problem for everyone — developers who want to build meaningful things, and managers who need to demonstrate impact to leadership.

Goodhart’s Ghost

Charles Goodhart, a British economist, gave us one of the most useful laws in management, most often quoted as: “When a measure becomes a target, it ceases to be a good measure.”

Features per day is the purest expression of Goodhart’s Law in software engineering. The moment you make it a target:

  • Developers break work into artificially small tickets
  • Complex but valuable work gets deferred (“too risky for the sprint”)
  • Tech debt accumulates invisibly
  • Quality drops, rework rises
  • The number goes up. The product stays flat.

Any system can be gamed. My responsibility is to design a game where winning it benefits my teams, the product, and our organization.

The AI Productivity Paradox

AI coding tools are transforming how we write software — but the productivity story is more nuanced than the headlines suggest. The 2025 DORA report — the most rigorous annual study of software delivery performance — found what many of us suspected:

AI improves throughput but can decrease stability when adopted without guardrails.

The numbers from recent research:

  • Greenfield projects (new code, no legacy): 30-40% productivity gains
  • Brownfield projects (existing codebases, complex dependencies): 0-10% gains, highly variable
  • Rework patterns: AI-generated code shows 1.7x higher defect density and 2x code churn compared to human-written code
  • Developer experience: 84% of developers use AI tools, but only 29-46% fully trust the output. 66% report spending significant time refining “almost-right” AI-generated code

These tools are valuable. They’re also not magic. The real gains come when teams understand where AI accelerates work and where it creates hidden costs — and when organizations measure both sides of that equation.

So What Do You Measure Instead?

Here’s where I had to come up with my own answer. Because “velocity is bad” isn’t a strategy — it’s a complaint. Your stakeholders need numbers. Your board needs a narrative. You need data to promote your team.

I started with a simple question: What is the actual value my platform delivers?

For us, the answer was concrete. Our users — field engineers, sales teams, technical architects — use our tools to win and deliver contracts. Those contracts have dollar values. Our internal telemetry system already tracks who uses which tools, how often, and for how long. We track AI token consumption by category: new feature, bugfix, refactor, maintenance, infrastructure.

The insight was connecting those dots:

Revenue Associated with Platform = sum of contract values where tool usage exceeds a meaningful threshold.

Not “features shipped.” Not “story points completed.” Not “tokens consumed.” Revenue enabled by the platform.
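As a rough sketch, that attribution rule is a one-line aggregation over contract records. The field names, the 0–1 usage score, and the threshold below are illustrative assumptions, not our actual telemetry schema:

```python
from dataclasses import dataclass

# Illustrative record: contract data joined with usage telemetry.
# Field names and the 0-1 usage score are assumptions for this sketch.
@dataclass
class Contract:
    name: str
    value_usd: float
    usage_score: float  # normalized measure of how heavily our tools were used

def platform_revenue(contracts, threshold=0.5):
    """Sum contract values where tool usage exceeds a meaningful threshold.

    This is revenue *associated with* the platform -- correlation,
    not causation.
    """
    return sum(c.value_usd for c in contracts if c.usage_score >= threshold)

contracts = [
    Contract("Badger Corp", 580_000, usage_score=0.9),
    Contract("Acme", 120_000, usage_score=0.1),   # tools barely touched
    Contract("Petunia", 300_000, usage_score=0.6),
]
print(platform_revenue(contracts))  # 880000
```

The threshold is the part stakeholders will argue about, which is exactly the point: it forces an explicit conversation about what “meaningful usage” means.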

The Three-Dimensional Framework

Pure revenue attribution has its own pitfalls — correlation isn’t causation, and stakeholders will challenge your model. So I built a three-dimensional measurement system:

Dimension 1: Business Value (EBM)

Evidence-Based Management from Scrum.org defines four value areas:

  • Current Value: What does your product deliver to users today?
  • Unrealized Value: What’s the gap between what users need and what they have?
  • Time to Market: How quickly can you deliver and learn?
  • Ability to Innovate: What prevents you from delivering new value?

Revenue attribution fits squarely in Current Value. But the other three dimensions prevent you from optimizing only for today.

Dimension 2: Engineering Health (DORA 2025+)

The DORA framework expanded to five metrics in 2025:

  1. Deployment Frequency — how often you ship
  2. Lead Time for Changes — how fast from commit to production
  3. Change Failure Rate — how often deployments cause incidents
  4. Failed Deployment Recovery Time — how fast you recover
  5. Rework Rate (new) — percentage of deployments that are unplanned fixes

Rework Rate is the critical metric in the AI era. When AI-generated code passes CI but causes issues in production, Rework Rate catches it. It’s the guardrail that keeps velocity honest.
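A minimal sketch of that calculation, assuming each deployment record carries an unplanned-fix flag (the field name is made up for illustration):

```python
# Hypothetical deployment log: each record says whether the deploy was a
# planned release or an unplanned fix (hotfix, rollback, emergency patch).
deployments = [
    {"id": 1, "unplanned_fix": False},
    {"id": 2, "unplanned_fix": True},   # hotfix for code that passed CI anyway
    {"id": 3, "unplanned_fix": False},
    {"id": 4, "unplanned_fix": False},
    {"id": 5, "unplanned_fix": True},
]

def rework_rate(deploys):
    """Percentage of deployments that are unplanned fixes (DORA Rework Rate)."""
    if not deploys:
        return 0.0
    fixes = sum(1 for d in deploys if d["unplanned_fix"])
    return 100.0 * fixes / len(deploys)

print(rework_rate(deployments))  # 40.0
```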

Dimension 3: AI Cost Efficiency

Track token consumption by category:

  • New features vs. bugfixes vs. refactoring vs. maintenance vs. infrastructure
  • Cost per feature delivered
  • Token efficiency trends over time

This isn’t about minimizing AI usage — it’s about understanding where AI adds value and where it’s burning budget on churn.
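The bookkeeping behind this can be a simple per-category rollup. The ledger shape and the blended token price below are assumptions for the sketch, not our actual billing model:

```python
from collections import defaultdict

# Hypothetical token ledger: (category, tokens, feature_id or None).
usage = [
    ("new_feature", 1_800_000, "fw-migration"),
    ("bugfix", 400_000, None),
    ("refactor", 250_000, None),
    ("new_feature", 900_000, "topology-viz"),
]

PRICE_PER_MILLION = 3.00  # assumed blended $/1M tokens -- plug in your real rate

def tokens_by_category(ledger):
    """Total token consumption per work category."""
    totals = defaultdict(int)
    for category, tokens, _feature in ledger:
        totals[category] += tokens
    return dict(totals)

def cost_per_feature(ledger):
    """Dollar cost of AI usage attributed to each delivered feature."""
    costs = defaultdict(float)
    for _category, tokens, feature in ledger:
        if feature is not None:
            costs[feature] += tokens / 1_000_000 * PRICE_PER_MILLION
    return dict(costs)

print(tokens_by_category(usage))
print(cost_per_feature(usage))
```

Tracked over time, the ratio of bugfix and refactor tokens to new-feature tokens is the “burning budget on churn” signal.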

The Dashboard That Tells the Story

Numbers without narrative are noise. I designed a six-screen value insights dashboard that tells a complete story to different audiences:

  1. Executive Overview — KPI cards, dual-axis charts, and the Success Stream
  2. Adoption & Usage Trends — who’s using what, growth patterns
  3. Value Delivery Timeline — features mapped to contracts and revenue
  4. AI Impact Breakdown — tokens, cost, efficiency by category
  5. Revenue Correlation — usage scores vs. contract values (with explicit “correlation, not causation” disclaimers)
  6. Technical Guardrails — DORA metrics + static code quality split by AI vs. human code

The Success Stream

Screen 1 deserves special attention. Raw numbers are comparable — and that’s their weakness. A table showing “two person-days, 1.8M tokens, correlated with a $580K contract” invites spreadsheet thinking. Someone will ask why another team used fewer tokens, and the conversation derails into cost optimization instead of value celebration.

The Success Stream transforms those numbers into narrative. Instead of a metrics table, it’s a living feed of team achievements — releases, milestones, and value-linked events — told as stories:

“Code Team Six tactically delivered a custom firewall migration toolset for Petunia Team. This enabled a one-day cutover for the Badger Corp engagement. AI tools were applied to generate migration scripts (1.8M tokens). Correlated contract value: $580K USD.”

Same data. Completely different impact. The Success Stream builds your team’s brand internally. It communicates value in human terms. And it gives leadership something they can actually forward to their leadership — not a chart, but a story of impact.

Every metric in the dashboard feeds into the Success Stream. Features delivered, tools adopted, contracts enabled, AI contribution — woven into a narrative that says: this team creates value.
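As a sketch, the transformation from metric record to story can be plain templating. The event fields below mirror the example above but are hypothetical, not the dashboard’s real schema:

```python
# Minimal sketch: turn a value-linked event record into a Success Stream story.
# Field names are illustrative assumptions.
def success_story(event):
    return (
        f"{event['team']} delivered {event['deliverable']} "
        f"for {event['beneficiary']}. "
        f"{event['outcome']} "
        f"AI tools were applied to {event['ai_use']} "
        f"({event['tokens'] / 1e6:.1f}M tokens). "
        f"Correlated contract value: ${event['contract_value']:,.0f} USD."
    )

event = {
    "team": "Code Team Six",
    "deliverable": "a custom firewall migration toolset",
    "beneficiary": "Petunia Team",
    "outcome": "This enabled a one-day cutover for the Badger Corp engagement.",
    "ai_use": "generate migration scripts",
    "tokens": 1_800_000,
    "contract_value": 580_000,
}
print(success_story(event))
```

The mechanics are trivial by design: the value is editorial, in deciding which events earn a story and linking each one back to its underlying metrics.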

Technical Guardrails — The Team’s Own Lens

Screen 6 serves a different audience and a different purpose. It’s not there to challenge velocity or question AI adoption — managers need velocity visibility, and that’s legitimate. Screen 6 exists for the engineering team itself.

Its purpose: give the team clear, objective data on whether the current pace and methodology are sustainable. Are we accumulating technical debt? Is AI-generated code degrading our maintainability scores? Are security hotspots increasing? Is our rework rate trending up?

This isn’t about slowing down. It’s about steering. A team that can see its own quality trends can self-correct before problems compound. When cyclomatic complexity creeps up or duplication percentage rises, the team can proactively allocate time for refactoring — not because a manager told them to, but because the data shows it’s needed.

Screen 6 turns code quality from an invisible concern into a visible, shared responsibility. The team owns it. The team acts on it. And when they do, velocity becomes sustainable instead of borrowed.
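Self-correction only works if the trend is easy to see. A minimal sketch of a trend flag over any quality metric series — duplication percentage, complexity, rework rate — where the window size and tolerance are arbitrary choices, not tuned values:

```python
def trending_up(samples, window=3, tolerance=0.0):
    """Flag a metric whose recent average exceeds its earlier average.

    A True result is a cue to schedule refactoring time, not an alarm.
    """
    if len(samples) < 2 * window:
        return False  # not enough history to compare two windows
    earlier = sum(samples[-2 * window:-window]) / window
    recent = sum(samples[-window:]) / window
    return recent > earlier + tolerance

# Weekly duplication percentage from static analysis (illustrative numbers).
duplication_pct = [4.1, 4.0, 4.2, 4.6, 5.1, 5.4]
print(trending_up(duplication_pct))  # True
```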

The Conversation You Actually Need

Here’s what I’ve learned: the velocity debate isn’t really about metrics. It’s about alignment.

Your stakeholders want to know the team is delivering impact. Your team wants to build things that matter. AI tools are accelerating what’s possible. The gap is in how we communicate what’s happening.

Good metrics close that gap. When you can show:

  • “Here are the contracts our tools helped enable this month” (value)
  • “Here’s our deployment cadence and quality trends” (engineering health)
  • “Here’s how AI contributed — and what we’re watching” (honest AI assessment)
  • “Here’s what the team achieved, in their own words” (Success Stream)

…you’re not defending velocity. You’re promoting value. And that’s a conversation everyone wants to have.

The Series

This is the first post in a four-part series. Each post goes deep on one dimension:

  1. The Velocity Trap — Why pure velocity metrics mislead, and what Goodhart’s Law teaches us about engineering measurement
  2. AI Coding Tools: Greenfield vs Brownfield — Real data on where AI accelerates work, where it creates hidden costs, and how to set realistic expectations
  3. Measuring What Matters — Building a measurement system with DORA 2025+, Evidence-Based Management, and revenue attribution
  4. Code Quality Guardrails — Static analysis, AI vs. human code quality data, and building a dashboard your team actually owns

I’m not claiming to have solved this. I’m sharing what I’ve built, what I’ve learned, and where I’m still figuring it out. If you’re navigating velocity mandates, AI adoption, and the challenge of proving your team’s real impact — I hope this helps.

Because the goal was never to go fast. The goal is to go somewhere worth going. Preferably fast :)


Krzysztof Sajna is an IT engineering manager who builds internal platforms at scale. He writes about the messy intersection of technology, management, and reality at sajna.space.