Measuring What Matters: From Output to Outcomes

This is Part 4 of the Velocity vs Value series. Previously: The Velocity Trap and AI Coding Tools: Greenfield vs Brownfield.

You understand that velocity alone doesn’t tell the full story. You’ve seen how AI tools accelerate delivery while introducing new quality considerations. Now comes the practical part: what do you actually measure to show your team’s real impact?

“We need better metrics” is easy to say. Your stakeholders need numbers. Your board needs a narrative. Your team needs to know what success looks like — and how to promote what they deliver. You need a framework that doesn’t require a PhD in data science to implement.

I’ve spent the last several months building a measurement system for my platform team. Here are the three frameworks that actually work, how they complement each other, and where they fall apart.

Framework 1: DORA 2025+ — The Engineering Health Check

The DORA (DevOps Research and Assessment) metrics are the gold standard for software delivery performance. In 2025, the framework evolved significantly:

The Five Metrics:

Deployment Frequency — How often you ship to production
Lead Time for Changes — Time from first commit to production deployment. Target hours, not days — 4 hours is a good benchmark.
Change Failure Rate — Percentage of deployments causing incidents
Failed Deployment Recovery Time — How quickly you recover (reclassified from “stability” to “throughput” — recovery is about flow, not just resilience)
Rework Rate (NEW) — Percentage of deployments that are unplanned fixes

Why Rework Rate changes everything:

Rework Rate is the metric the AI era needed. When you generate code with AI tools and ship it fast, your Deployment Frequency goes up. Your Lead Time goes down. Your velocity chart looks great — and that’s genuinely good.

But if some of that code causes problems downstream, Rework Rate catches it. It’s the honest complement to velocity. High velocity + low rework = your team is truly accelerating. High velocity + high rework = the gains are partially illusory.

The 2025 AI Paradox: DORA’s 2025 report found that AI adoption improves throughput but can decrease stability when deployed without guardrails. The takeaway isn’t “don’t use AI.” It’s “use AI confidently, add guardrails, and measure the full picture.”

What changed in benchmarking: The old performance tiers (low, medium, high, elite) are gone. DORA 2025 replaced them with seven team archetypes, acknowledging that there’s no one-size-fits-all “elite” performance. A platform team maintaining critical infrastructure has a legitimately different performance profile than a greenfield product team. Both can be excellent.

Framework 2: Evidence-Based Management — The Value Lens

DORA tells you how healthy your engineering process is. It doesn’t tell you whether you’re building the right things. For that, I use Evidence-Based Management (EBM) from Scrum.org.

EBM defines four Key Value Areas:

Current Value (CV)

What does your product deliver to users today?

This is where revenue attribution lives. For my team: which contracts were enabled by our platform tools? How much time did users save? What’s the satisfaction score?

If you have usage telemetry — session time, feature adoption, tool utilization per user role — you can build a Usage Score. Weight it: time spent (40%), interactions (30%), AI assistance used (30%). Correlate it with business outcomes.
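A minimal sketch of that weighting, assuming each signal is normalized against the highest value seen in the period so the three terms are comparable. The field names, units, and the max-based normalization are illustrative assumptions, not a standard; only the 40/30/30 weights come from the text above.

```python
from dataclasses import dataclass

# Weights from the text: time spent 40%, interactions 30%, AI assistance 30%.
WEIGHTS = {"time_spent": 0.4, "interactions": 0.3, "ai_assists": 0.3}


@dataclass
class Usage:
    user: str
    time_spent: float    # minutes in the tool (assumed unit)
    interactions: float  # clicks, commands, API calls (assumed definition)
    ai_assists: float    # AI-assisted actions (assumed definition)


def usage_scores(rows: list[Usage]) -> dict[str, float]:
    """Return a 0-100 score per user, scaling each signal by its maximum."""
    maxima = {k: max((getattr(r, k) for r in rows), default=0.0) for k in WEIGHTS}
    scores: dict[str, float] = {}
    for r in rows:
        total = 0.0
        for key, weight in WEIGHTS.items():
            if maxima[key] > 0:  # skip signals nobody produced
                total += weight * getattr(r, key) / maxima[key]
        scores[r.user] = round(100 * total, 1)
    return scores


# Hypothetical telemetry for two users.
rows = [
    Usage("ana", time_spent=120, interactions=300, ai_assists=40),
    Usage("ben", time_spent=60, interactions=150, ai_assists=0),
]
scores = usage_scores(rows)
print(scores)
```

Scaling by the maximum makes the score relative to the heaviest user in the period, so it is better for comparing roles or teams within one window than for tracking an absolute trend over time.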

The attribution trap: You will never prove causation. A contract wasn’t closed because someone used your tool — it was closed because of sales, engineering, timing, pricing, and a dozen other factors. Your tool was one input.

Be honest about this. Label everything “correlation, not causation.” Stakeholders respect intellectual honesty more than you think — and they’ll distrust you forever if you oversell attribution.

Unrealized Value (UV)

What’s the gap between what users need and what they have?

This prevents the “optimize for today” trap. If you only measure current value, you’ll stop innovating. Unrealized Value forces you to look at the satisfaction gap: what do users wish your platform could do? What workflows are still manual?

Survey your users. Watch them work. Track feature requests by impact, not just count.

Time to Market (T2M)

How quickly can you deliver new capabilities and learn from them?

Note: this isn’t velocity. It’s the speed of the entire value cycle — from idea to deployed feature to user feedback to next iteration. A team with high velocity but slow feedback loops has poor Time to Market.

Measure idea-to-production lead time, not just coding speed. Include waiting time, review cycles, approval gates, deployment queues. In my experience, 60-80% of Time to Market is wait time, not work time.

Ability to Innovate (A2I)

What prevents you from delivering new value?

This is where tech debt, legacy systems, compliance overhead, and team capacity constraints live. If 40% of your engineering capacity goes to maintenance and rework, your Ability to Innovate is 60% at best.

Track the percentage of time spent on:

New features (value creation)
Bug fixes (value recovery)
Infrastructure and maintenance (value preservation)
Rework from defects (value loss)

This is your flow distribution. A healthy team should spend 50%+ on new value creation. If you’re below 30%, velocity pressure won’t help — you need to invest in stability first.
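One way to compute a flow distribution, sketched under the assumption that each work item is already tagged with one of the four categories and an effort figure in hours. The category labels and the "watch" band between 30% and 50% are assumptions added for illustration; the 50% and 30% thresholds are the ones named above.

```python
from collections import Counter

# The four buckets from the list above (labels are assumed names).
CATEGORIES = ("feature", "bug_fix", "maintenance", "rework")


def flow_distribution(items: list[tuple[str, float]]) -> dict[str, float]:
    """Return the percentage of total hours spent in each category."""
    totals: Counter = Counter()
    for category, hours in items:
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        totals[category] += hours
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {c: 0.0 for c in CATEGORIES}
    return {c: round(100 * totals[c] / grand_total, 1) for c in CATEGORIES}


def assess(distribution: dict[str, float]) -> str:
    """Apply the rule of thumb: 50%+ is healthy, below 30% needs stability work."""
    share = distribution["feature"]
    if share >= 50:
        return "healthy"
    if share < 30:
        return "invest in stability first"
    return "watch"


# Hypothetical sprint: hours logged per tagged work item.
sprint = [("feature", 40), ("bug_fix", 20), ("maintenance", 25), ("rework", 15)]
distribution = flow_distribution(sprint)
print(distribution, assess(distribution))
```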

Framework 3: Value Stream Mapping — The End-to-End View

DORA measures engineering health. EBM measures value delivery. Value Stream Mapping shows you where the bottlenecks are.

A value stream for an internal platform might look like:

Idea → Spec → Design → Code → Review → Test → Deploy → Adopt → Impact

Map each stage:

Process time — actual work being done
Wait time — sitting in a queue, waiting for review, blocked on dependencies
Rework loops — sent back for changes, bugs found in testing
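The three measurements per stage can be captured in a small record. The sketch below assumes times are recorded in days and uses invented numbers; it computes flow efficiency (process time divided by total elapsed time, a common value stream metric) and points at the stage with the longest queue.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    process_days: float    # actual work being done
    wait_days: float       # queued, awaiting review, blocked
    rework_loops: int = 0  # times the item was sent back


def flow_efficiency(stages: list[Stage]) -> float:
    """Share of total elapsed time that was actual work (0.0 to 1.0)."""
    process = sum(s.process_days for s in stages)
    total = process + sum(s.wait_days for s in stages)
    return process / total if total else 0.0


def biggest_wait(stages: list[Stage]) -> Stage:
    """Return the stage where work sits idle the longest."""
    return max(stages, key=lambda s: s.wait_days)


# Hypothetical measurements for one feature.
stages = [
    Stage("Spec", process_days=1.0, wait_days=3.0),
    Stage("Code", process_days=3.0, wait_days=1.0),
    Stage("Review", process_days=1.0, wait_days=5.0, rework_loops=2),
    Stage("Deploy", process_days=0.5, wait_days=2.5),
]
efficiency = flow_efficiency(stages)
bottleneck = biggest_wait(stages)
print(f"flow efficiency: {efficiency:.0%}, longest wait: {bottleneck.name}")
```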

In my experience, most teams discover that:

Coding is 15-20% of the total time
Review and approval cycles are 30-40%
Post-deployment adoption and feedback take weeks or months
Nobody measures the Adopt → Impact stage at all

This is why AI coding tools have diminishing returns on overall delivery speed. They accelerate the 15-20% that was already the fastest part. Speeding up coding when you’re bottlenecked on reviews, approvals, and adoption is like widening one lane of a five-lane highway.
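The arithmetic behind this is the same as Amdahl's law: if only a fraction of the pipeline gets faster, the untouched remainder caps the overall gain. A small sketch, where the function name and the example speedups are illustrative assumptions:

```python
def overall_speedup(fraction: float, local_speedup: float) -> float:
    """Overall speedup when `fraction` of total time gets `local_speedup` times faster."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if local_speedup <= 0:
        raise ValueError("local_speedup must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


# Coding share of total delivery time, per the estimate above (15-20%).
for coding_share in (0.15, 0.20):
    for ai_speedup in (2.0, 10.0):
        gain = overall_speedup(coding_share, ai_speedup)
        print(f"coding={coding_share:.0%}, AI speedup={ai_speedup:g}x -> overall {gain:.2f}x")
```

With coding at 20% of total time, doubling coding speed makes end-to-end delivery only about 11% faster, and even an infinitely fast coder cannot push the overall figure past 1.25x.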

Putting It All Together: The Three-Dimensional Model

Here’s the system I built:

Business Value (EBM) — Revenue correlation, user satisfaction, adoption rates, unrealized value gap → “Are we building the right things?”

Engineering Health (DORA 2025+) — Deployment frequency, lead time, change failure rate, recovery time, rework rate → “Are we building things right?”

AI Cost Efficiency (Custom) — Token consumption by category, cost per feature, AI vs. human quality split → “Is AI helping or creating hidden costs?”

Each dimension acts as a guardrail for the others:

Business Value without Engineering Health = shipping fast, breaking things
Engineering Health without Business Value = gold-plating nobody asked for
AI Cost Efficiency without both = optimizing a tool nobody needs

Practical Implementation: The 4-Week Ramp

You don’t need a six-month data engineering project to start:

Week 1: Set up DORA metrics. Most CI/CD tools already track deployment frequency and lead time. Change failure rate comes from your incident management. Rework rate = (unplanned fix deployments / total deployments).
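As a rough sketch of the Week 1 calculation, assume you can export one record per deployment with a first-commit timestamp, a deploy timestamp, and two flags set from your incident tracker. The record shape and field names here are hypothetical; recovery time is left out because it needs incident open and close timestamps rather than deployment data.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class Deployment:
    first_commit_at: datetime
    deployed_at: datetime
    caused_incident: bool = False  # from incident management (assumed flag)
    unplanned_fix: bool = False    # hotfix or rollback-style deploy (assumed flag)


def dora_summary(deploys: list[Deployment], period_days: int) -> dict[str, float]:
    """Compute four of the five metrics from deployment records."""
    count = len(deploys)
    if count == 0 or period_days <= 0:
        raise ValueError("need at least one deployment and a positive period")
    lead_seconds = [
        (d.deployed_at - d.first_commit_at).total_seconds() for d in deploys
    ]
    return {
        "deploys_per_day": count / period_days,
        "median_lead_time_hours": median(lead_seconds) / 3600,
        "change_failure_rate": sum(d.caused_incident for d in deploys) / count,
        # Rework rate = unplanned fix deployments / total deployments.
        "rework_rate": sum(d.unplanned_fix for d in deploys) / count,
    }


# Hypothetical week with four deployments.
start = datetime(2025, 1, 6, 9, 0)
deploys = [
    Deployment(start, start + timedelta(hours=2)),
    Deployment(start, start + timedelta(hours=4), caused_incident=True),
    Deployment(start, start + timedelta(hours=6), unplanned_fix=True),
    Deployment(start, start + timedelta(hours=8)),
]
summary = dora_summary(deploys, period_days=7)
print(summary)
```

The median is used for lead time so that a single long-running branch does not dominate the number; a mean or a percentile would be an equally defensible choice.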

Week 2: Survey your users. Simple: “What do you use? What do you wish existed? What’s broken?” Map responses to Current Value and Unrealized Value.

Week 3: Pull usage telemetry. Sessions, features used, time spent. Build a simple usage score per user or team. Correlate (carefully) with business outcomes.
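For the "correlate carefully" step, a plain Pearson coefficient is enough to start. The sketch below uses invented numbers and assumes one usage score and one outcome figure per team; with a handful of data points the result is a conversation starter, not evidence of causation.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance / sqrt(var_x * var_y)


# Hypothetical data: usage score per team vs. contracts that team supported.
usage_score = [10.0, 20.0, 30.0, 40.0]
contracts = [1.0, 3.0, 2.0, 5.0]
r = pearson(usage_score, contracts)
print(f"correlation, not causation: r = {r:.2f}")
```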

Week 4: Map your value stream. From idea to impact, how long does each stage take? Where are the waits? Where’s the rework?

After four weeks, you’ll have more insight into your team’s actual performance than a year of velocity charts ever provided.

The Narrative Matters: Success Stream

Data without story is noise. This is where the Success Stream concept becomes critical.

Don’t present a spreadsheet. Present a narrative:

Here’s what the team delivered — with context, not just ticket counts
Here’s what it enabled — revenue correlation, user impact, with honest caveats
Here’s the engineering health — DORA metrics, quality trends
Here’s how AI contributed — acceleration, efficiency, areas to watch
Here’s what we should build next — Unrealized Value, flow bottlenecks

Wrap the key achievements in human-readable stories: “The team delivered X, enabling Y, with Z business impact.” This is your Success Stream — a living narrative of value delivery that promotes your team’s work in terms everyone understands.

This is far more persuasive than “our velocity was 47 points this sprint.”

Final post in the series: Code Quality in the AI Era: The Guardrails You Can’t Skip. When AI writes your code, how do you keep quality from silently degrading?

Start from the beginning: Velocity vs Value: How to Measure Success in the Age of AI
