This is Part 3 of the Velocity vs Value series. Previously: The Velocity Trap.


Here’s a demo that writes itself: an engineer types a natural language description, and an AI agent generates a complete microservice in three minutes. Working tests. Clean structure. The audience applauds.

Here’s what they don’t show you: that same AI agent, pointed at a five-year-old enterprise codebase with custom ORM patterns, undocumented business rules, and three layers of abstraction that made sense in 2021 — producing code that compiles, passes lint, and does the wrong thing in production.

I manage a team that lives in the second scenario. And I want to share what we’ve actually learned about AI coding tools — not the conference talk version, but the daily reality.

The Numbers Don’t Lie (But They Don’t Tell the Whole Truth)

Let’s start with the data, because this conversation is too important for vibes:

Greenfield productivity gains:

  • 30-40% improvement in new project development
  • AI excels at boilerplate, scaffolding, CRUD endpoints, standard patterns
  • Rapid prototyping becomes genuinely faster
  • The “vibe coding” workflow — describe what you want, iterate on output — works surprisingly well

Brownfield productivity gains:

  • 15-20% on simple modifications to existing code
  • 0-10% on complex tasks involving legacy systems
  • Negative impact when AI lacks sufficient context about architectural decisions
  • Performance declines sharply as codebase size increases due to context window limitations

The productivity paradox:

  • Initial output surges 30-40%
  • Subsequent rework and bug fixes reduce net gains to 15-20%
  • 84% of developers use AI tools, but only 29-46% trust the accuracy of outputs
  • 66% report spending more time fixing “almost-right” AI code than they saved generating it

Source: aggregated from Stack Overflow 2025 Developer Survey, DORA 2025 Report, and multiple peer-reviewed studies on AI-assisted development.

“Spec In, Feature Out” — The Waterfall in AI Clothing

The most aggressive deployment of AI I’ve encountered isn’t GitHub Copilot or Cursor tab-completions. It’s what I’ll call the autonomous harness pattern: a rigid wrapper around an LLM coding agent that attempts to automate the entire SDLC.

The workflow looks like this:

  1. Write a Product Requirements Document (PRD)
  2. Feed it to the harness
  3. The harness generates specifications, designs, code, and tests
  4. Review the output
  5. Ship

In theory: Specification In, Feature Out. In practice: Waterfall In, Technical Debt Out.
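The structure of this pattern is easy to see in code. Here’s a minimal sketch of such a harness — the `generate` callable stands in for any LLM call, and all names here are hypothetical, not a real product’s API. The key property is that every stage feeds forward and nothing feeds back:

```python
# Hypothetical sketch of the "autonomous harness" pattern described above.
# Every stage feeds strictly forward; there is no feedback loop. If the
# PRD is wrong, every downstream artifact is wrong too -- classic waterfall.

from dataclasses import dataclass

@dataclass
class Artifact:
    stage: str
    content: str

def run_harness(prd: str, generate) -> list[Artifact]:
    """Run each SDLC stage in sequence with no human checkpoint between them.

    `generate(stage, context)` is a stand-in for an LLM call -- an
    assumption for illustration, not a real API.
    """
    artifacts = []
    context = prd
    for stage in ("spec", "design", "code", "tests"):
        output = generate(stage, context)   # one-shot generation per stage
        artifacts.append(Artifact(stage, output))
        context = output                    # next stage sees only the previous output
    return artifacts

# A stub generator shows the data flow without a real model.
def stub_generate(stage: str, context: str) -> str:
    return f"{stage} derived from: {context[:20]}"

if __name__ == "__main__":
    for a in run_harness("PRD: add multi-tenant billing", stub_generate):
        print(a.stage, "->", a.content)
```

Notice what the loop doesn’t contain: no branch where a human says “this design won’t work for our tenancy model” and sends the process back a step. That missing branch is the whole critique.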

Here’s why this pattern fails on brownfield codebases:

The Context Problem

Enterprise codebases contain decisions that aren’t in the code. Why does this service call that endpoint instead of the obvious one? Because three years ago, a data sovereignty requirement forced a routing change that was never documented. The AI doesn’t know this. It generates the “obvious” solution. It breaks in production.

Large language models have finite context windows. A monorepo with hundreds of thousands of lines of code, Terraform configurations, Kubernetes manifests, and custom orchestration logic simply doesn’t fit. The AI sees fragments. Engineers see the system.
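You can do this arithmetic yourself. A rough sketch, assuming the common ~4-characters-per-token heuristic and a 200k-token window (both assumptions — real tokenizers and model limits vary):

```python
# Rough estimate: does a codebase even fit in a model's context window?
# CHARS_PER_TOKEN and CONTEXT_WINDOW are assumptions for illustration;
# real tokenizers and models vary.

from pathlib import Path

CHARS_PER_TOKEN = 4          # coarse heuristic for source code
CONTEXT_WINDOW = 200_000     # assumed token budget for a large model

def estimate_tokens(root: str, exts=(".py", ".tf", ".yaml", ".go")) -> int:
    """Sum a naive token estimate over matching files under `root`."""
    total_chars = 0
    for path in Path(root).rglob("*"):
        if path.suffix in exts and path.is_file():
            total_chars += len(path.read_text(errors="ignore"))
    return total_chars // CHARS_PER_TOKEN

def fits_in_context(root: str) -> bool:
    return estimate_tokens(root) <= CONTEXT_WINDOW

# A 500k-line monorepo at ~40 chars per line is roughly 5M tokens --
# about 25x over this budget before you add Terraform and manifests.
```

The point isn’t the exact numbers; it’s that for any realistic enterprise monorepo, the answer is “no” by an order of magnitude or more.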

The Agency Problem

[Image: Brownfield reality — the engineer and the machine, face to face with complexity]

When you force an autonomous harness on senior engineers, you’re telling them: “Your judgment doesn’t matter. Feed the machine and review its output.”

Senior engineers aren’t code-typing machines. They excel in system-wide decision-making. They know which corners to cut and which to reinforce. They know when a requirement is wrong before they write a line of code. They know when “the spec says X” but the user actually needs Y.

An autonomous harness strips this agency. It transforms agile iteration into a waterfall process disguised in AI branding: requirements → design → code → review. No feedback loops. No “wait, this approach won’t work for our tenancy model.” Just: spec in, code out, problems later.

The Quality Problem

Studies show AI-generated code has:

  • 1.7x higher defect density than human-written code
  • 2x code churn (more rework cycles)
  • 75% more logic and correctness issues
  • Up to 2.74x more security vulnerabilities
  • Performance inefficiencies appearing 8x more frequently

And here’s the kicker: this code often passes automated tests. Human reviewers reject it for failing “soft” requirements — coding standards, repository conventions, complex project logic. Automated benchmarks significantly overestimate the production-readiness of AI code.

When you combine these quality issues with velocity pressure (“ship a feature per day”), you get code that passes CI, satisfies the velocity chart, and slowly poisons your codebase.

Where AI Actually Helps

I’m not anti-AI. My team uses AI tools daily. But we use them strategically, not as a replacement for engineering judgment:

AI shines at:

  • Generating boilerplate code and repetitive patterns
  • Writing initial test scaffolding
  • Documentation drafts and docstrings
  • Exploring unfamiliar APIs or frameworks
  • Prototyping new features in isolation before integrating
  • Refactoring well-understood, well-tested code
  • Onboarding — cutting “time to first PR” roughly in half

AI struggles with:

  • Complex domain logic with implicit business rules
  • Multi-service interactions with undocumented dependencies
  • Performance optimization in context-heavy systems
  • Security-sensitive code (authentication, authorization, data handling)
  • Architecture decisions that require understanding trade-offs across the system
  • Anything that requires knowing why the code is the way it is

The right mental model isn’t “AI replaces developers.” It’s “AI is a heavily caffeinated junior engineer who writes fast, doesn’t ask questions, and has no idea about your business domain.” You wouldn’t hand a junior your most complex brownfield task and walk away. Don’t do it with AI either.

The Organizational Trap

The biggest risk isn’t the technology — it’s the organizational response. Here’s a pattern I’ve seen play out:

  1. Leadership sees AI demos (greenfield, impressive)
  2. Leadership mandates AI-first development for all teams
  3. A “champion team” is created to evangelize the tool
  4. The champion team shows success on new, isolated projects
  5. Core teams struggle on existing codebases
  6. Core teams are told they’re “not using it right”
  7. Developer morale drops, and your best engineers start updating their LinkedIn profiles

This is the organizational velocity trap amplified by AI hype. The tool works in the demo. The demo doesn’t match the reality. The gap gets papered over with metrics.

What I’d Recommend

If you’re an engineering leader being pressured to adopt AI coding tools aggressively:

  1. Separate greenfield from brownfield expectations. Set different productivity targets. A 30% gain on new services is realistic. A 5% gain on legacy systems is honest. Negative impact is possible and should be planned for.

  2. Preserve developer agency. AI should be a tool in the developer’s hand, not a wrapper around the developer. Let engineers choose when and how to use it.

  3. Track rework, not just output. If AI-generated code produces more rework, the net velocity gain is an illusion. DORA’s Rework Rate metric is designed exactly for this.

  4. Measure code quality separately for AI vs. human code. Use static analysis (cyclomatic complexity, cognitive complexity, maintainability index, security hotspots) split by origin. Show the data. Let it speak.

  5. Don’t let greenfield demos set brownfield expectations. The team that built a new app in a week with AI is not evidence that your five-year-old platform can be maintained the same way.

  6. Invest in context, not just tools. Better documentation, architecture decision records (ADRs), and domain knowledge transfer do more for AI effectiveness than any harness.
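Recommendations 3 and 4 need only a small amount of bookkeeping to become concrete. Here’s a hedged sketch, assuming your team labels AI-assisted changes somehow (a commit trailer, a PR label — the `origin` field below is that assumption, not a standard):

```python
# Sketch: track rework rate and defect density split by code origin.
# The Commit record and the "ai"/"human" origin label are assumptions --
# adapt to however your team tags AI-generated changes.

from dataclasses import dataclass
from collections import defaultdict

@dataclass
class Commit:
    origin: str           # "ai" or "human"
    lines_added: int
    lines_reworked: int   # lines modifying code shipped within the last N weeks
    defects_linked: int   # bugs later traced back to this commit

def metrics_by_origin(commits):
    """Return per-origin rework rate and defects per 1k lines added."""
    totals = defaultdict(lambda: {"added": 0, "reworked": 0, "defects": 0})
    for c in commits:
        t = totals[c.origin]
        t["added"] += c.lines_added
        t["reworked"] += c.lines_reworked
        t["defects"] += c.defects_linked
    return {
        origin: {
            "rework_rate": t["reworked"] / t["added"] if t["added"] else 0.0,
            "defects_per_kloc": 1000 * t["defects"] / t["added"] if t["added"] else 0.0,
        }
        for origin, t in totals.items()
    }
```

Run this over a quarter of commits and the comparison stops being a matter of opinion — if AI-origin code shows double the rework rate, the “30% faster” headline number is already answered.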

AI coding tools are genuinely useful. But they’re useful the way power tools are useful — in the hands of someone who understands the material, knows the plan, and can tell when the tool is about to cut through a load-bearing wall.


Next in the series: Measuring What Matters: From Output to Outcomes → — building a measurement framework that connects engineering work to business value.