How a 4-engineer team using an AI coding harness compares to an equivalent team without one — measured against the git commit history and grounded in published industry benchmarks.
Counted from git "Merged PR" commits on main. The two empty weeks (May 11 and May 18) are the ramp-up before full velocity.
Merged PRs per engineer per week. Baselines: DX (51k devs) & LinearB (8.1M PRs).
Our team reached 427 merged PRs in 8 weeks. A healthy non-AI 4-engineer team at the industry median (3.5 merged PRs/engineer/week, ~14 merged PRs/week for the team) would need ~30 weeks (~7 months) to reach the same total.
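The parity estimate is simple division. A minimal sketch using only the report's own figures (427 PRs, 4 engineers, 3.5 PRs/engineer/week median); it assumes a steady weekly rate with no ramp-up for the comparison team.

```python
# Figures from this report; the comparison team is assumed to run at a
# constant median rate from week one (no ramp-up).
OUR_TOTAL_PRS = 427
TEAM_SIZE = 4
MEDIAN_PRS_PER_ENG_WEEK = 3.5


def weeks_to_reach(total_prs: int, team_size: int, prs_per_eng_week: float) -> float:
    """Weeks a team merging at a steady rate needs to reach `total_prs`."""
    team_rate = team_size * prs_per_eng_week  # merged PRs per week for the whole team
    return total_prs / team_rate


weeks = weeks_to_reach(OUR_TOTAL_PRS, TEAM_SIZE, MEDIAN_PRS_PER_ENG_WEEK)
months = weeks / (52 / 12)  # average weeks per month, about 4.33
print(f"{weeks:.1f} weeks ≈ {months:.1f} months")
```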
95% of raw LOC is generated or vendored scaffolding (knowledge graph, vendored ESB/SOAP schemas, planning docs). Only 4.8% is hand-written application and test code, which is why we lead with PRs, not LOC.
Conventional-commit classification of authored commits. The 349 feat + fix commits are real product engineering.
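Classification by Conventional Commits type reads the `type` prefix off each commit subject. A minimal sketch, assuming subjects follow the `type(scope)!: description` form and that anything else is bucketed as `other`; the sample subjects are hypothetical.

```python
import re
from collections import Counter

# Conventional Commits subject: type, optional (scope), optional "!", then ": "
CC_PATTERN = re.compile(r"^(?P<type>[a-z]+)(?:\([^)]*\))?!?:\s")


def classify(subjects):
    """Count commits by Conventional Commits type; non-conforming subjects go to 'other'."""
    counts = Counter()
    for subject in subjects:
        match = CC_PATTERN.match(subject)
        counts[match.group("type") if match else "other"] += 1
    return counts


# Hypothetical subjects for illustration only.
sample = [
    "feat(orders): add retry policy",
    "fix: handle empty response",
    "refactor(api)!: rename client",
    "test: cover timeout path",
    "Update README",
]
counts = classify(sample)
product_work = counts["feat"] + counts["fix"]  # the report's 155 + 194 = 349 is this sum
```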
Non-renovate PRs authored per engineer (Azure DevOps). One lead carries the majority; all four contribute substantively.
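A per-author tally like this one can be sketched as below, assuming the Azure DevOps PRs have already been exported to dicts with hypothetical `author` and `status` keys and that bot accounts are recognizable by name. It counts only completed PRs and skips anything whose author contains "renovate".

```python
from collections import Counter


def prs_per_author(prs, bot_names=("renovate",)):
    """Count completed PRs per author, skipping dependency-bot accounts.

    `prs` is a list of dicts with hypothetical keys 'author' and 'status';
    map a real Azure DevOps export onto this shape first.
    """
    counts = Counter()
    for pr in prs:
        if pr["status"] != "completed":
            continue  # abandoned or still-active PRs are not merged output
        author = pr["author"].lower()
        if any(bot in author for bot in bot_names):
            continue  # dependency-bump PRs from the bot account
        counts[pr["author"]] += 1
    return counts


sample = [
    {"author": "alice", "status": "completed"},
    {"author": "alice", "status": "completed"},
    {"author": "bob", "status": "completed"},
    {"author": "bob", "status": "abandoned"},
    {"author": "renovate[bot]", "status": "completed"},
]
counts = prs_per_author(sample)
lead, lead_count = counts.most_common(1)[0]
share = lead_count / sum(counts.values())  # fraction of PRs carried by the top author
```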
Our full-velocity rate: ~20.7 merged PRs / engineer / week.
| Non-AI baseline (cited) | Merged PRs / engineer / week | Our multiple (20.7 ÷ baseline) |
|---|---|---|
| Industry median (tech) | 3.5 | 5.9× |
| Top-quartile team (P75) | 4.3 | 4.8× |
| Top-decile / best case (P90) | 5.0 | 4.1× |
| Conservative claim | — | ~5× |
| Central estimate | — | ~6× |
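The multiples in the table are the full-velocity rate divided by each cited baseline. A short sketch reproducing them from the numbers above:

```python
# Cited non-AI baselines, in merged PRs per engineer per week.
BASELINES = {
    "Industry median (tech)": 3.5,
    "Top-quartile team (P75)": 4.3,
    "Top-decile / best case (P90)": 5.0,
}
OUR_RATE = 20.7  # full-velocity merged PRs / engineer / week

# Multiple = our rate / baseline rate, rounded to one decimal as in the table.
multiples = {name: round(OUR_RATE / rate, 1) for name, rate in BASELINES.items()}
for name, m in multiples.items():
    print(f"{name}: {m}x")
```

Comparing against the P90 baseline gives the most conservative multiple, which is why the table's conservative claim sits near the P75 and P90 values rather than the median.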
The average merged PR carries ~363 lines of real application/test code, well inside the recommended 200–400 line band. These are normal, reviewable PRs.
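The ~363 figure is consistent with the report's hand-written totals (~90k app lines plus ~65k test lines) spread over 427 merged PRs. A sketch of that check, assuming those two buckets are exactly the "real application/test code":

```python
APP_LINES = 90_000    # hand-written application code (report total)
TEST_LINES = 65_000   # hand-written test code (report total)
MERGED_PRS = 427
REVIEW_BAND = (200, 400)  # recommended lines per PR

avg_lines = (APP_LINES + TEST_LINES) / MERGED_PRS
in_band = REVIEW_BAND[0] <= avg_lines <= REVIEW_BAND[1]
print(f"~{avg_lines:.0f} lines/PR, within band: {in_band}")
```

An average can hide a long tail of oversized PRs; the median, or the share of PRs above 400 lines computed from per-PR numstat data, would be a stricter version of the same check.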
git log (427 "Merged PR" commits) and Azure DevOps (425 completed PRs) agree to within 0.5% (a gap of 2 PRs) across two separately queried sources.
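The agreement figure is the absolute gap divided by the larger count. A short sketch:

```python
def relative_gap(a: int, b: int) -> float:
    """Absolute difference between two counts as a fraction of the larger one."""
    return abs(a - b) / max(a, b)


git_merged = 427     # "Merged PR" commits on main
ado_completed = 425  # completed PRs reported by Azure DevOps
gap = relative_gap(git_merged, ado_completed)
print(f"{gap:.2%}")
```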
155 feat + 194 fix + 77 refactor + 56 test commits. 65k lines of hand-written tests gate the 90k lines of app code.
We don't use LOC as a productivity figure. 95% of raw lines are openly disclosed as generated/vendored scaffolding that the harness produced at essentially no cost.
A pre-registered RCT (METR, 2025) found that experienced open-source developers working in large, mature codebases were 19% slower when using AI tools. Our opposite result fits the regime where AI tends to win: a greenfield build with no 1M-line legacy codebase to fight.
DORA 2024 associates a 25% increase in AI adoption with an estimated 7.2% decrease in delivery stability. Our harness-enforced tests, mandatory review, and ~7% iterate-and-discard rate are what keep delivery stable at this speed.
PRs measure output, not outcome. The real proof will be the 2026-10-26 go-live landing with quality. We recommend tracking change failure rate and escaped defects alongside throughput.
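Change failure rate, in the DORA sense, is the share of deployments that cause a production failure needing remediation. A minimal sketch, assuming a hypothetical `Deployment` record fed from the release pipeline; escaped defects could be tallied the same way against merged PRs.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    # Hypothetical record shape; populate from the team's release pipeline.
    id: str
    caused_failure: bool  # needed a hotfix, rollback, or patch in production


def change_failure_rate(deployments: list[Deployment]) -> float:
    """Share of deployments that led to a production failure."""
    if not deployments:
        return 0.0  # no deployments yet; reported as 0.0 by convention here
    failures = sum(1 for d in deployments if d.caused_failure)
    return failures / len(deployments)


releases = [
    Deployment("r1", False),
    Deployment("r2", True),
    Deployment("r3", False),
    Deployment("r4", False),
]
cfr = change_failure_rate(releases)
```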
This is a single 4-engineer greenfield project. Gains may differ on legacy maintenance or larger teams. Treat as a strong directional signal, not a universal multiplier.
git log --grep "Merged PR" on main, cross-checked against the Azure DevOps PR API (renovate-bot PRs excluded).
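The cross-check can be done on PR ids rather than raw counts, which also shows which PRs account for any gap. A sketch, assuming the default Azure DevOps merge message format `Merged PR <id>: <title>` and subjects produced by `git log main --grep="Merged PR" --format=%s`; the API-side id set is a hypothetical input. The anchored pattern skips commits (such as reverts) that merely quote the phrase, which `--grep` alone would match.

```python
import re

# Assumed Azure DevOps merge-commit subject: "Merged PR <id>: <title>"
MERGED_PR = re.compile(r"^Merged PR (\d+):")


def merged_pr_ids(log_subjects: str) -> set[int]:
    """Extract PR ids from one-subject-per-line git log output."""
    ids = set()
    for line in log_subjects.splitlines():
        match = MERGED_PR.match(line.strip())
        if match:
            ids.add(int(match.group(1)))
    return ids


def unmatched(git_ids: set[int], api_ids: set[int]) -> tuple[set[int], set[int]]:
    """PR ids present in only one source: (git-only, API-only)."""
    return git_ids - api_ids, api_ids - git_ids


sample_log = """Merged PR 101: Add order sync
Merged PR 102: Fix timeout handling
Revert "Merged PR 99: old change"
"""
git_ids = merged_pr_ids(sample_log)
git_only, api_only = unmatched(git_ids, {101, 103})
```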
Churn classification: git log --numstat bucketed by path (app vs. generated vs. vendored vs. planning).
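Path bucketing over `git log --numstat --format=` output (tab-separated added, deleted, path; binary files report `-`) can be sketched as follows. The bucket prefixes are hypothetical placeholders, not this repository's real layout, and churn here means added plus deleted lines.

```python
from collections import defaultdict

# Hypothetical path prefixes; replace with the repository's real layout.
BUCKETS = [
    ("vendored", ("vendor/", "schemas/")),
    ("generated", ("graph/", "generated/")),
    ("planning", ("docs/planning/",)),
]


def bucket_for(path: str) -> str:
    for name, prefixes in BUCKETS:
        if path.startswith(prefixes):
            return name
    return "app"  # everything else counts as hand-written app/test code


def churn_by_bucket(numstat: str) -> dict[str, int]:
    """Sum added + deleted lines per bucket from numstat lines."""
    totals: dict[str, int] = defaultdict(int)
    for line in numstat.splitlines():
        parts = line.split("\t")
        if len(parts) != 3 or parts[0] == "-":
            continue  # blank separator lines, or binary files ("-\t-\tpath")
        added, deleted, path = parts
        totals[bucket_for(path)] += int(added) + int(deleted)
    return dict(totals)


sample = "10\t2\tsrc/app.py\n\n500\t0\tvendor/esb.xsd\n-\t-\tlogo.png\n30\t5\tdocs/planning/q3.md\n"
churn = churn_by_bucket(sample)
total = sum(churn.values())
shares = {name: lines / total for name, lines in churn.items()}
```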
Benchmarks —
DX Core 4 Benchmark 2024 (51k developers);
LinearB 2025 Engineering Benchmarks (8.1M PRs, 4,800 teams);
DORA 2024 Accelerate State of DevOps Report;
METR 2025 RCT;
PR-size norm 200–400 lines (SmartBear / Google / LinearB).