Pull requests are a symptom of low trust: here’s the fix
T*D is the new name for the Playbook trifecta: Test-Driven Development, Trunk-Based Development and Team-Focused Development, fused into one operating model.

Your team has probably spent more hours this year waiting on pull requests than it spent writing the code marooned inside them. Andrea Laforgia opens his March 2026 essay Stop Using Pull Requests with numbers from across the industry that count the cost. The piece is the most thorough reckoning I’ve seen of why the modern pull request workflow is mis-designed for the teams that use it. Go read it. Then come back here — because if you’re using the Delivery Playbook, Laforgia is describing the practice you already know. He just gives it a name. And I like it.[1]
That name is T*D — Test-Driven Development, Trunk-Based Development and Team-Focused Development, fused into one operating model. Hold on to it. We’ll come back to it.
Don’t break the flow
Chapter 2.8 Delivery processes & tools lays out what a well-wrought delivery pipeline does: ephemeral environments, test batteries, zero trust, release orchestration, observability, canary releases. The argument is that your pipeline is part of your product, and its job is reliability and repeatability through automation. The motto I’ve handed to more than one team — the only path to higher environments is through automation — comes straight out of that chapter.
Here’s the thing. If you map a typical team’s workflow against that pipeline, there is exactly one step where a human might still be gating the flow. Tests run automatically. Security scans run automatically. Infrastructure is provisioned automatically. Releases orchestrate automatically. And then — somewhere between “I pushed” and “it merged” — a colleague has to find time, open the diff, scroll through it and click Approve. That manual gate is the pull request. Everything else in the chapter is built around eliminating exactly this kind of step.
Pull requests are the routine human gate that survived a pipeline whose sole purpose is to remove them. Fix that, and the pipeline finally runs at the speed its design intended.
Three findings that should bother you more
Laforgia’s piece marshals the research behind that cost. Three findings worth carrying into your next retro:
Code review finds fewer bugs than you think. In Bacchelli & Bird’s ICSE 2013 study, the authors manually classified roughly 570 review comments from Microsoft engineers and found that about one in eight — call it 12.5% — were about defects. The rest was structure, naming and style. A separate Microsoft study by Bosu and colleagues built a 1.5-million-comment dataset and asked a different question — were the reviews useful? — and concluded 64–68% were. Two distinct findings, often conflated. The honest read is that code review’s value lives in knowledge transfer and improving the code’s shape, not in catching bugs. And a blocking async queue is one of the clumsiest possible vessels for sharing knowledge.[2][3]
Most of the lead time is waiting. A change that takes ten minutes to write but waits four hours for review spends 96% of its existence in queue. Martin Fowler recounts one client whose team logged 130,000 hours waiting on pull requests in 2020 — 91% of which got no comment in return. The process extracted enormous delay and returned almost nothing.[4]
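The queue arithmetic is worth making concrete. A quick sketch using the essay’s illustrative figures (ten minutes of writing, four hours of waiting), not measured data:

```python
# Fraction of a change's lifetime spent waiting in the review queue.
# The figures are the article's illustration, not measurements.
write_minutes = 10        # time to author the change
queue_minutes = 4 * 60    # time waiting for a reviewer to click Approve

total = write_minutes + queue_minutes
queue_fraction = queue_minutes / total

print(f"{queue_fraction:.0%} of the change's lifetime is queue")  # → 96%
```

The ratio only gets worse as changes shrink: halve the writing time and the queue share rises toward 98%, which is why PR hygiene alone cannot recover the flow.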
The lever is the speed of review, not review itself. The 2023 State of DevOps report, drawn from more than 36,000 professionals, found that accelerating code review alone improves software delivery performance by 50%. Elite performers who meet their reliability targets are 2.3 times more likely to use trunk-based development. DORA explicitly names heavyweight code review as a brake on delivery. The methodology behind these findings — the four DORA metrics — comes from Forsgren, Humble and Kim’s Accelerate; the 2023 numbers cited here come from the State of DevOps report that builds on it.[5][6]
A caveat that Laforgia treats carefully and that I want to carry into the body rather than bury in the close: this evidence speaks loudest for intra-team work, where colleagues know each other and the codebase. Trust is a gradient. For distributed teams collaborating across time zones, open source contributions or any genuinely cross-team change, the case for a gate is stronger. The mistake is importing the gatekeeping pattern wholesale from those high-trust-gap contexts into the daily work of a co-located team that already trusts each other.
T*D is what the Playbook teaches as muscle memory
This is where the Playbook and Laforgia line up. Each leg of T*D slots directly into something you have already been building.
TDD/BDD fills the test batteries slot — properly. Chapter 2.8 says: “If tests fail, the build fails, and the code can’t be merged (no exceptions).” Test-driven and behavior-driven development are how you make that sentence true in practice. The tests exist before the feature exists. They encode the acceptance criteria the team agreed to. They run on every commit, not just at the end. When the gate that decides whether code reaches the next environment is automated and trustworthy, the human gate becomes redundant — there is no class of defect a tired reviewer at 4pm is going to catch that a green CI run did not. This is the same argument I made in The balanced power of TDD: defects must be prevented, not removed. Deming wrote it down in 1982 — Point Three: cease dependence on inspection to achieve quality — and the software industry has been re-discovering it ever since.[7]
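The test-first discipline fits in a few lines. A minimal sketch, pytest-style; the discount rule and function name are invented for illustration, not taken from the Playbook:

```python
# Step 1: the tests exist before the feature. They encode the acceptance
# criteria the team agreed to, and they fail until the feature is written.
def test_orders_over_100_get_ten_percent_off():
    assert apply_discount(200.0) == 180.0

def test_small_orders_pay_full_price():
    assert apply_discount(50.0) == 50.0

# Step 2: the simplest implementation that turns the tests green.
def apply_discount(total: float) -> float:
    return total * 0.9 if total > 100 else total

# CI runs these on every commit; a red run blocks the merge, no exceptions.
test_orders_over_100_get_ten_percent_off()
test_small_orders_pay_full_price()
```

The point is the ordering: the acceptance criterion is executable before any implementation exists, so the merge gate is a machine checking an agreement, not a human re-deriving one.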
Trunk-based development matches the Playbook’s stance on small increments. Chapter 3.0 Delivery is explicit: “Delivery is intended to move quickly. That means working in small increments, not ‘feature trains’ or monolithic releases.” That is trunk-based development by another name. A long-lived feature branch is a feature train waiting to happen. The Playbook already prefers the alternative — ephemeral environments per branch are fine, but the branches themselves should be short-lived, with code returning to trunk daily and incomplete work hidden behind feature flags. The point: each change must be small enough that automated checks can validate it in seconds.
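Feature flags are what make daily merges of incomplete work safe. A minimal sketch, assuming a plain environment-variable lookup; real teams usually front this with a flag service, and the flag name and functions here are invented:

```python
import os

def flag_enabled(name: str) -> bool:
    """Read a feature flag from the environment; defaults to off.

    A flag service adds per-environment and per-cohort targeting, but the
    contract the calling code sees is the same: a boolean it can ask.
    """
    return os.environ.get(f"FLAG_{name}", "off") == "on"

def checkout(cart: list[dict]) -> float:
    if flag_enabled("NEW_PRICING"):
        # Work in progress: merged to trunk daily, dark in production.
        return new_pricing_total(cart)
    return sum(item["price"] for item in cart)

def new_pricing_total(cart: list[dict]) -> float:
    # Placeholder for the half-built feature living behind the flag.
    raise NotImplementedError("still being built on trunk")

print(checkout([{"price": 10.0}, {"price": 5.0}]))  # flag off → 15.0
```

With the flag off, the incomplete path is unreachable in production even though its code ships with every daily merge — which is exactly what lets branches stay hours old instead of weeks old.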
Team-focused development is what you already started building when you killed the tester role. Pair programming and ensemble (mob) programming move review from a post-hoc inspection into a continuous one. In a 2005 controlled experiment, Müller and Tichy found that pair programming and solo-plus-review achieved comparable cost when quality was held constant — pairing did not cost more, it just moved the review work earlier. That experiment is small (38 students in one university), so don’t lean on it alone; Hannay and colleagues’ 2009 meta-analysis casts a wider net across the practitioner record — with the caveat that pairing’s benefits depend strongly on task complexity and developer experience. In both lines of evidence, what changes is when review happens, not whether. I’ve argued the structural version of this case in I don’t hire testers anymore: quality belongs inside the team, not bolted on at the boundary. Pairing is the same logic applied to review.[8][9]
This is what I’d like you to carry away: T*D is not three new things to adopt. For anyone running the Playbook, it is a name for the convergence of practices you have been heading toward. The pull request workflow is the last manual gate left after you’ve automated everything else. T*D is what you put in its place.
What to do Monday
Most teams cannot drop pull requests overnight. They shouldn’t try. The path is gradual, and every step buys real flow:
Optimize the PRs you have. Cap them at two or three hundred lines. Set a four-hour review SLA. Automate every style and lint check. Reduce required approvers to one. This is housekeeping, not transformation, but it removes the worst of the queue cost while you build the alternative.
Adopt Ship / Show / Ask. Rouan Wilsenach’s three categories give a team a way to graduate routine changes off the gated workflow first. Routine changes Ship — straight to trunk, no PR. Notable changes Show — they go to trunk immediately, with a PR opened in parallel as the venue for post-merge discussion. Genuinely uncertain or risky changes Ask — they get the blocking review treatment. Most work, once you look honestly, is Ship or Show.[10]
Pair on production code; merge to trunk daily. Feature flags carry incomplete work; automated tests are non-negotiable. This is also where the Playbook’s Quality Engineer rotation pays off — the engineer in that role this sprint is the one making sure the test batteries and the pipeline can carry the weight you’re about to put on them.
Move to ensemble programming where the team is ready. Continuous review during creation. No post-hoc inspection. This is where most teams will need the most patience — it’s a cultural shift, not a tooling change, and it rewards a coach in the room for the first few weeks.
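The Ship / Show / Ask call is a judgment, but teams often encode a first-pass default in tooling. A hypothetical sketch: the three categories are Wilsenach’s, while the fields and thresholds are invented for illustration, and the author can always escalate past the default:

```python
from dataclasses import dataclass

@dataclass
class Change:
    lines_changed: int
    touches_public_api: bool
    author_is_new_to_area: bool

def triage(change: Change) -> str:
    """First-pass Ship / Show / Ask default; a human can always override."""
    if change.author_is_new_to_area:
        return "Ask"   # genuinely uncertain: blocking review
    if change.touches_public_api or change.lines_changed > 200:
        return "Show"  # trunk now, PR opened in parallel for discussion
    return "Ship"      # routine: straight to trunk, no PR

print(triage(Change(lines_changed=40, touches_public_api=False,
                    author_is_new_to_area=False)))  # → Ship
```

Running a month of merged changes through a rule like this is a cheap way to show a skeptical team how much of its work was already Ship-grade.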
Charity Majors gives you the lodestar: speed is safety. Small changes by single owners reach production fast, the blast radius stays small, and the fix is obvious because the intent is still fresh.[11]
The harder question
The question for your team is not how do we do pull requests better? It is why do we still need them? If the answer is fear — of regressions, of juniors, of complexity, of the codebase itself — address the fear directly through skills, automation and pairing rather than institutionalizing it as policy. The Playbook has been quietly pointing at this answer the whole time. Laforgia’s piece is the receipt for the empirical cost of ignoring it.
Read the source
Laforgia’s full essay carries the academic citations, the DORA data, the practitioner consensus and the honest caveats I didn’t hit here. He treats correlation versus causation carefully, draws the trust-as-gradient distinction in more depth than I did, and gives the transition path more nuance than four bullets allow. If you lead a team that still gates every change behind an async queue, read his piece end to end and bring it to your next retro. Then open the Playbook to chapter 2.8 and ask which manual gates are still left in your pipeline. There won’t be many, and that’s the point.[12]
1. Andrea Laforgia, Stop Using Pull Requests, March 19, 2026.
2. Alberto Bacchelli & Christian Bird, Expectations, Outcomes, and Challenges of Modern Code Review, ICSE 2013. The figure cited here — roughly one in eight comments addressing defects — comes from the authors’ manual classification of approximately 570 comments at Microsoft.
3. Amiangshu Bosu, Michaela Greiler & Christian Bird, Characteristics of Useful Code Reviews: An Empirical Study at Microsoft, MSR 2015. This is a distinct, larger study covering roughly 1.5 million review comments, which is sometimes conflated with Bacchelli & Bird; its headline finding is that 64–68% of reviews were judged useful (a usefulness measure, not a defect rate).
4. Martin Fowler, Pull Request (bliki entry), martinfowler.com. The 130,000-hour figure is attributed to a colleague’s analysis of a single client in 2020.
5. Google Cloud, Accelerate State of DevOps Report 2023.
6. Nicole Forsgren, Jez Humble & Gene Kim, Accelerate: The Science of Lean Software and DevOps (IT Revolution, 2018). Background methodology behind the DORA metrics; specific figures cited above are from the 2023 report.
7. W. Edwards Deming, Quality, Productivity, and Competitive Position (MIT, 1982), re-titled Out of the Crisis in 1986. Point Three of the Fourteen Points: cease dependence on inspection to achieve quality.
8. Matthias Müller & Walter Tichy, Two Controlled Experiments Concerning the Comparison of Pair Programming to Peer Review, Journal of Systems and Software, 2005. Small-n controlled experiment with 38 students; useful as a point of evidence, not a final word.
9. Jo E. Hannay, Tore Dybå, Erik Arisholm & Dag I. K. Sjøberg, The effectiveness of pair programming: A meta-analysis, Information and Software Technology, 2009. A broader synthesis than Müller & Tichy; finds pairing’s benefits depend strongly on task complexity and developer experience.
10. Rouan Wilsenach, Ship / Show / Ask, martinfowler.com, September 2021.
11. Charity Majors, Shipping Software Should Not Be Scary, charity.wtf, August 19, 2018. Source of the “speed is safety” framing.
12. Andrea Laforgia, Stop Using Pull Requests, op. cit.
