One Reviewer Is Three Bugs Away From Done
Why splitting review across three independent specialists with non-overlapping mandates catches classes of bugs that a single combined reviewer collapses into a generic findings list.
Jason Walker
State CISO, Florida
I finished a long-form draft this weekend and ran it through what I expected to be a routine review pass. The deterministic style audit caught the usual mechanical violations: forbidden words, em-dashes I had snuck in, parallel construction warnings I deliberately let stand. Mechanical pass clean.
Then I dispatched three independent reviewers in parallel, each with a different mandate. Fact-check verified citations against actual sources. Alignment review compared the draft against the upstream plan and prior chapters. Synthesis review walked the internal logic end to end.
I expected a consolidation step that produced a tidy finding or two. What I got was three reports that found three completely different classes of bugs.
The fact-check reviewer caught four likely-fabricated citations. Author lists attached to the wrong journal. A real author's title from one paper paired with the citation tuple of a different paper. An article ID off by one digit. A conference proceedings volume that did not exist in the year cited. Standard hallucination patterns, nearly invisible in the reading flow but each one a defense liability.
The alignment reviewer found that I had quietly drifted on a numerical range between two chapters. One chapter said the sample range was 50 to 150. The newer chapter said 50 to 140 because the math actually worked out to that. Both numbers were defensible in isolation. Together they were a mismatch waiting to be picked up by anyone who read both chapters in the same sitting. The reviewer also flagged a methodological substructure I had introduced without surfacing it to the upstream decision-maker.
The synthesis reviewer found the bug that mattered most. Deep inside the analysis section, I had operationalized one of the dependent variables in a way that did not match its conceptual definition from the framework section. The framework said the variable was a probability-weighted product of two factors. The analysis treated it as the raw count of one of those factors. Two paragraphs apart in the same chapter, a real construct mismatch that no fact-checker or alignment reviewer would have flagged because each was looking at a different part of the document.
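In code terms, the mismatch looked roughly like this. The variable names and the exact formula are illustrative stand-ins of mine, not the actual definitions from the draft:

```python
# Framework section's conceptual definition: a probability-weighted
# product of two factors.
def variable_as_defined(p: float, factor_a: float, factor_b: float) -> float:
    return p * factor_a * factor_b

# What the analysis section actually computed: the raw count of one
# of those factors. Defensible line by line, wrong as a construct.
def variable_as_operationalized(factor_a_count: int) -> int:
    return factor_a_count
```

Neither function is wrong on its own. The bug lives in the relationship between them, which is exactly what a reviewer walking the whole document end to end is positioned to see.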
Three reviewers, three orthogonal failure modes. Combined into one prompt, the same agent would have collapsed the findings into a generic punch list and missed the specificity that produced each catch.
This is a pattern, not a one-time result.
The fact-check failure mode is verifiability against external sources. The reviewer's job is to confirm that every claim in the document has a source and that the source is real. The work is shaped by the citation list and the bibliography. A reviewer with a broader mandate dilutes its attention across other concerns and stops verifying every entry.
The alignment failure mode is fidelity to upstream commitments. The reviewer's job is to compare the document against the plan, the prior chapters, the decision records, the project files. The work is shaped by the upstream artifacts. A reviewer with a broader mandate stops cross-referencing and starts summarizing.
The synthesis failure mode is internal consistency. The reviewer's job is to walk the document end to end and check that every promise the document makes to itself is delivered, every cross-reference resolves, every variable is used the same way in every section. The work is shaped by the structure of the document under review. A reviewer with a broader mandate stops walking the structure and starts producing a summary of impressions.
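To make the scoping concrete, here is a minimal sketch of the three mandates as data. The ReviewMandate shape and its field names are my own illustration, not any framework's API; the point is that each mandate names exactly one job and the specific artifacts that shape it.

```python
from dataclasses import dataclass

@dataclass
class ReviewMandate:
    """One narrowly scoped reviewer: a single job, shaped by specific inputs."""
    name: str
    job: str                   # the one question this reviewer answers
    shaping_inputs: list[str]  # the artifacts that bound its attention

MANDATES = [
    ReviewMandate(
        name="fact-check",
        job="Confirm every claim has a source and every source is real.",
        shaping_inputs=["citation list", "bibliography"],
    ),
    ReviewMandate(
        name="alignment",
        job="Compare the document against upstream commitments.",
        shaping_inputs=["plan", "prior chapters", "decision records", "project files"],
    ),
    ReviewMandate(
        name="synthesis",
        job="Walk the document end to end; check every internal promise is kept.",
        shaping_inputs=["the structure of the document itself"],
    ),
]
```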
Each role's effectiveness depends on its scope being narrow enough to produce specificity. The combined reviewer's effectiveness collapses because no single role gets enough attention budget to produce its specific catch.
The same dynamic appears in code review. A linter will not catch a security vulnerability. A security reviewer will not catch a performance regression. A performance profiler will not catch an architectural drift from the documented design. Each role exists because each role's attention pattern matches a specific class of bug. Folding them together does not make a stronger reviewer; it makes a vague reviewer that skims everything and catches whatever is loudest.
The lesson is to resist the urge to combine. When you have a complex artifact with multiple distinct quality dimensions, do not give one reviewer the union of all the mandates and call it a quality pass. Split the mandates. Dispatch the reviewers in parallel. Consolidate the findings into a single fix list at the end. The wall-clock cost is the same. The catch rate is dramatically higher.
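The shape of that workflow is small enough to sketch. Everything here is illustrative: review is a hypothetical stand-in for whatever model or agent call you actually run, and the mandate prompts abbreviate the three roles above.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a single scoped review: one narrow mandate,
# one document in, a list of findings back.
def review(mandate: str, document: str) -> list[str]:
    raise NotImplementedError("plug in your model or agent call here")

MANDATES = {
    "fact-check": "Verify every citation against its actual source.",
    "alignment": "Compare the draft against the plan and prior chapters.",
    "synthesis": "Walk the internal logic end to end.",
}

def review_pass(document: str) -> list[str]:
    # Dispatch all three reviewers at once: the wall-clock cost is one
    # review, but each prompt stays narrow enough to produce specificity.
    with ThreadPoolExecutor(max_workers=len(MANDATES)) as pool:
        futures = {
            name: pool.submit(review, prompt, document)
            for name, prompt in MANDATES.items()
        }
        # Consolidate only at the end, into a single tagged fix list.
        return [
            f"[{name}] {finding}"
            for name, future in futures.items()
            for finding in future.result()
        ]
```

The consolidation happens once, after every reviewer has finished, so no mandate ever has to compete for attention inside another reviewer's prompt.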
I had assumed the long-form artifact was close to done before the reviewers ran. Twenty-two batched fixes later, what was actually close to done was the first version of a draft that knew which questions to ask of itself. The reviewers were not duplicating a job I could have done myself. They were doing three jobs that only resemble each other from a distance.