All insights

AI in Operations

When the ledger becomes the inbox

If your review-counter accumulates instead of clearing, it has stopped being a gate and started being a guilt counter. Redesign it.

Jason Walker

.6 min read

I asked my code-review hook how many entries were unreviewed. It said 652.

The hook had been doing exactly what I told it to do. Every code edit appended a row to a markdown ledger. Every session-end checked the unreviewed count. If the count exceeded a threshold, the hook printed a message telling me to run the code-reviewer agent before stopping. The message was polite. The message was clear. The message had been printing for six weeks.

The bug is not in the hook. The bug is in the contract.

A counter that goes up when work happens and only goes down when a human takes a separate action is structurally an inbox. Inboxes accumulate. The only inboxes that ever reach zero are the ones a human has decided are worth clearing on a rhythm they can sustain. Code-review backlogs are not those inboxes. The human's incentive to clear them is the avoidance of a hypothetical future bug; the cost is concrete present time. The hypothetical future bug wins about as often as you would expect, which is almost never. The threshold stops being a signal and starts being a number you scroll past on every session end.

A review counter ticking upward over weeks while no entries get cleared, illustrating the inbox accumulation pattern
A counter that only clears on human action is not a gate. It is an inbox.

Tonight I dispatched five parallel code-reviewer agents at the backlog, partitioned by directory, with each agent writing findings to its own scratch JSON so they would not race each other on the shared ledger. The review took about twenty minutes and surfaced one critical finding, seventeen high, twenty-two medium, seventeen low. The critical was a database password that had been committed to git in two files for six weeks. The high findings included three Python scripts that had parse-blocking bugs in them from the moment they were committed, because an LLM had generated them and nobody had ever tried to run them. The mediums included seventeen separate places in the codebase that reimplemented the same three-line atomic-file-write pattern, because the helper that did it correctly was inlined in one script and nobody knew to extract it. The lows were paper cuts.

None of it was technically broken. The droplet was up. The cron was firing. The crypto paper trade was profitable. The dissertation was on track. The system was running the entire six weeks the credential was leaking. What had accumulated was not failure. What had accumulated was the latent debt that comes from rapid AI-assisted single-author build with no pre-merge review gate. Each script was generated for an immediate need, satisfied that need, and never got revisited. Each pattern was reinvented by the next agent that touched a similar problem, because the agent did not know the helper existed and the LLM did not surface it. The system worked. The system was also full of holes that a different human or a different agent could have walked through any time they wanted.

The fix is not "review harder." Review-harder is the suggestion the ledger was making, in the polite stop-hook message, every session for six weeks. Review-harder does not work because the incentive math has not changed. If accumulation reached 652 under the old contract, accumulation will reach 1,200 under a sterner version of the same contract.

A pre-write hook intercepting a credential before it lands in git, contrasted with a post-merge ledger waiting for human review
Tools that prevent the bad write close the loop without human attention. Tools that prompt for review eventually produce an unread inbox.

The fix is to redesign the gate so the human is not the bottleneck. Three specific moves, in increasing order of leverage:

First: build a PreToolUse hook that scans every Write and Edit operation for credential patterns before the write lands. A database connection string with an embedded password should never reach disk, let alone git. The hook is a pure function from input to allow-or-block. It does not depend on a human noticing the row in a ledger and choosing to act. I built one tonight, took maybe twenty minutes, tested it three ways. The next time someone tries to commit a literal credential, the hook stops the write at the source.

Second: extract every pattern that gets reinvented more than three times into a shared utility, then write a lint rule that fails any new code path that reinvents the pattern locally. The atomic-write pattern is a perfect example. Three lines of code, reinvented at seventeen separate sites, each site mildly different, each site equally racy with the git-sync cron that force-resets the working tree every five minutes. A shared atomic_write_text in infrastructure/core/atomic_io.py plus a code-review rule that flags inlined tempfile-then-replace patterns closes the loop without anyone reading the ledger.

Third: when a review-counter starts accumulating instead of clearing, redesign the counter. Not "set a sterner threshold." Not "find time to clear it." Redesign it. Replace count-based thresholds with age cutoffs, severity tags, or sampled review. Tag every ledger row with the directory it touches and only require review on the directories where review pays for itself. Accept that some directories will never have reviewed entries and either delete them from the ledger or stop counting them.

The deeper pattern is that tools that prompt humans to review eventually produce an unread inbox. Tools that prevent the bad write from happening close the loop without human attention. The first class of tools is friendly and respectful. The second class of tools actually works. Six weeks of polite stop-hook messages produced exactly the same code as zero stop-hook messages. The credential-lint hook I built tonight will block its first attempted commit somewhere in the next six weeks, and at that moment the loop will have closed because the hook did not need me to do anything.

If you are building anything with an AI coding assistant on a personal or rapid-build project, watch for the moment when one of your gates starts accumulating instead of clearing. That is the moment to redesign the gate. Not the moment to feel bad about the count.

Keep reading

Weekly writing from inside the work.

Practitioner-researcher essays four times a week. No spam, unsubscribe in one click.

Subscribe

Weekly writing from inside the work.

Field observations and framework critiques from a practitioner-researcher running cybersecurity at scale. AI in operations, FAIR risk research, and the leadership patterns that hold both together. No spam. Unsubscribe in one click.