AI in Operations
Parallel Agents Are a Voice Problem, Not a Research Pattern
Five agents, fifty-five posts, three minutes. The job worked because the prompt encoded the author's taste, not because parallelism is magic.
Jason Walker
5 min read
I needed to add pull-quotes to every essay on this site. Fifty-five posts. One or two sentences each, surfaced as a visual anchor. It is the kind of curation that an editor at a longform magazine would spend a week on. I had three minutes.
So I gave the job to five AI agents in parallel. Eleven posts each. They came back with sixty-three pull-quote selections. The quality was consistent across batches. The picks were thesis-bearing, not random sentences. Every post got covered. Total wall-clock time: about three minutes.
If you have used Claude Code or any agent orchestrator, the parallel dispatch part is not new. The interesting question is not "can five agents work in parallel?" The interesting question is why this worked at all.
Most people think about parallel agents as a research pattern. Each agent fact-checks a different claim. Each agent reads a different paper. Each agent verifies a different vendor. The work splits cleanly because the criteria are objective: did this citation actually appear, does this statute say what we claim, is this number sourced. You can run twenty such agents and they will not disagree about ground truth.
Editorial curation looks different. Picking the highest-leverage sentence in an essay is judgment. Two readers will pick two different sentences. Five agents could plausibly hand you twenty-three different "best lines" across a single post and you would have no principled way to choose. That is the default expectation.
It did not happen, and the reason it did not happen is the load-bearing piece of this whole approach.
Voice signal is not a stylistic preference. It is a decision function.
The orchestrator prompt I sent each agent was about three hundred words. Two hundred fifty of those words described the task: read the whole post first, do not quote titles, do not quote list items, prefer sentences that stand alone semantically, default to one quote per post, allow a second only when the post has two distinct load-bearing arguments. The rest was a voice guide.
Here is what the voice guide said, near-verbatim: the author is a Marine-trained State CISO writing for senior operators. His voice is direct, declarative, no hedging. The best pull-quotes will sound like him: short, confident, often containing the word "is" or "not." Then three concrete examples. "The bridge either holds or it does not." "The task is never the point. The outcome is." "Confidence is not a security control."
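To make the structure concrete, here is a minimal sketch of how a prompt like that could be assembled: task rules first, then the voice description, then the three reference lines. The rules paraphrase the constraints listed above and the examples are the three quoted lines. The exact wording, the `build_prompt` helper, and the layout of the output are hypothetical illustrations, not the prompt that was actually sent.

```python
# Hypothetical sketch: assemble an orchestrator prompt from task rules,
# a short voice description, and concrete reference examples.

TASK_RULES = [
    "Read the whole post before choosing anything.",
    "Do not quote titles.",
    "Do not quote list items.",
    "Prefer sentences that stand alone semantically.",
    "Default to one quote per post; allow a second only when the post "
    "has two distinct load-bearing arguments.",
]

VOICE_DESCRIPTION = (
    "The author is a Marine-trained State CISO writing for senior operators. "
    "His voice is direct, declarative, no hedging. The best pull-quotes will "
    'sound like him: short, confident, often containing the word "is" or "not."'
)

# The concrete reference points carry the taste; adjectives alone would not.
VOICE_EXAMPLES = [
    "The bridge either holds or it does not.",
    "The task is never the point. The outcome is.",
    "Confidence is not a security control.",
]


def build_prompt(rules: list[str], voice: str, examples: list[str]) -> str:
    """Join task rules, the voice description, and examples into one prompt string."""
    lines = ["Task: select pull-quotes for each post in your batch."]
    lines += [f"- {rule}" for rule in rules]
    lines.append("")
    lines.append("Voice guide: " + voice)
    lines.append("Reference pull-quotes in this voice:")
    lines += [f'- "{example}"' for example in examples]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt(TASK_RULES, VOICE_DESCRIPTION, VOICE_EXAMPLES))
```

Every agent receives the same string, so the decision function travels with the work instead of living in any one agent's defaults.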
That voice guide was the entire difference between sixty-three good picks and sixty-three random ones. Without it, the agents would have defaulted to the cookie-cutter LLM failure mode: pick the most "interesting" sentence, which usually means the most superficially surprising one, which usually means the wrong one. The thesis lines in good essays are rarely the most surprising sentences. They are the sentences that, if removed, would leave the argument structurally exposed.
Three concrete examples beat ten adjectives. If I had written "concise, declarative, confident" the agents would have produced concise-declarative-confident sentences that did not sound like me. Specific examples made the taste tractable. The agents could pattern-match against three real reference points instead of resolving fuzzy adjectives into their own default voice.
The generalization is what makes this worth writing about. Editorial curation is one case. There are many others. Triage decisions: which alerts to escalate, which findings to surface, which incidents to write up. Prioritization: which of fifty backlog items deserve this sprint. Synthesis: which of twenty interview transcripts contain the load-bearing quotes. All of these look like subjective per-item judgment that resists parallelization. All of them are actually the same shape. You can parallelize them if you can encode your decision function as a voice signal with concrete examples.
The pattern is: a task is per-item independent, the criteria are subjective but tractable, and you can name what "good" looks like with examples. If all three are true, you can run five agents in parallel. The orchestrator prompt does the work of holding consistency, not the agent count.
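A minimal sketch of the fan-out half of that pattern, assuming the agent is any callable that takes the shared prompt and a batch of post IDs and returns a mapping of post ID to selected quotes. The `Agent` type, `split_into_batches`, `fan_out`, and the fake agent are hypothetical stand-ins; a real run would call a model or an orchestrator such as Claude Code. A thread pool is used on the assumption that agent calls are I/O-bound.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Hypothetical agent signature: (shared prompt, batch of post IDs) -> {post ID: quotes}.
Agent = Callable[[str, list[str]], dict[str, list[str]]]


def split_into_batches(items: list[str], n_batches: int) -> list[list[str]]:
    """Split items into n_batches contiguous batches whose sizes differ by at most one."""
    size, extra = divmod(len(items), n_batches)
    batches, start = [], 0
    for i in range(n_batches):
        end = start + size + (1 if i < extra else 0)
        batches.append(items[start:end])
        start = end
    return batches


def fan_out(prompt: str, posts: list[str], agent: Agent, n_agents: int = 5) -> dict[str, list[str]]:
    """Send every batch the SAME prompt in parallel, merge results, and check coverage."""
    batches = split_into_batches(posts, n_agents)
    merged: dict[str, list[str]] = {}
    with ThreadPoolExecutor(max_workers=n_agents) as pool:
        for result in pool.map(lambda batch: agent(prompt, batch), batches):
            merged.update(result)
    missing = set(posts) - set(merged)
    if missing:
        raise ValueError(f"posts with no selection: {sorted(missing)}")
    return merged


if __name__ == "__main__":
    posts = [f"post-{i:02d}" for i in range(55)]
    print([len(b) for b in split_into_batches(posts, 5)])  # [11, 11, 11, 11, 11]

    def fake_agent(prompt: str, batch: list[str]) -> dict[str, list[str]]:
        # Stand-in for a real model call: one placeholder quote per post.
        return {post: [f"placeholder quote for {post}"] for post in batch}

    picks = fan_out("shared orchestrator prompt", posts, fake_agent)
    print(len(picks))  # 55
```

Note that nothing in `fan_out` coordinates the agents. Consistency has to come from the prompt argument, and the agent count only changes throughput.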
The failure mode worth naming: if you get inconsistent picks across batches, the voice signal was inadequate, not the parallelism. The fix is more examples, not fewer agents.
This matters because the default reaction when someone proposes "give it to five agents" is some version of "but they will all come back with different answers." That is true if your prompt does not tell them how you decide. It is false if your prompt does. The prompt is the load-bearing element. The agent count is just throughput.
If you are a CISO or an engineering leader who has not yet given a per-item judgment task to a parallel-agent run, this is the unlock. Write the voice signal. Include three concrete examples. Five agents will hand you back consistent work in the time it takes to refill your coffee. The throughput is real. The consistency is engineered.
The work of the orchestrator is not to coordinate the agents. It is to encode the decision.