When Five Scripts Disagree About What a 'Work Task' Means
The trigger for shared module extraction is not duplicate code. It is when multiple consumers of the same data produce different answers to the same question.
Jason Walker
State CISO, Florida
I was debugging a weekly report that undercounted work tasks. The filter logic looked correct. The API calls were returning data. The output was clean. But the number was wrong by about fifteen percent.
So I opened the other four report scripts that touch the same data source and compared their definitions of "work task." Each script had its own keyword list. Each had its own filter logic. Each had been written at a different time, by a different version of me solving a slightly different problem. They all answered the same question: "what counts as a work task?" They gave five different answers.
That is not a code duplication problem. That is a data architecture problem.
The Shape of the Divergence
Five scheduled report scripts, each around 1,200 lines. Roughly 400 lines in each script were identical API boilerplate: authentication, pagination, rate limiting, error handling, retry logic. The remaining 800 lines per script handled the actual report generation, filtering, formatting, and output.
The boilerplate duplication was obvious and annoying, but it was not the real issue. The real issue was buried in the filter functions. Each script maintained its own definition of what constitutes a "work task" versus a personal task versus a project milestone. The keyword lists were similar but not identical. One script checked for six keywords. Another checked for nine. A third used a broader regex pattern that caught items the others missed. A fourth had a pillar-based filter that was too permissive, pulling in personal items that happened to share a tag with work projects.
None of these scripts were wrong in isolation. Each one worked correctly given its own assumptions. The problem was that those assumptions diverged over time as each script evolved independently.
Why DRY Misses the Point
The standard engineering instinct here is DRY: Don't Repeat Yourself. Extract the common code into a shared module, reduce duplication, move on. That framing is technically correct but conceptually insufficient.
If the only problem were repeated API boilerplate, the extraction would be a straightforward refactor. Pull the authentication and retry logic into a shared client, import it everywhere, done. That saves maintenance effort but does not solve the harder problem.
The harder problem is that five independent implementations of "what is a work task" mean there is no canonical answer to that question. The data schema allows multiple interpretations. Each script chose one. Over months of independent evolution, those interpretations drifted apart.
Extracting a shared module is not just a code quality decision. It is a data architecture decision. You are declaring: this is what a work task means. This is the canonical filter. This is the single definition that every consumer of this data must use. If you want a different definition, you extend the shared module. You do not write your own.
What the Extraction Actually Looked Like
I built a shared core module with three layers.
The first layer handled all API interaction: authentication, pagination, rate limiting, caching. Every script had reimplemented this independently. Consolidating it eliminated roughly 3,000 lines of duplicated API code across the five scripts.
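A minimal sketch of what that first layer might look like. The endpoint names, cursor-based pagination, and injectable `fetch` callable are all illustrative assumptions, not the author's actual client; injecting the HTTP call keeps the pagination and retry logic testable without a live API.

```python
import time


class TaskAPIClient:
    """One shared client for every report script: pagination,
    backoff, and retries live here instead of in five copies."""

    def __init__(self, fetch, max_retries=3):
        # fetch(endpoint, params) -> dict. In production this would wrap the
        # real authenticated HTTP call; here it is injected for illustration.
        self._fetch = fetch
        self.max_retries = max_retries

    def get_all(self, endpoint, params=None):
        """Follow cursor-style pagination until the API stops returning one."""
        items, cursor = [], None
        while True:
            page_params = dict(params or {})
            if cursor:
                page_params["cursor"] = cursor
            data = self._call_with_retry(endpoint, page_params)
            items.extend(data["items"])
            cursor = data.get("next_cursor")
            if not cursor:
                return items

    def _call_with_retry(self, endpoint, params):
        """Retry transient failures with exponential backoff."""
        for attempt in range(self.max_retries):
            try:
                return self._fetch(endpoint, params)
            except ConnectionError:
                time.sleep(2 ** attempt)
        raise RuntimeError(f"{endpoint}: gave up after {self.max_retries} attempts")
```

Each report script then imports this client rather than carrying its own copy of the pagination loop.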
The second layer handled vault parsing. Each script had been making live API calls to read and filter vault files, which meant minutes of rate-limited requests per run. The shared module introduced a local file parser that reads the vault directly from disk. Parsing 2,553 files dropped from minutes to 0.2 seconds. This was not possible when each script managed its own API client, because no single script had justified the investment in a local parser. The shared module made it worth building once.
The third layer handled the canonical definitions: what is a work task, what are the pillar categories, what keywords map to which domains, what filters apply to which report types. This is where the real value lives. Every script now imports the same definition. When that definition changes, it changes everywhere.
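The shape of that third layer might look like the sketch below. The keyword sets and the task-dict fields are placeholder assumptions, not the author's real lists; the point is that there is exactly one `is_work_task` and every report imports it.

```python
# canonical.py -- the single source of truth for what a "work task" means.
# Keyword lists are illustrative stand-ins, not the author's actual ones.

WORK_KEYWORDS = {"client", "deploy", "invoice", "meeting", "report", "review"}
PERSONAL_KEYWORDS = {"gym", "groceries", "family", "doctor"}


def is_work_task(task):
    """The canonical filter every report imports.

    Assumes a task dict with a 'title' string and an optional 'tags' list.
    Personal keywords veto first, so tag overlap cannot pull personal
    items into work reports.
    """
    words = set(task["title"].lower().split()) | set(task.get("tags", []))
    if words & PERSONAL_KEYWORDS:
        return False
    return bool(words & WORK_KEYWORDS)
```

Changing `WORK_KEYWORDS` in this one file changes the behavior of every report simultaneously, which is exactly the coupling the extraction is meant to create.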
The five scripts dropped from roughly 1,200 lines each to between 300 and 700 lines each. The reduction came from removing duplicated infrastructure and duplicated definitions. The report-specific logic, the part that actually differs between scripts, stayed.
What Extraction Revealed
The most valuable part of the extraction was not the code savings. It was the audit that extraction forced.
When you consolidate five independent filter implementations into one canonical definition, you have to decide which one is right. That comparison surfaced three problems.
First, one pillar filter was too broad. It was pulling personal items into work reports because the tag overlap between two categories was wider than intended. This had been silently inflating one report's numbers for weeks.
Second, the personal keyword lists varied. One script recognized seven terms. Another recognized four. The canonical list, once I sat down and wrote it deliberately, has eleven. Three scripts had been missing items.
Third, the task status logic differed. Two scripts treated "in progress" and "active" as equivalent. The other three did not. The canonical definition now handles both, explicitly, with a documented rationale.
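That explicit handling could be a small normalization table like this sketch. The alias list is hypothetical; what matters is that the equivalence is written down once and that unknown statuses fail loudly instead of silently slipping past a filter.

```python
# Status normalization: two scripts treated "in progress" and "active" as
# equivalent, three did not. The canonical rule makes the equivalence
# explicit and documented. Alias entries below are illustrative.

_STATUS_ALIASES = {
    "in progress": "active",
    "in-progress": "active",
    "active": "active",
    "done": "done",
    "complete": "done",
}


def normalize_status(raw):
    """Map a raw status string onto the canonical vocabulary.

    Unknown statuses raise rather than silently failing a filter,
    so new vocabulary surfaces immediately instead of drifting.
    """
    key = raw.strip().lower()
    if key not in _STATUS_ALIASES:
        raise ValueError(f"unknown task status: {raw!r}")
    return _STATUS_ALIASES[key]
```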
None of these bugs would have been found through normal testing. Each script passed its own tests. The divergence only became visible when the implementations were placed side by side during extraction.
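One way to make that side-by-side comparison mechanical is a small set-difference audit over the keyword lists each script actually used. The script names and keyword sets here are stand-ins; the set arithmetic is the point.

```python
def audit_definitions(definitions):
    """For each script, report the keywords other scripts match
    that this one silently misses.

    definitions: dict mapping script name -> set of keywords it filters on.
    """
    union = set().union(*definitions.values())
    return {name: sorted(union - keywords)
            for name, keywords in definitions.items()}
```

Running this over five real keyword lists turns "compare filter functions line by line" into a one-line report of exactly where the definitions disagree.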
When to Extract
The conventional trigger for extraction is "I see the same code in multiple places." That is a fine trigger for boilerplate. It is the wrong trigger for data architecture.
The right trigger: multiple consumers of the same data are producing different answers to the same question.
If two scripts read the same API and process the response differently because they need different things from it, that is fine. They have different purposes. Let them diverge.
If two scripts read the same API and process the response differently because they have different definitions of the same concept, that is a data architecture problem. They are not doing different things. They are doing the same thing with incompatible assumptions.
The diagnostic question is not "do these scripts share code?" It is "do these scripts agree on what the data means?"
If the answer requires you to open five files and compare filter functions line by line, you have found your extraction point. Not because DRY says so. Because your system has five competing definitions of a concept that should have one.
The Deeper Principle
Shared modules are often framed as an engineering convenience. Less code to maintain, fewer places to update, cleaner architecture. All true. But when the shared module encodes a definition, not just a utility, it becomes the system's statement of what the data means.
That is a design decision, not a refactoring decision. Once five scripts import the same definition, changing that definition changes the behavior of five reports simultaneously. That is power and risk in equal measure.
The extraction is worth it precisely because of that coupling. One definition means one place to audit, one place to verify. Five definitions mean silent drift, contradictory outputs, and a debugging session that starts with "why is this number wrong by fifteen percent" and ends with "because we never agreed on what a work task is."
Do your consumers agree on what the data means? If they do not, the extraction is not optional. It is overdue.