CLAUDE.md as Guardrails: Compressing AI Context
Reducing AI instruction sets by 55% without losing capability. Lessons from engineering a personal operating system.
Jason Walker
State CISO, Florida
By December 2025, my root instruction file had grown to 611 lines and 40 kilobytes. It contained everything: cloud infrastructure details, Telegram bot commands, API server endpoints, database connection patterns, scheduled service configurations, team definitions, and dozens of context standards. Every session, the AI system would load all of it, burning token budget on reference material that wasn't needed for that day's work.
The irony struck me: I had built an entire progressive context-loading system for spawned AI agents—loading only what's relevant for their task—while violating that principle systematically at my own root level. I was preaching architectural discipline while practicing sprawl.
This essay is about the discipline audit I ran, the 55% compression I achieved without losing capability, and what it reveals about how we should architect AI systems at any scale.
The Bloat Audit
I started by categorizing every line in the instruction file:
Behavioral guardrails. Standing instructions like "auto-commit after Write/Edit operations" and "detect learning signals during conversations." These are load-critical. They shape session behavior from the first message. ~60 lines, 1.2 KB.
Conventions and routing. File structure, naming patterns, how tasks flow from inbox to project to execution. These are reference material used occasionally. ~120 lines, 2.1 KB.
Operating standards. The 22 architecture decisions (AD-001 through AD-022), pillar definitions, device ecosystem details, integration notes. Used during planning and design decisions, but not every session. ~250 lines, 8.3 KB.
Reference documentation. Detailed API specs, scheduled service configurations, cloud infrastructure diagrams, integration endpoints, secrets management protocols. Used on-demand when building or debugging. ~180 lines, 28.4 KB.
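The audit above can be sketched as a simple tally: walk the file, assign each line to a category based on the section it falls under, and sum lines and bytes per category. The section names and category map below are illustrative, not the actual headings in my file.

```python
# Hypothetical sketch of the bloat audit: tally lines and bytes per category.
# Section names and the category map are illustrative assumptions.
from collections import defaultdict

CATEGORY_BY_SECTION = {
    "Standing Instructions": "behavioral",
    "Conventions": "conventions",
    "Architecture Decisions": "standards",
    "Cloud Infrastructure": "reference",
}

def audit(markdown_text):
    totals = defaultdict(lambda: {"lines": 0, "bytes": 0})
    category = "uncategorized"
    for line in markdown_text.splitlines():
        if line.startswith("## "):
            # A new section starts; switch to its category.
            category = CATEGORY_BY_SECTION.get(line[3:].strip(), "uncategorized")
        totals[category]["lines"] += 1
        totals[category]["bytes"] += len(line.encode("utf-8")) + 1  # +1 for newline
    return dict(totals)

sample = (
    "## Standing Instructions\nauto-commit after edits\n"
    "## Cloud Infrastructure\nDroplet IP, SSH, containers\n"
)
print(audit(sample))
```

Running this against the real file is what surfaced the split described above: a small behavioral core and a large tail of reference material.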
The reference category was the killer. It contained things like:
"Droplet details: IP 165.227.198.40, SSH root@165.227.198.40, 6 Docker containers, PostgreSQL + pgvector, 200K+ rows."
"Scheduled services: PLAUD watcher (15m), Todoist sync (1h), embeddings (30m), git sync (5m)."
A complete table of all 37 agents across 8 teams with their roles.
"DigitalOcean configuration: ssh tunnel required, docker container IP 172.18.0.2, not localhost, use db-tunnel.sh discover if IP changes."
Every one of these details was genuinely useful—when I needed it. Which was approximately 5-10% of the time. The other 90-95% of sessions, I was carrying 28 KB of infrastructure documentation that wasn't relevant.
The Progressive Disclosure Principle Applied Reflexively
I'd architected the system to load agent context progressively: Tier 1 (always) includes priorities and blockers; Tier 2 (on-demand) includes background context, reference materials, and deeper details. Agents never load Tier 2 unless the task demands it.
I should have applied the same principle to myself.
The solution was to extract all reference material into separate files, keep only the links and a brief summary in the root instruction, and load the full reference files on-demand via the read tool. This is semantically identical to what Tier 2 context loading does for agents, except I'm doing it consciously as a human system designer.
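The extract-and-read-on-demand pattern can be sketched in a few lines. This is a minimal illustration, assuming the file layout described below (CLAUDE.md at the root, topic files under .claude/reference/); it is not the actual loading mechanism, which is just the AI's read tool.

```python
# Minimal sketch of two-tier loading, assuming the layout described in the text:
# CLAUDE.md always loads; .claude/reference/<topic>.md loads only when asked for.
from pathlib import Path

def load_context(root, topics=()):
    """Tier 1: always read the root instruction.
    Tier 2: read a reference file only when its topic is requested."""
    parts = [(root / "CLAUDE.md").read_text()]
    for topic in topics:  # e.g. "cloud-infra" when debugging the SSH tunnel
        parts.append((root / ".claude" / "reference" / f"{topic}.md").read_text())
    return "\n\n".join(parts)
```

Most sessions call the equivalent of `load_context(root)`; only an infrastructure-debugging session pays for `load_context(root, ["cloud-infra"])`.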
The Architecture
The new system has two parts:
CLAUDE.md (root instruction). 319 lines, 18 KB. Contains:
- Behavioral guardrails (standing instructions, signal detection, auto-commit logic)
- Core conventions (file naming, frontmatter, wikilink patterns)
- The four architecture pillars and design philosophy (2 paragraphs)
- The memory protocol (structure, keys, session boundaries)
- Planning cascade (how monthly/weekly/daily planning flows)
- Task tiers definition
- Pillar tracking rules
- Directory structure (brief, with pointer to reference if detail needed)
- One line per reference documentation: "Routing tables, cloud infrastructure, API details available in .claude/reference/ on-demand"
.claude/reference/ (on-demand reference). Five extracted files:
- routing.md — intelligent routing logic, escalation signals, team decision framework (was 180 lines, now its own file)
- cloud-infra.md — Droplet IP, SSH, Docker containers, PostgreSQL details, devices (was 45 lines, now ~80)
- api-server.md — API endpoints, authentication, sandbox policy (was 30 lines, now ~60)
- telegram-bot.md — bot commands, syntax, examples (was 25 lines, now ~50)
- integrations.md — service list, database access, known issues, credentials management (was 110 lines, now ~140)
Total reference: 560 lines, 11.8 KB, all organized by topic.
The root instruction file now says: "Detailed documentation extracted to .claude/reference/ — read on demand" and points to each file by topic. When I need to debug the SSH tunnel, I read cloud-infra.md. When I'm setting up a new integration, I read integrations.md. When I'm routing a task, I read routing.md.
What This Saves
Token budget. 22 KB less instruction data per session = roughly 5,500 tokens saved per session. Over 200 annual sessions, that's 1.1M tokens freed for actual work. At Claude 3.5 Sonnet pricing, that's approximately $15-20/year, which sounds small until you account for latency: faster context loads mean faster first-token times.
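The arithmetic above is easy to verify, assuming the common rough heuristic of about four bytes of English prose per token:

```python
# Back-of-the-envelope check of the savings figures in the text.
# BYTES_PER_TOKEN = 4 is an assumption: a rough average for English prose.
BYTES_SAVED = 22 * 1024          # 22 KB trimmed from the root instruction
BYTES_PER_TOKEN = 4
SESSIONS_PER_YEAR = 200

tokens_per_session = BYTES_SAVED // BYTES_PER_TOKEN
tokens_per_year = tokens_per_session * SESSIONS_PER_YEAR

print(tokens_per_session)  # 5632, close to the ~5,500 quoted in the text
print(tokens_per_year)     # 1126400, roughly the 1.1M figure
```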
Cognitive load. I spend 60 seconds reading the root instruction file when spinning up a new agent team or starting a major project. Previously, I was reading 611 lines hoping to remember where something was. Now I read 319 lines of essentials and know where to look for details.
Editing friction. When I discovered a new integration pattern or cloud service detail, I used to think "should I update the root file?" Now the answer is clear: update the reference file where it belongs. This removes the coupling between "root instruction maintenance" and "operational documentation maintenance."
The Irony
Here's what ate at me during the audit: I had written an entire architecture decision (AD-014) about "progressive context loading for agents." The abstract was literally: "Load only relevant context for the agent's current task. Tier 1 (always): critical instructions. Tier 2 (on-demand): reference material, search results, contextual data."
I had built the infrastructure to implement this for spawned agents. Then I ignored my own design at the root level. Every session, I was loading Tier 2 reference material that wasn't needed, burning tokens, adding latency, creating decision friction when I wanted to update something.
The audit wasn't an external discovery. It was enforcing the principles I'd already articulated but hadn't applied to myself. The moment I recognized that, the refactor became inevitable.
Lessons for AI System Design
This pattern generalizes. In enterprise environments, I've seen the same thing: instruction documents for AI systems grow to 50-100 KB, mixing critical behavioral guardrails with reference material that's used 10% of the time. The thinking usually goes: "The AI should have access to everything it might need."
That's backwards. The AI should have fast, reliable access to what it needs, and efficient access to what it might need. These are different things.
The discipline isn't "minimize the instruction file." The discipline is "be explicit about what must be loaded, what should be available on-demand, and what shouldn't be loaded at all."
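One way to make that three-way split explicit is a manifest that declares each document's load policy up front. The file names and policies below are illustrative, not a real configuration:

```python
# Hypothetical manifest making the three-way load discipline explicit.
# Paths and policies are illustrative examples only.
MANIFEST = {
    "CLAUDE.md":                        "always",     # behavioral guardrails
    ".claude/reference/cloud-infra.md": "on-demand",  # infra details, read when needed
    ".claude/archive/old-notes.md":     "never",      # archived, not loaded at all
}

def files_to_load(requested=()):
    """Return everything marked 'always', plus any 'on-demand' files explicitly requested."""
    always = [p for p, policy in MANIFEST.items() if policy == "always"]
    on_demand = [p for p, policy in MANIFEST.items()
                 if policy == "on-demand" and p in requested]
    return always + on_demand
```

The point of the manifest is not the code; it is that every document has a stated policy, so "should this load?" is never an implicit decision.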
At FLDS, we're applying the same logic to agency cybersecurity documentation. Instead of requiring agencies to embed every policy, procedure, and control detail in a 200-page manual, we're building a reference architecture where the root document (the CIS Controls framework, mapped to NIST) is always available, but detailed agency-specific guidance is linked and loaded on-demand. The result is faster onboarding, better compliance tracking, and easier maintenance.
The principle is the same whether you're managing an AI system, a personal operating system, or an enterprise security program: load critical stuff always, make useful stuff accessible on-demand, and don't burden the system with what it doesn't need.
The Closing Principle
Every token of documentation you load displaces a token of actual work. This is true for AI systems, human cognition, and organizational capacity.
The refactor didn't change what the system can do. It didn't require any new infrastructure. It was pure discipline: extracting reference material, organizing it by topic, and loading it on-demand instead of upfront. Result: 55% reduction in root instruction size, zero loss of capability, measurable improvements in first-token latency and cognitive accessibility.
If your instruction files, runbooks, or operating procedures are bloated, they probably need this treatment. Ask yourself: what must be known at session start, what should be accessible with one read command, and what can stay archived until needed? Then architect accordingly.
The smallest architectural improvement is often the one you've already designed but haven't enforced against yourself yet.