CLAUDE.md as Guardrails: Compressing AI Context
Reducing AI instruction sets by 55% without losing capability. Lessons from engineering a personal operating system.
Jason Walker
State CISO, Florida
By December 2025, my root instruction file had grown to 611 lines and 40 kilobytes. It contained everything: cloud infrastructure details, Telegram bot commands, API server endpoints, database connection patterns, scheduled service configurations, team definitions, and dozens of context standards. Every session, the AI system would load all of it, burning token budget on reference material that wasn't needed for that day's work.
The irony struck me: I had built an entire progressive context-loading system for spawned AI agents—loading only what's relevant for their task—while violating that principle systematically at my own root level. I was preaching architectural discipline while practicing sprawl.
This essay is about the discipline audit I ran, the 55% compression I achieved without losing capability, and what it reveals about how we should architect AI systems at any scale.
The Bloat Audit
I started by categorizing every line in the instruction file:
Behavioral guardrails. Standing instructions like "auto-commit after Write/Edit operations" and "detect learning signals during conversations." These are load-critical. They shape session behavior from the first message. ~60 lines, 1.2 KB.
Conventions and routing. File structure, naming patterns, how tasks flow from inbox to project to execution. These are reference material used occasionally. ~120 lines, 2.1 KB.
Operating standards. The 22 architecture decisions (AD-001 through AD-022), pillar definitions, device ecosystem details, integration notes. Used during planning and design decisions, but not every session. ~250 lines, 8.3 KB.
Reference documentation. Detailed API specs, scheduled service configurations, cloud infrastructure diagrams, integration endpoints, secrets management protocols. Used on-demand when building or debugging. ~180 lines, 28.4 KB.
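The audit above can be sketched as a simple tally: walk the file, assign each line to a category based on the section it falls under, and sum lines and bytes per category. The section names and category map below are illustrative, not the actual headings in my file.

```python
# Hypothetical sketch of the bloat audit: tally lines and bytes per category.
# Section names and the category map are illustrative assumptions.
from collections import defaultdict

CATEGORY_BY_SECTION = {
    "Standing Instructions": "behavioral",
    "Conventions": "conventions",
    "Architecture Decisions": "standards",
    "Cloud Infrastructure": "reference",
}

def audit(markdown_text):
    totals = defaultdict(lambda: {"lines": 0, "bytes": 0})
    category = "uncategorized"
    for line in markdown_text.splitlines():
        if line.startswith("## "):
            # A new section starts; switch to its category.
            category = CATEGORY_BY_SECTION.get(line[3:].strip(), "uncategorized")
        totals[category]["lines"] += 1
        totals[category]["bytes"] += len(line.encode("utf-8")) + 1  # +1 for newline
    return dict(totals)

sample = (
    "## Standing Instructions\nauto-commit after edits\n"
    "## Cloud Infrastructure\nDroplet IP, SSH, containers\n"
)
print(audit(sample))
```

Running this against the real file is what surfaced the split described above: a small behavioral core and a large tail of reference material.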
The reference category was the killer. It contained things like:
"Droplet details: IP 165.227.198.40, SSH root@165.227.198.40, 6 Docker containers, PostgreSQL + pgvector, 200K+ rows."
"Scheduled services: PLAUD watcher (15m), Todoist sync (1h), embeddings (30m), git sync (5m)."
A complete table of all 37 agents across 8 teams with their roles.
"DigitalOcean configuration: ssh tunnel required, docker container IP 172.18.0.2, not localhost, use db-tunnel.sh discover if IP changes."
Every one of these details was genuinely useful—when I needed it. Which was approximately 5-10% of the time. The other 90-95% of sessions, I was carrying 28 KB of infrastructure documentation that wasn't relevant.
The Progressive Disclosure Principle Applied Reflexively
I'd architected the system to load agent context progressively: Tier 1 (always) includes priorities and blockers; Tier 2 (on-demand) includes background context, reference materials, and deeper details. Agents never load Tier 2 unless the task demands it.
I should have applied the same principle to myself.
The solution was to extract all reference material into separate files, keep only the links and a brief summary in the root instruction, and load the full reference files on-demand via the read tool. This is semantically identical to what Tier 2 context loading does for agents, except I'm doing it consciously as a human system designer.
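The extract-and-read-on-demand pattern can be sketched in a few lines. This is a minimal illustration, assuming the file layout described below (CLAUDE.md at the root, topic files under .claude/reference/); it is not the actual loading mechanism, which is just the AI's read tool.

```python
# Minimal sketch of two-tier loading, assuming the layout described in the text:
# CLAUDE.md always loads; .claude/reference/<topic>.md loads only when asked for.
from pathlib import Path

def load_context(root, topics=()):
    """Tier 1: always read the root instruction.
    Tier 2: read a reference file only when its topic is requested."""
    parts = [(root / "CLAUDE.md").read_text()]
    for topic in topics:  # e.g. "cloud-infra" when debugging the SSH tunnel
        parts.append((root / ".claude" / "reference" / f"{topic}.md").read_text())
    return "\n\n".join(parts)
```

Most sessions call the equivalent of `load_context(root)`; only an infrastructure-debugging session pays for `load_context(root, ["cloud-infra"])`.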
The Architecture
The new system has two parts:
CLAUDE.md (root instruction). 319 lines, 18 KB. Contains:
- Behavioral guardrails (standing instructions, signal detection, auto-commit logic)
- Core conventions (file naming, frontmatter, wikilink patterns)
- The four architecture pillars and design philosophy (2 paragraphs)
- The memory protocol (structure, keys, session boundaries)
- Planning cascade (how monthly/weekly/daily planning flows)
- Task tiers definition
- Pillar tracking rules
- Directory structure (brief, with pointer to reference if detail needed)
- One line per reference documentation: "Routing tables, cloud infrastructure, API details available in .claude/reference/ on-demand"
.claude/reference/ (on-demand reference). Five extracted files:
- routing.md — intelligent routing logic, escalation signals, team decision framework (was 180 lines, now its own file)
- cloud-infra.md — Droplet IP, SSH, Docker containers, PostgreSQL details, devices (was 45 lines, now ~80)
- api-server.md — API endpoints, authentication, sandbox policy (was 30 lines, now ~60)
- telegram-bot.md — bot commands, syntax, examples (was 25 lines, now ~50)
- integrations.md — service list, database access, known issues, credentials management (was 110 lines, now ~140)
Total reference: 560 lines, 11.8 KB, all organized by topic.
The root instruction file now says: "Detailed documentation extracted to .claude/reference/ — read on demand" and points to each file by topic. When I need to debug the SSH tunnel, I read cloud-infra.md. When I'm setting up a new integration, I read integrations.md. When I'm routing a task, I read routing.md.
What This Saves
Token budget. 22 KB less instruction data per session = roughly 5,500 tokens saved per session. Over 200 annual sessions, that's 1.1M tokens freed for actual work. At Claude 3.5 Sonnet pricing, that's approximately $15-20/year, which sounds small until you account for latency: faster context loads mean faster first-token times.
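The arithmetic above is easy to verify, assuming the common rough heuristic of about four bytes of English prose per token:

```python
# Back-of-the-envelope check of the savings figures in the text.
# BYTES_PER_TOKEN = 4 is an assumption: a rough average for English prose.
BYTES_SAVED = 22 * 1024          # 22 KB trimmed from the root instruction
BYTES_PER_TOKEN = 4
SESSIONS_PER_YEAR = 200

tokens_per_session = BYTES_SAVED // BYTES_PER_TOKEN
tokens_per_year = tokens_per_session * SESSIONS_PER_YEAR

print(tokens_per_session)  # 5632, close to the ~5,500 quoted in the text
print(tokens_per_year)     # 1126400, roughly the 1.1M figure
```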
Cognitive load. I spend 60 seconds reading the root instruction file when spinning up a new agent team or starting a major project. Previously, I was reading 611 lines hoping to remember where something was. Now I read 319 lines of essentials and know where to look for details.
Editing friction. When I discovered a new integration pattern or cloud service detail, I used to think "should I update the root file?" Now the answer is clear: update the reference file where it belongs. This removes the coupling between "root instruction maintenance" and "operational documentation maintenance."
The Irony
Here's what ate at me during the audit: I had written an entire architecture decision (AD-014) about "progressive context loading for agents." The abstract was literally: "Load only relevant context for the agent's current task. Tier 1 (always): critical instructions. Tier 2 (on-demand): reference material, search results, contextual data."
I had built the infrastructure to implement this for spawned agents. Then I ignored my own design at the root level. Every session, I was loading Tier 2 reference material that wasn't needed, burning tokens, adding latency, creating decision friction when I wanted to update something.
The audit wasn't an external discovery. It was enforcing the principles I'd already articulated but hadn't applied to myself. The moment I recognized that, the refactor became inevitable.
Lessons for AI System Design
This pattern generalizes. In enterprise environments, I've seen the same thing: instruction documents for AI systems grow to 50-100 KB, mixing critical behavioral guardrails with reference material that's used 10% of the time. The thinking usually goes: "The AI should have access to everything it might need."
That's backwards. The AI should have fast, reliable access to what it needs, and efficient access to what it might need. These are different things.
The discipline isn't "minimize the instruction file." The discipline is "be explicit about what must be loaded, what should be available on-demand, and what shouldn't be loaded at all."
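One way to make that three-way split explicit is a manifest that declares each document's load policy up front. The file names and policies below are illustrative, not a real configuration:

```python
# Hypothetical manifest making the three-way load discipline explicit.
# Paths and policies are illustrative examples only.
MANIFEST = {
    "CLAUDE.md":                        "always",     # behavioral guardrails
    ".claude/reference/cloud-infra.md": "on-demand",  # infra details, read when needed
    ".claude/archive/old-notes.md":     "never",      # archived, not loaded at all
}

def files_to_load(requested=()):
    """Return everything marked 'always', plus any 'on-demand' files explicitly requested."""
    always = [p for p, policy in MANIFEST.items() if policy == "always"]
    on_demand = [p for p, policy in MANIFEST.items()
                 if policy == "on-demand" and p in requested]
    return always + on_demand
```

The point of the manifest is not the code; it is that every document has a stated policy, so "should this load?" is never an implicit decision.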
At FLDS, we're applying the same logic to agency cybersecurity documentation. Instead of requiring agencies to embed every policy, procedure, and control detail in a 200-page manual, we're building a reference architecture where the root document (the CIS Controls framework, mapped to NIST) is always available, but detailed agency-specific guidance is linked and loaded on-demand. The result is faster onboarding, better compliance tracking, and easier maintenance.
The principle is the same whether you're managing an AI system, a personal operating system, or an enterprise security program: load critical stuff always, make useful stuff accessible on-demand, and don't burden the system with what it doesn't need.
The Closing Principle
Every token of documentation you load displaces a token of actual work. This is true for AI systems, human cognition, and organizational capacity.
The refactor didn't change what the system can do. It didn't require any new infrastructure. It was pure discipline: extracting reference material, organizing it by topic, and loading it on-demand instead of upfront. Result: 55% reduction in root instruction size, zero loss of capability, measurable improvements in first-token latency and cognitive accessibility.
If your instruction files, runbooks, or operating procedures are bloated, they probably need this treatment. Ask yourself: what must be known at session start, what should be accessible with one read command, and what can stay archived until needed? Then architect accordingly.
The smallest architectural improvement is often the one you've already designed but haven't enforced against yourself yet.