Compound engineering for the implementation phase

9 min read · ai · tooling · context-engineering

Compaction eats your decisions

A coding agent in implementation mode burns tokens fast. Reading files, capturing build output, dumping git status. A 1M context window sounds like plenty until a single grep across a monorepo lands half the codebase in the transcript.

When the context window fills, the session compacts. Compaction is lossy. A decision made fifty turns ago comes back as a paragraph summary, and the constraint that mattered is gone. The next turn rewrites code you already agreed not to touch.

The fix is not a bigger context window. The fix is loading less into the one you have.

Composing the root CLAUDE.md with @import references

Claude Code reads a user-level config from ~/.claude/CLAUDE.md. The ~/.claude/ directory is hidden by default on macOS and Linux, but the CLAUDE.md inside it is the file every session loads first, before any project-specific config. Call it the root Claude file: whatever it says applies to every project, every agent, every turn.

Keeping that file thin is the point. Claude Code's import syntax, @path/to/file.md, inlines the referenced file at session start. The root file stays a small index, and each tool gets its own focused page:

## Subagent Model Selection

When spawning subagents via the Agent tool, use `model: "sonnet"` for routine tasks like:
- Codebase exploration and search
- File reading and summarization
- Running tests or builds
- Simple code generation or edits

Reserve the default (Opus) for tasks requiring deep reasoning, complex
architecture decisions, or multi-step problem solving.

## References

@CRG.md
@RTK.md
@PLANNOTATOR.md
@CLAUDE-TOKEN-EFFICIENT.md
@WORKTRUNK.md

The whole file is 17 lines. Each @import resolves to a self-contained module.

Splitting the config this way pays off for three reasons. Authoring is local: tweaking the RTK rules touches one 18-line file, not a 500-line monolith. Reuse is easy: the same RTK.md ships across machines and into project-level CLAUDE.md files via @~/.claude/RTK.md. And the agent gets a clean mental model of what each module is for, because each file is named after the tool it describes rather than the agent behaviour it shapes.
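Claude Code resolves imports itself. To inspect roughly what the agent ends up reading, a small bash helper can approximate the inlining. This is a sketch under stated assumptions:

- Each import sits alone on its own line.
- Relative paths resolve against the importing file's directory.
- `~/` resolves against $HOME.
- Only one level is expanded.

Claude Code also accepts imports mid-sentence and follows nested imports, and this sketch handles neither. `expand_imports` is a hypothetical helper, not part of Claude Code.

```bash
#!/usr/bin/env bash
# expand_imports FILE: print FILE with every line of the form "@path"
# replaced by the contents of that file. Hypothetical helper: one level
# deep, standalone import lines only.
expand_imports() {
  local file=$1 dir line target
  dir=$(dirname "$file")
  while IFS= read -r line || [ -n "$line" ]; do
    if [[ $line =~ ^@([^[:space:]]+)$ ]]; then
      target=${BASH_REMATCH[1]}
      case $target in
        "~/"*) target=$HOME/${target#"~/"} ;;  # home-relative import
        /*) ;;                                  # absolute path, use as-is
        *) target=$dir/$target ;;               # relative to the importing file
      esac
      if [ -r "$target" ]; then
        printf '%s\n' "$(cat "$target")"        # inline the module
      else
        printf '%s\n' "$line"                   # keep unresolved imports visible
      fi
    else
      printf '%s\n' "$line"
    fi
  done < "$file"
}
```

Piping `expand_imports ~/.claude/CLAUDE.md` into `wc -l` gives a quick sense of how much the thin index actually pulls into every session.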

Trimming the agent’s own output with CLAUDE-TOKEN-EFFICIENT.md

The agent’s own output is context too. Every “Let me check…” preamble, every restated user instruction, every paragraph of closing fluff feeds back into the next turn.

A one-page global instruction trains it out:

# Token Efficient

- Think before acting. Read existing files before writing code.
- Be concise in output but thorough in reasoning.
- Prefer editing over rewriting whole files.
- Do not re-read files you have already read unless the file may have changed.
- No sycophantic openers or closing fluff.
- Go straight to the point. Lead with the answer or action, not the reasoning.
- Do not restate what the user said. Just do it.
- If you can say it in one sentence, don't use three.

Every token trimmed from an assistant turn is a token that every later turn no longer re-reads as input, and one that never pushes the session toward compaction. Across a long session, that saving compounds.
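A back-of-the-envelope model of that compounding, in bash, rests on three assumptions:

- Every assistant turn is trimmed by the same hypothetical number of tokens.
- Each turn's text is re-read as input on every later turn.
- Prompt-caching discounts and compaction are ignored.

`compounded_savings` and `window_savings` are illustrative helpers, not part of any tool.

```bash
#!/usr/bin/env bash
# compounded_savings TRIMMED_PER_TURN TURNS
# Input tokens the model no longer re-reads over the session.
compounded_savings() {
  local s=$1 n=$2
  # Turn k's trimmed tokens are re-read on the (n - k) turns after it;
  # summing (n - k) for k = 1..n gives n(n - 1)/2.
  echo $(( s * n * (n - 1) / 2 ))
}

# window_savings TRIMMED_PER_TURN TURNS
# Context-window space saved by the end of the session (grows linearly).
window_savings() {
  echo $(( $1 * $2 ))
}
```

With 200 tokens trimmed per turn over 60 turns, `window_savings 200 60` gives 12000 and `compounded_savings 200 60` gives 354000. The window figure grows linearly with session length, while the re-read figure grows quadratically.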

Structural code search with code-review-graph

A grep for a function name returns every occurrence: the definition, every call site, every comment that mentions the name. The agent reads all of them and decides relevance after the fact, billed per token.

code-review-graph, which ships as both a CLI and an MCP server, builds a local knowledge graph of the codebase. Symbols are nodes. Calls, imports and tests are edges. The agent queries the graph instead of the filesystem:

query_graph callers_of "parsePost"     # who calls this?
query_graph tests_for "parsePost"      # which tests cover it?
get_impact_radius "src/utils/slugify"  # blast radius of a change
semantic_search_nodes "rate limit"     # find code by concept

The response is structural: a list of nodes with locations, not the source lines themselves. The agent reads files only after the graph tells it which ones matter. A typical “where is X used?” turn drops from a fifty-file grep dump to a dozen-node response.
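A toy version in bash and awk makes the node-and-edge model concrete. It uses an edge list with one `KIND<TAB>FROM<TAB>TO` line per call or test edge, and its queries return node names rather than source lines. This is not code-review-graph's storage format or API. `callers_of`, `tests_for` and `impact_radius` are hypothetical helpers that only mirror the tool names above.

```bash
#!/usr/bin/env bash
# Toy structural queries over a tab-separated edge list read from stdin.
# Each line is KIND<TAB>FROM<TAB>TO, e.g. "calls<TAB>renderPost<TAB>parsePost".

# edges_into KIND SYMBOL: print every FROM node with a KIND edge into SYMBOL.
edges_into() {
  awk -F '\t' -v kind="$1" -v sym="$2" '$1 == kind && $3 == sym { print $2 }'
}
callers_of() { edges_into calls "$1"; }
tests_for()  { edges_into tests "$1"; }

# impact_radius SYMBOL: breadth-first walk over reversed call and test edges,
# printing every node that directly or transitively depends on SYMBOL.
impact_radius() {
  awk -F '\t' -v start="$1" '
    $1 == "calls" || $1 == "tests" { deps[$3] = deps[$3] " " $2 }
    END {
      seen[start] = 1; queue[1] = start; head = 1; tail = 1
      while (head <= tail) {
        node = queue[head++]
        n = split(deps[node], list, " ")
        for (i = 1; i <= n; i++) {
          if (!(list[i] in seen)) {
            seen[list[i]] = 1
            queue[++tail] = list[i]
            print list[i]
          }
        }
      }
    }'
}
```

Each answer is a short list of names the agent can then open selectively, which is the point of the structural approach.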

Registration is the friction point: a graph the agent doesn’t know about gets ignored. crg-here is a small zsh function that registers the current repo and builds its graph in one step. It is idempotent, so it is safe to run on every clone:

crg-here() {
    local repo
    repo=$(git rev-parse --show-toplevel) || return 1
    code-review-graph register "$repo" --alias "$(basename "$repo")"
    code-review-graph build --repo "$repo"
}

Registration writes to ~/.code-review-graph/registry.json, which a launchd watcher picks up to start indexing in the background. The MCP server is wired per project in .mcp.json, so the agent gets the graph tools the moment it opens the repo:

{
  "mcpServers": {
    "code-review-graph": {
      "command": "uvx",
      "args": ["code-review-graph@2.3.3", "serve"],
      "type": "stdio"
    }
  }
}

A PostToolUse hook on Edit|Write|Bash keeps the graph fresh: every time the agent edits a file or runs a shell command, the graph updates. A SessionStart hook prints code-review-graph status at the top of every session, so a stale graph surfaces immediately:

{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write|Bash",
        "hooks": [
          { "type": "command", "command": "code-review-graph update --skip-flows", "timeout": 30 }
        ]
      }
    ],
    "SessionStart": [
      {
        "matcher": "",
        "hooks": [
          { "type": "command", "command": "code-review-graph status", "timeout": 10 }
        ]
      }
    ]
  }
}

Filtering command output with rtk

git status in a busy worktree returns hundreds of lines. npm install spills deprecation warnings. cargo build prints a Compiling line for every dependency it builds. Each one lands in context verbatim.

rtk is a transparent shell wrapper that filters noisy CLI output before it reaches the agent:

rtk git status      # collapse untracked, drop cleanup hints
rtk gain            # show token savings analytics
rtk gain --history  # per-command breakdown
rtk discover        # scan history for missed opportunities
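rtk's real filters live in its Rust binary and are not shown here. As a rough illustration of what "collapse untracked, drop cleanup hints" can mean, a hypothetical awk filter over long-form git status output does two things. It drops every `(use "git ...")` hint line, and it replaces the untracked-file list with a count. `filter_git_status` is an assumed name, and its rules are not rtk's.

```bash
#!/usr/bin/env bash
# filter_git_status: hypothetical filter for long-form `git status` output
# on stdin. Not rtk's actual rules; it only mimics the idea of collapsing
# untracked files and dropping cleanup hints.
filter_git_status() {
  awk '
    /^[[:space:]]*\(use "git / { next }            # drop (use "git ...") hint lines
    /^Untracked files:/ { in_untracked = 1; next }  # start counting, print nothing
    in_untracked && /^\t/ { untracked++; next }     # one tab-indented path per entry
    in_untracked && /^$/ { in_untracked = 0; next } # blank line ends the section
    { print }
    END { if (untracked) printf "untracked: %d paths\n", untracked }
  '
}
```

Piped after git status, an untracked section listing two paths collapses to a single `untracked: 2 paths` line at the end, and the hint lines disappear.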

A PreToolUse hook rewrites bare commands automatically. The agent calls git status, and rtk filters under the hood without any explicit instruction:

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          { "type": "command", "command": "~/.claude/hooks/rtk-rewrite.sh" }
        ]
      }
    ]
  }
}

The hook is intentionally thin. It reads the agent’s Bash tool input, hands the command to rtk rewrite, and acts on the exit code:

Exit  Meaning                             Hook action
0     Rewrite found, no permission rule   Return rewritten command, allow
1     No RTK equivalent                   Pass through unchanged
2     Deny rule matched                   Pass through, let Claude Code deny
3     Ask rule matched                    Return rewritten command, no auto-allow
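The exit-code table maps onto a small dispatch function. This bash sketch makes two assumptions:

- The rewritten command is already in hand; reading the tool input from stdin and calling rtk rewrite are left out.
- Claude Code's PreToolUse hooks accept a `hookSpecificOutput` object with `permissionDecision` and `updatedInput`. Check the hooks reference for your version.

`hook_response` and `json_escape` are hypothetical names, not the real rtk-rewrite.sh.

```bash
#!/usr/bin/env bash
# Sketch only: maps an `rtk rewrite` exit code to a PreToolUse hook reply.
# The JSON field names are assumptions based on Claude Code's
# hookSpecificOutput format; verify them for your Claude Code version.

# Escape a string for a JSON string literal (backslash, quote, newline and
# tab only; enough for typical shell commands).
json_escape() {
  local s=$1
  s=${s//\\/\\\\}
  s=${s//\"/\\\"}
  s=${s//$'\n'/\\n}
  s=${s//$'\t'/\\t}
  printf '%s' "$s"
}

# hook_response EXIT_CODE REWRITTEN_COMMAND
# Prints the hook's JSON reply, or nothing when the original command
# should pass through untouched.
hook_response() {
  local code=$1 cmd
  cmd=$(json_escape "$2")
  case "$code" in
    0) # rewrite found, no permission rule: swap the command and allow it
       printf '{"hookSpecificOutput":{"hookEventName":"PreToolUse","permissionDecision":"allow","updatedInput":{"command":"%s"}}}\n' "$cmd" ;;
    3) # ask rule matched: swap the command but let the user confirm
       printf '{"hookSpecificOutput":{"hookEventName":"PreToolUse","permissionDecision":"ask","updatedInput":{"command":"%s"}}}\n' "$cmd" ;;
    *) # 1 (no equivalent), 2 (deny rule) or anything unexpected: say nothing
       ;;
  esac
}
```

Printing nothing and exiting 0 is what "pass through" means here: Claude Code runs the original command and applies its own permission rules, including any deny rule. Mapping exit 3 to `ask` is one reading of "no auto-allow".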

All the actual filtering rules live in the rtk Rust binary, so the hook never has to change when a new wrapper is added. A cached version check at the top exits early when the installed rtk is too old, rather than letting the hook break silently.
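A cached version check might look like this in bash. This sketch works under stated assumptions:

- The version is stored in a cache file.
- The cache is refreshed only when the rtk binary is newer than the file, so an upgrade invalidates it automatically.
- Dotted versions are compared numerically.
- `rtk --version` prints the version as its last word.

`cached_rtk_version`, `version_ge`, the cache path and `MIN_RTK` are all hypothetical.

```bash
#!/usr/bin/env bash
# version_ge A B: succeed when dotted version A >= B (numeric fields only).
version_ge() {
  local IFS=.
  local -a a=($1) b=($2)
  local i x y
  for (( i = 0; i < ${#a[@]} || i < ${#b[@]}; i++ )); do
    x=${a[i]:-0} y=${b[i]:-0}   # missing fields count as 0, so 1.2 == 1.2.0
    (( 10#$x > 10#$y )) && return 0
    (( 10#$x < 10#$y )) && return 1
  done
  return 0
}

# cached_rtk_version CACHE_FILE: print the installed rtk version, running
# `rtk --version` only when the cache is empty or older than the binary.
cached_rtk_version() {
  local cache=$1 bin
  bin=$(command -v rtk) || return 1
  if [ ! -s "$cache" ] || [ "$bin" -nt "$cache" ]; then
    mkdir -p "$(dirname "$cache")"
    # Assumes the version is the last word of the first output line.
    rtk --version | awk '{ print $NF; exit }' > "$cache"
  fi
  cat "$cache"
}

# At the top of the hook (MIN_RTK is a hypothetical placeholder):
#   v=$(cached_rtk_version "$HOME/.cache/rtk-hook/version") || exit 0
#   version_ge "$v" "$MIN_RTK" || { echo "rtk $v < $MIN_RTK, skipping rewrite" >&2; exit 0; }
```

Comparing field by field avoids the classic string-comparison bug where 0.10.0 sorts before 0.9.5.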

Run rtk gain yourself to see the savings. On a working day it reports five-figure token totals, all context the agent never had to spend deciding which lines were signal.

What you get back

Sessions that finish what they start. Compaction is the moment the agent forgets the constraint that mattered (“don’t touch this file”, “we already rejected that approach”). Pushing it from turn 30 to turn 60 means a feature that used to need two sessions and a hand-off prompt now ships in one. No more re-pasting the spec at turn 25 because the next turn is going to lose it.

Faster turns. Less context per request means less time waiting for the model to start streaming. Filtered output means less time scrolling past noise to find the line that decides the next move. A graph query returns in milliseconds and answers a question that a grep would have spent thirty seconds of file I/O on. The session feels responsive instead of grinding.

Bigger problems become tractable. A monorepo that used to overflow context on the second grep now stays addressable for a full debugging session. Refactors that touch ten files no longer require a planning phase to triage which files the agent is allowed to read first. Structural reads shrink codebase exploration from file dumps to node lists, so the ceiling on what fits in one conversation moves up, and the kind of task you can hand the agent moves with it.

Less babysitting. The hook catches the noisy command before it costs context. The instruction file catches the verbose reply before it costs the next turn. Structural reads catch the over-eager grep before it costs the next ten. You stop having to interrupt the agent to course-correct on resource use and start trusting it to manage its own budget. That is the real compounding effect: the work the three pieces do is work you no longer do yourself.

Two gotchas worth knowing

CLI over MCP when both exist

An MCP server's tool names, descriptions and input schemas ride along in context on every turn, whether or not the agent calls them. A CLI command goes through the Bash tool the agent already has and returns plain output. For deterministic operations like git, file reads, or a build, the CLI is cheaper. Reach for MCP when the operation is stateful or the schema earns its bytes (Xcode build, browser automation, a graph database).

Images over PDFs

A PDF of an API doc reaches the model page by page as extracted text plus a rendered image of each page, so every page is paid for twice before the model gets to the part you need. A screenshot of that page is paid for once, as pixels, and renders identically for the agent’s purposes. When the source is the rendered page, hand it the image.
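Anthropic's vision documentation gives a rule of thumb for image cost:

- An image costs about width × height / 750 tokens.
- An image whose long edge exceeds 1,568 px is scaled down first.
- Large images are shrunk to roughly 1,600 tokens.

A bash sketch of that estimate can check a screenshot before handing it over. The constants come from those docs at the time of writing and may differ by model. `image_tokens` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# image_tokens WIDTH HEIGHT: rough token estimate for one image, using the
# rule of thumb from Anthropic's vision docs (tokens ~ width * height / 750,
# long edge capped at 1568 px, about 1,600 tokens at most).
# The constants are assumptions taken from those docs and may change.
image_tokens() {
  local w=$1 h=$2 long tokens
  long=$(( w > h ? w : h ))
  # Approximate the API's downscale of oversized images, keeping aspect ratio.
  if (( long > 1568 )); then
    w=$(( w * 1568 / long ))
    h=$(( h * 1568 / long ))
  fi
  tokens=$(( w * h / 750 ))
  # Large images are shrunk further until they fit about 1,600 tokens.
  (( tokens > 1600 )) && tokens=1600
  echo "$tokens"
}
```

`image_tokens 1000 1000` prints 1333, and a page-sized screenshot at screen resolution lands at or under the ~1,600 ceiling.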