Compound engineering for the implementation phase

May 28, 2026 · 8 min read · ai · tooling · context-engineering

Compaction eats your decisions

A coding agent in implementation mode burns tokens fast. Reading files, capturing build output, dumping git status. A 1M context window sounds like plenty until a single grep across a monorepo lands half the codebase in the transcript.

When the context window fills, the session compacts. Compaction is lossy. A decision made fifty turns ago comes back as a paragraph summary, and the constraint that mattered is gone. The next turn rewrites code you already agreed not to touch.

The fix is not a bigger context window. The fix is loading less into the one you have.

Composing the root `CLAUDE.md` with `@import` references

Claude Code reads a user-level config from ~/.claude/CLAUDE.md. The ~/.claude/ directory is hidden by default on macOS and Linux, but this file is the one every session loads first, before any project-specific config. Call it the root Claude file: whatever it says applies to every project, every agent, every turn.

Keeping that file thin is the point. The pattern Claude Code uses, @filename.md, inlines the referenced file at session start. So the root file stays a small index and each tool gets its own focused page:

## Subagent Model Selection

When spawning subagents via the Agent tool, use `model: "sonnet"` for routine tasks like:
- Codebase exploration and search
- File reading and summarization
- Running tests or builds
- Simple code generation or edits

Reserve the default (Opus) for tasks requiring deep reasoning, complex
architecture decisions, or multi-step problem solving.

## References

@CRG.md
@RTK.md
@PLANNOTATOR.md
@CLAUDE-TOKEN-EFFICIENT.md
@WORKTRUNK.md

The whole file is 17 lines. Each @import resolves to a self-contained module:

- CRG.md tells the agent when to reach for code-review-graph for structural and graph analysis of the codebase
- RTK.md documents the wrapper meta-commands like rtk gain and rtk discover
- PLANNOTATOR.md lists the slash commands for the visual plan review tool
- WORKTRUNK.md covers wt worktree commands and the post-merge cleanup rule
- CLAUDE-TOKEN-EFFICIENT.md trims the agent’s own replies so each turn costs less to feed back into the next

Splitting the config this way keeps edits local. Tweaking the RTK rules touches one 18-line file, not a 500-line monolith. The same RTK.md can ship across machines and into project-level CLAUDE.md files via @~/.claude/RTK.md. The file names also give the agent a clean mental model: with CLAUDE-TOKEN-EFFICIENT.md as the one exception, each module is named after the tool it describes, not the behaviour it nudges.
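A thin index only works if every reference resolves. The following is a minimal sketch of a check for that, assuming one import per line written as `@path`, resolved relative to the importing file unless it starts with `~/` or `/`. The function name `check_imports` is hypothetical, and the parsing is deliberately simpler than whatever Claude Code does internally.

```shell
# Sketch: report @imports in a CLAUDE.md file that do not resolve to a file.
# Assumes one import per line, with no spaces in the path.
check_imports() {
    local file=$1 dir ref target missing=0
    dir=$(dirname "$file")
    # Keep only lines of the form "@something" and strip the marker.
    for ref in $(sed -n 's/^@\([^[:space:]][^[:space:]]*\)[[:space:]]*$/\1/p' "$file"); do
        case $ref in
            "~/"*) target="$HOME/${ref#"~/"}" ;;  # home-relative import
            /*)    target=$ref ;;                 # absolute path
            *)     target="$dir/$ref" ;;          # relative to the importing file
        esac
        if [ ! -f "$target" ]; then
            echo "missing: $ref"
            missing=1
        fi
    done
    return $missing
}
```

Running `check_imports ~/.claude/CLAUDE.md` after renaming or moving a module would print one `missing:` line per broken reference and return non-zero, which makes it easy to drop into a dotfiles check.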

Trimming the agent’s own output with `CLAUDE-TOKEN-EFFICIENT.md`

The agent’s own output is context too. Every “Let me check…” preamble, every restated user instruction, every paragraph of closing fluff feeds back into the next turn.

A one-page global instruction trains it out:

# Token Efficient

- Think before acting. Read existing files before writing code.
- Be concise in output but thorough in reasoning.
- Prefer editing over rewriting whole files.
- Do not re-read files you have already read unless the file may have changed.
- No sycophantic openers or closing fluff.
- Go straight to the point. Lead with the answer or action, not the reasoning.
- Do not restate what the user said. Just do it.
- If you can say it in one sentence, don't use three.

A reply that is 30% shorter takes up 30% less room in every later turn’s context, and the saving compounds across a long session.
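A back-of-the-envelope sketch makes the compounding concrete. It assumes a deliberately simple model: every reply is the same size, nothing is compacted, and each reply is re-read as input on every later turn. The function name and the figures (1,000-token replies, 50 turns) are illustrative assumptions, not measurements.

```shell
# Tokens re-read as input across a session, under a simple model:
# every reply has the same size and stays in context for all later turns.
reread_tokens() {
    local reply_tokens=$1 turns=$2
    # Reply k is fed back on turns k+1..n, so the total is r * n(n-1)/2.
    echo $(( reply_tokens * turns * (turns - 1) / 2 ))
}

reread_tokens 1000 50   # prints 1225000
reread_tokens 700 50    # prints 857500
```

Under these assumptions, trimming 300 tokens from each reply avoids 367,500 re-read tokens over 50 turns. The relative saving stays at 30%, but the absolute saving grows with the square of the session length.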

Structural code search with `code-review-graph`

A grep for a function name returns every occurrence: the definition, every call site, every comment that mentions the name. The agent reads all of them and decides relevance after the fact, billed per token.
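To see how much a plain grep would put into context for a given symbol, a rough measurement helps. The sketch below counts matching lines, distinct files, and output bytes for a fixed-string recursive grep. `grep_footprint` is a hypothetical helper name, and byte count is only a proxy for token cost.

```shell
# Sketch: size of the dump a recursive grep for SYMBOL would produce.
grep_footprint() {
    local symbol=$1 root=${2:-.}
    local lines files bytes
    # -r recurse, -n line numbers, -F fixed string, -l file names only.
    lines=$(grep -rnF -- "$symbol" "$root" 2>/dev/null | wc -l)
    files=$(grep -rlF -- "$symbol" "$root" 2>/dev/null | wc -l)
    bytes=$(grep -rnF -- "$symbol" "$root" 2>/dev/null | wc -c)
    # Arithmetic expansion strips the padding some wc builds add.
    echo "lines=$((lines)) files=$((files)) bytes=$((bytes))"
}
```

Calling `grep_footprint parsePost src` before and after switching to structural queries gives a quick sense of what the agent would otherwise have read.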

code-review-graph (CLI code-review-graph, MCP server) builds a local knowledge graph of the codebase. Symbols are nodes. Calls, imports and tests are edges. The agent queries the graph instead of the filesystem:

query_graph callers_of "parsePost"     # who calls this?
query_graph tests_for "parsePost"      # which tests cover it?
get_impact_radius "src/utils/slugify"  # blast radius of a change
semantic_search_nodes "rate limit"     # find code by concept

The response is structural. A list of nodes with locations, not the source lines themselves. The agent reads files only after the graph tells it which ones matter. A typical “where is X used?” turn drops from a fifty-file grep dump to a dozen-node response.

Registration is the friction point. A graph the agent doesn’t know about gets ignored. crg-here is a one-shot zsh function that registers the current repo and builds the graph in one step, idempotent so it is safe to run on every clone:

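crg-here() {
    local repo
    # Resolve the repo root; bail out when not inside a git worktree.
    repo=$(git rev-parse --show-toplevel) || return 1
    # Register under the directory name, then build the graph.
    code-review-graph register "$repo" --alias "$(basename "$repo")"
    code-review-graph build --repo "$repo"
}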

Registration writes to ~/.code-review-graph/registry.json, which a launchd watcher picks up to start indexing in the background. The MCP server is wired per-project in .mcp.json so the agent gets the graph tools the moment it opens the repo:

{
  "mcpServers": {
    "code-review-graph": {
      "command": "uvx",
      "args": ["code-review-graph@2.3.3", "serve"],
      "type": "stdio"
    }
  }
}

Keeping the graph fresh is the job of a PostToolUse hook on Edit|Write|Bash. Every time the agent runs one of those tools, the graph updates. A SessionStart hook prints code-review-graph status at the top of every session so a stale graph surfaces immediately:

{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write|Bash",
        "hooks": [
          { "type": "command", "command": "code-review-graph update --skip-flows", "timeout": 30 }
        ]
      }
    ],
    "SessionStart": [
      {
        "matcher": "",
        "hooks": [
          { "type": "command", "command": "code-review-graph status", "timeout": 10 }
        ]
      }
    ]
  }
}

Filtering command output with `rtk`

git status in a busy worktree returns hundreds of lines. npm install spills deprecation warnings. cargo build prints a line for every crate it compiles. Each one lands in context verbatim.

rtk is a transparent shell wrapper that filters noisy CLI output before it reaches the agent:

rtk git status      # collapse untracked, drop cleanup hints
rtk gain            # show token savings analytics
rtk gain --history  # per-command breakdown
rtk discover        # scan history for missed opportunities

A PreToolUse hook rewrites bare commands automatically. The agent calls git status, and rtk filters under the hood without any explicit instruction:

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          { "type": "command", "command": "~/.claude/hooks/rtk-rewrite.sh" }
        ]
      }
    ]
  }
}

The hook is intentionally thin. It reads the agent’s Bash tool input, hands the command to rtk rewrite, and acts on the exit code:

| Exit | Meaning | Hook action |
| --- | --- | --- |
| 0 | Rewrite found, no permission rule | Return rewritten command, `allow` |
| 1 | No RTK equivalent | Pass through unchanged |
| 2 | Deny rule matched | Pass through, let Claude Code deny |
| 3 | Ask rule matched | Return rewritten command, no auto-allow |

All the actual filtering rules live in the rtk Rust binary, so the hook never has to change when a new wrapper is added. A cached version check at the top short-circuits the rest of the hook for old rtk versions instead of failing silently.
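The exit-code handling can be sketched as a single case statement. This is a sketch of the dispatch step only, not the actual rtk-rewrite.sh: the function name and the action labels it prints (`allow`, `ask`, `passthrough`) are hypothetical, and it does not emit Claude Code's real hook output format or call rtk itself.

```shell
# Sketch: map an `rtk rewrite` exit code to what the hook should do.
# Prints "<action><TAB><command>"; the labels are illustrative only.
rtk_hook_action() {
    local exit_code=$1 original=$2 rewritten=$3
    case $exit_code in
        0) printf 'allow\t%s\n' "$rewritten" ;;        # rewrite found, no permission rule
        3) printf 'ask\t%s\n' "$rewritten" ;;          # rewrite found, user still confirms
        *) printf 'passthrough\t%s\n' "$original" ;;   # 1, 2, or anything unexpected
    esac
}
```

Treating every unknown exit code as a pass-through keeps the hook fail-open: if rtk misbehaves, the agent's original command still runs under Claude Code's normal permission rules.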

Run rtk gain yourself to see the savings. On a working day it reports five-figure token totals, all context the agent never had to spend deciding which lines were signal.

What changes in a long session

Compaction is where the agent forgets the constraint that mattered: “don’t touch this file”, “we already rejected that approach”, “keep the public API stable”. Pushing the first compaction from turn 30 to turn 60 can be the difference between one implementation session and a hand-off prompt that restates half the project.

The turns also get less tedious. Filtered output means less scrolling past install logs to find the one failing line. A graph query returns the handful of call sites the agent needs instead of dumping every textual match for a function name.

That changes what you can hand to the agent. A monorepo that used to overflow context on the second grep can stay addressable for a full debugging session. A refactor that touches ten files no longer needs a separate planning phase just to decide which files the agent is allowed to read.

The useful part is boring: fewer interruptions. The hook catches the noisy command before it costs context. The instruction file cuts the verbose reply before it becomes part of the next turn. Structural reads stop the over-eager grep before it fills the transcript.

Two gotchas worth knowing

CLI over MCP when both exist

An MCP server keeps its tool descriptions and schemas in context and wraps every call in JSON-RPC framing. A CLI command returns plain output. For deterministic operations like git, file reads, or a build, the CLI is cheaper. Reach for MCP when the operation is stateful or the schema earns its bytes (Xcode build, browser automation, a graph database).

Images over PDFs

A PDF of an API doc can burn thousands of tokens on layout and font metadata before any text reaches the model. A screenshot of the same page costs only its pixels, typically a few thousand tokens, and carries the same information for the agent’s purposes. When the source is the rendered page, hand it the image.