Compound engineering for the plan and review loop

The bottleneck has moved to review

A coding agent produces a plan in seconds and a diff in minutes. Reading them takes the rest of the session. Generation is cheap now; deciding what’s worth shipping is not.

The problem sharpens when two agents run at once: Claude Code in one terminal, Codex in another, both touching the same repo. Within minutes one is rebasing on top of the other’s uncommitted changes. A plan comes back as eighty bullet points to read in a terminal. A large diff lands, and a margin note that should have been raised before the build now costs a re-plan.

The tools below target that loop. Each wraps a primitive that already exists: git worktrees, the agent’s own plan output, or the brainstorming conversation. None of them tries to make the judgement call for you. Plenty of agentic tooling leans toward auto-approve and auto-merge. These tools go the other way: they make the review pass cheap enough that it happens on every change, not only the ones you remember to inspect.

Parallel branches with worktrunk

Two agents in the same checkout collide fast. Git worktrees solve it: one branch per directory, all sharing one object store. worktrunk (binary wt) is a thin Rust CLI that makes the lifecycle ergonomic enough to actually use:

wt switch -c add-search   # create branch + worktree, cd into it
wt list                   # show every active worktree
wt merge main             # merge current into main, auto-clean
wt remove                 # remove worktree, delete branch if merged

With vanilla git, the same lifecycle is git worktree add -b feature/foo ../foo && cd ../foo && ... followed by manual cleanup of the worktree, the branch, and the directory. wt collapses each step into one verb. Because a child process cannot change its parent shell’s directory, wt relies on its shell integration to actually move you into the new worktree.
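For comparison, the vanilla lifecycle can be sketched end to end. This is a minimal sketch that uses only stock git commands in a throwaway repository. The demo identity, the temp directory layout, and the add-search branch are placeholders chosen for illustration, and nothing here claims to be what wt runs internally.

```shell
set -eu

# Throwaway repository so the sketch is safe to run anywhere.
demo=$(mktemp -d)
git init -q "$demo/repo"
cd "$demo/repo"

# Hypothetical helper: empty commit with a demo identity, so no global config is needed.
commit() { git -c user.name=demo -c user.email=demo@example.com commit -q --allow-empty -m "$1"; }
commit "initial commit"

# 1. Create the branch and a sibling worktree directory, then enter it.
git worktree add -b add-search ../repo-add-search
cd ../repo-add-search
commit "add search"

# 2. Back in the main checkout, merge the branch (a fast-forward here).
cd ../repo
git merge -q add-search

# 3. Manual cleanup: the worktree directory first, then the branch.
git worktree remove ../repo-add-search
git branch -q -d add-search

remaining=$(git worktree list | wc -l | tr -d ' ')
echo "worktrees left: $remaining"
```

Three separate cleanup concerns (directory, worktree metadata, branch) are what make the manual version easy to leave half-finished.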

Each Claude Code or Codex session gets its own worktree, with its own installed dependencies, its own dev server port, its own dirty state:

$ wt list
* main           /Users/me/repo
  add-search     /Users/me/repo-add-search       (claude-code)
  fix-tokens     /Users/me/repo-fix-tokens       (codex)
  prep-release   /Users/me/repo-prep-release
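The text does not say how each worktree ends up with its own dev server port. One possible approach, sketched here as an assumption and not as a worktrunk feature, is to derive the port from the worktree name. The port_for helper and the 3000-3999 range are hypothetical.

```shell
set -eu

# Hypothetical helper: map a worktree name to a stable port in 3000-3999.
# cksum is POSIX, so the same name yields the same port on every run.
port_for() {
  sum=$(printf '%s' "$1" | cksum | cut -d ' ' -f 1)
  echo $((3000 + sum % 1000))
}

for name in add-search fix-tokens prep-release; do
  echo "$name -> $(port_for "$name")"
done
```

Two names can hash to the same port, so a real setup would still need a fallback when the port is already taken.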

When a branch merges, wt remove deletes the worktree and the branch in one step. The discipline is to keep merged branches out of wt list. Stale worktrees pile up fast, each still holding its branch, its dependencies, and its dirty state, and they bring back the collisions worktrees were meant to prevent.
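A check for merged branches that still have a worktree can be sketched with stock git alone. This is an illustrative sketch in a throwaway repository, not a wt command. The branch names are borrowed from the listing above, and the commit helper and demo identity are hypothetical.

```shell
set -eu

demo=$(mktemp -d)
git init -q "$demo/repo"
cd "$demo/repo"

# Hypothetical helper: empty commit in the given directory with a demo identity.
commit() { git -C "$1" -c user.name=demo -c user.email=demo@example.com commit -q --allow-empty -m "$2"; }
commit . "initial commit"
trunk=$(git symbolic-ref --short HEAD)   # main or master, depending on git config

# Two worktrees: one branch gets merged into trunk, the other stays in progress.
git worktree add -b fix-tokens ../repo-fix-tokens
git worktree add -b add-search ../repo-add-search
commit ../repo-fix-tokens "fix tokens"
commit ../repo-add-search "wip search"
git merge -q fix-tokens

# A branch is stale if it is checked out in a worktree and already contained in trunk.
stale=""
for branch in $(git worktree list --porcelain | sed -n 's|^branch refs/heads/||p'); do
  if [ "$branch" != "$trunk" ] && git merge-base --is-ancestor "$branch" "$trunk"; then
    stale="$stale$branch "
  fi
done
echo "merged but still checked out: $stale"
```

Here fix-tokens is reported and add-search is not, because only fix-tokens is fully contained in the trunk branch.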

Spec-to-plan with superpowers

A plan is only as good as the spec it came from. A bullet list assembled from a one-line prompt isn’t a spec.

superpowers ships a methodology rather than a tool. The skills auto-trigger when you start describing a feature, so you don’t invoke them by name. Four matter for this loop:

  • brainstorming: runs before any creative work. Teases a spec out of the conversation in chunks short enough to read, instead of jumping to code.
  • writing-plans: turns the signed-off spec into an implementation plan structured for TDD, with each step narrow enough that a junior engineer could follow it.
  • executing-plans: runs the plan in a separate session with review checkpoints between steps.
  • subagent-driven-development: fans independent steps out to subagents so the main session keeps its context clean.

A writing-plans output looks something like this:

## Plan: Add full-text search to blog

1. Add `fuse.js` dependency
2. Create `SearchIndex.astro` that builds a JSON index at build time
3. Create `SearchBox.svelte` with an input field, debounced query, and result list
4. Wire `SearchBox` into the header layout
5. Add test: build succeeds, index contains all non-draft posts

Each step is narrow enough to review in isolation and small enough that a wrong step costs one revision, not a re-plan.
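A quick size check on a plan can be sketched in a few lines of shell. This assumes the plan is a markdown file with one numbered step per line, as in the example above. The temp file and the max_steps threshold are arbitrary choices for illustration, not part of superpowers.

```shell
set -eu

# Write the example plan to a temp file (quoted heredoc, so backticks stay literal).
plan=$(mktemp)
cat > "$plan" <<'EOF'
## Plan: Add full-text search to blog

1. Add `fuse.js` dependency
2. Create `SearchIndex.astro` that builds a JSON index at build time
3. Create `SearchBox.svelte` with an input field, debounced query, and result list
4. Wire `SearchBox` into the header layout
5. Add test: build succeeds, index contains all non-draft posts
EOF

# Count the lines that look like numbered steps.
steps=$(grep -c -E '^[0-9]+\. ' "$plan")
max_steps=8   # arbitrary threshold for this sketch

echo "steps: $steps"
if [ "$steps" -gt "$max_steps" ]; then
  echo "plan is long; consider splitting it before review"
fi
```

A count says nothing about whether a step is the right step. It only flags plans that have grown too large to review in one sitting.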

Plan review with plannotator

A wrong abstraction, a missed edge case, intent read backwards. These show up in the plan. They’re cheap to fix only before the 600-line diff lands.

plannotator renders a Claude Code plan as a local webpage you can annotate in the margin, then ships your feedback back to the agent. It installs as a plugin, so the surface is slash commands:

/plannotator-review     # annotate a PR diff
/plannotator-annotate   # annotate a plan markdown
/plannotator-last       # annotate the last rendered assistant message
/plannotator-archive    # browse saved plan decisions

A hook picks up plans automatically when the agent enters plan mode, so there is no manual export step. Margin comments come back as a follow-up prompt in the same session, so a note like “split this step into a separate PR” reaches the agent without you retyping anything. The archive keeps every annotated plan around, which turns the review pass into something you can revisit when a decision later looks wrong.

How they compose

The output of each tool is the input to the next.

You describe a feature. brainstorming asks three rounds of questions and produces a signed-off spec. writing-plans turns that spec into a five-step plan. plannotator opens the plan in a browser; you annotate two steps, and the corrections ship back as a follow-up prompt. The agent revises the plan in the same session.

wt switch -c add-search creates an isolated worktree. The agent works through the revised plan with checkpoints between steps. A second agent can run on a different branch in the same repo without colliding. When the diff lands, plannotator reopens the same annotation surface for the PR. The review pass that started on the plan continues on the code.

What changes

The agent has to write down intent before code. That sounds small until you compare it with the usual one-line prompt followed by a confident diff. A signed-off spec gives the plan something to answer to.

The plan becomes the cheapest place to catch the wrong abstraction, missing edge case, or oversized step. Fixing those in markdown costs a reply. Fixing them after a 600-line diff usually means asking the agent to unwind work it already believes is done.

The worktree keeps the implementation contained while the review thread stays connected to the original plan. You still make the judgement call. The difference is that the loop leaves fewer excuses to skip it.