The review loop is the only loop in Shelf that’s purely about process. There’s no test file to run, no config to validate, no CI job to trigger. The deliverable is a written policy that makes the second-opinion review portable across tools. That’s harder than it sounds, because it means encoding decisions that most teams leave in someone’s head.
Expected experience walkthrough
Writing the playbook
The shipped docs/review-loop-playbook.md breaks every possible review finding into three buckets. Here’s how to think about each one.
Blocking findings are patterns where the cost of merging the bad code is higher than the cost of stopping the PR. In Shelf, these are specific and concrete:
- An API handler reads user identity from the request body instead of
locals.user. This is a real security boundary—the request body is attacker-controlled,locals.useris server-validated. - An admin route doesn’t call
requireAdministrator. Missing authorization on an admin route isn’t a style nit. - A Drizzle query on user-owned data doesn’t scope by the current user. This means one user can see or modify another user’s shelf entries.
- A
catchblock swallows an error and returns 2xx. The caller thinks everything is fine. It isn’t. - A Playwright spec uses
page.waitForTimeout, a raw CSSpage.locator, or logs in manually instead of using storage state. These are banned by the ESLint rules and byCLAUDE.md. If a reviewer catches them, the static layer has a hole. - A change removes the dossier script, the JSON reporter, or the retained trace config. Those are the diagnostic lifeline for every future failure.
Notice how none of these say “the code looks messy” or “this could be refactored.” Blocking findings are about correctness, not taste.
Judgment findings are the reviewer saying “this smells off but I could be wrong.” Unclear naming, error messages that could be more specific, helpers that might deserve extraction. These are worth reading. They’re often worth fixing. But the author can push back with reasoning, and that’s fine. The playbook’s job is to make sure judgment findings don’t get treated as blocking or as noise.
Noise findings are the stuff the reviewer should never have mentioned. In Shelf, this includes generated files under playwright-report/, snapshot PNGs, HAR fixtures, sample-config.json, and lockfiles. If a reviewer keeps flagging these, the problem isn’t the reviewer’s taste—it’s the reviewer’s instruction file. Update it.
The rule of three
This is the escalation ramp. When the same kind of finding shows up on three different PRs:
- First two times: fix the PR and move on. It might be a coincidence.
- Third time: add a rule to
CLAUDE.mdencoding the pattern. Now every agent that reads the project instructions knows about it. - Still recurring after the rule: add a lint rule, a test assertion, or a CI gate. The finding has graduated from advisory to automated.
The rule of three is the feedback mechanism that turns review findings into upstream prevention. Without it, the reviewer is just a very expensive grep that runs on every PR forever.
Choosing an alternate review surface
The lab asks you to pick one surface besides Bugbot. The three natural choices for Shelf:
- GitHub Copilot review: add
.github/copilot-instructions.mdwith the same blocking rules from the playbook. Copilot reads this file automatically on review requests. - Codex review: add a repository prompt file that a Codex reviewer reads before reviewing the diff. Same rules, different format.
- Claude Code with the
quick-reviewskill: the agent already readsCLAUDE.md, which contains the blocking patterns. The instruction surface is the project instructions file itself.
You don’t need all three. You need one that works. The test is: if Bugbot disappeared tomorrow, could the alternate reviewer pick up the same job without someone explaining the rules in a Slack thread?
The instruction surface
Whatever tool you pick, it needs an instruction file in the repository. Not in a SaaS dashboard. Not in a team wiki. In the repo, checked in, versioned, reviewable. The whole point of this lab is that the review policy survives a tool change. If the policy lives in Bugbot’s settings page, it doesn’t survive anything.
The instruction file should tell the reviewer the same things the playbook tells a human:
- What correctness means in this codebase (auth patterns, data scoping, error handling).
- What patterns are explicitly forbidden (the ESLint bans, the
CLAUDE.mdrules). - What categories of finding matter most (blocking over judgment, judgment over noise).
- What to ignore entirely (generated files, fixtures, lockfiles).
The re-review path
After a fix lands, the reviewer needs to rerun. The playbook documents this explicitly: push a new commit, the reviewer reruns automatically on the next push. On the second pass, the reviewer applies the same blocking rules. If it stays silent on a remaining blocking pattern, the instruction file has a gap—fix the instructions, not the test.
This seems obvious until you’re under pressure and the reviewer just approved a PR that still has an unscoped Drizzle query. If the re-review path isn’t explicit, “just push again” turns into “I’ll fix it in the next PR,” which turns into “oh, that’s been in production for three months.”
Judgment criteria
The playbook is working when:
- A new team member can read
docs/review-loop-playbook.mdand know exactly which findings are stop-the-line serious. - The alternate reviewer can pick up a PR without any verbal context and produce findings in the right categories.
- The rule of three has actually fired at least once—meaning a repeated finding has graduated from review comment to
CLAUDE.mdrule or lint rule. - Noise findings are rare, because the instruction surface tells the reviewer what to skip.
- The re-review path after a fix is boring and predictable: push, rerun, done.
If any of those aren’t true, the playbook needs tightening. The most common failure mode is blocking rules that are too vague—“security issues” instead of “reads user identity from the request body.” Specificity is what makes the rules automatable.