Short lab. Two halves. The first half wires the screenshot gate into Shelf. The second half deliberately breaks a component and watches the loop fire.
Setup
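Make sure you’re on the hardened Shelf from the Module 3 lab. You’ll need storage-state authentication and seeding in place. Visual regression without those is a nightmare: the page renders whoever is (or isn’t) logged in and whatever data happens to be in the database, so the screenshots drift from run to run.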
Part one: wire the screenshot gate
Create tests/end-to-end/visual.spec.ts:
```typescript
import { test, expect } from './fixtures';

test.describe('visual regression', () => {
  // Requesting `seeded` runs the Module 3 seeding fixture; Playwright only
  // sets up fixtures a test destructures, even if the body never uses them.
  test('shelf page', async ({ page, seeded }) => {
    await page.goto('/shelf');
    // wait for the content to render so the screenshot is stable
    await expect(page.getByRole('heading', { name: "Alice's Shelf" })).toBeVisible();
    await expect(page).toHaveScreenshot('shelf-page.png', { fullPage: true });
  });

  test('design system', async ({ page }) => {
    await page.goto('/design-system');
    await expect(page.getByRole('heading', { name: 'Design System' })).toBeVisible();
    await expect(page).toHaveScreenshot('design-system.png', { fullPage: true });
  });

  test('book detail page', async ({ page, seeded }) => {
    await page.goto('/books/OL1');
    await expect(page.getByRole('heading', { name: 'Station Eleven' })).toBeVisible();
    await expect(page).toHaveScreenshot('book-detail.png', {
      fullPage: true,
      // paint over the timestamp, which would otherwise differ between runs
      mask: [page.getByTestId('last-updated')],
    });
  });
});
```

Update playwright.config.ts so every screenshot assertion disables animations, hides the text caret, captures at CSS-pixel scale, and tolerates a small diff:
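```typescript
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // ...keep your existing testDir, use, projects, webServer, etc.
  expect: {
    toHaveScreenshot: {
      animations: 'disabled', // fast-forward finite animations, cancel infinite ones
      caret: 'hide', // hide the blinking text cursor
      scale: 'css', // one image pixel per CSS pixel, regardless of device scale factor
      maxDiffPixelRatio: 0.01, // allow up to 1% of pixels to differ
    },
  },
});
```

Generate the baselines: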
```bash
bun playwright test visual.spec.ts --update-snapshots
```

Commit the baseline images. They land in tests/end-to-end/visual.spec.ts-snapshots/. Yes, you commit PNGs to git. That’s the deal.
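Playwright doesn’t save the baselines under the bare names you pass to toHaveScreenshot. Under the default snapshot path template it appends the project name and the OS platform, so ls on a Mac shows names like shelf-page-chromium-darwin.png rather than shelf-page.png. The sketch below reproduces that naming so you know what to look for; it assumes the default template and a project named chromium (substitute your own project name).

```typescript
// Sketch of Playwright's default screenshot naming, assuming the default
// snapshotPathTemplate: {arg}{-projectName}{-snapshotSuffix}{ext}, where the
// suffix defaults to the OS platform. Assumes project names need no sanitizing.
import * as path from "node:path";

function baselineFileName(arg: string, projectName: string, platform: string): string {
  const ext = path.extname(arg); // ".png"
  const base = arg.slice(0, arg.length - ext.length); // "shelf-page"
  const project = projectName ? `-${projectName}` : ""; // omitted when there is no project name
  const suffix = platform ? `-${platform}` : "";
  return `${base}${project}${suffix}${ext}`;
}

const snapshotDir = "tests/end-to-end/visual.spec.ts-snapshots";
for (const arg of ["shelf-page.png", "design-system.png", "book-detail.png"]) {
  // On macOS with a "chromium" project this prints .../shelf-page-chromium-darwin.png
  console.log(path.join(snapshotDir, baselineFileName(arg, "chromium", process.platform)));
}
```

The platform suffix has a practical consequence: baselines generated on macOS are not the files a Linux CI runner compares against, so generate CI baselines on the same OS that CI uses.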
Part one acceptance criteria
- tests/end-to-end/visual.spec.ts exists and contains at least three toHaveScreenshot assertions.
- playwright.config.ts sets animations: 'disabled', caret: 'hide', and scale: 'css' under expect.toHaveScreenshot.
- bun playwright test visual.spec.ts passes on a clean run (no diffs).
- Running the same test five times in a row produces zero false positives. This loop stops at the first failing run and prints 5/5 clean only if all five pass:

  ```bash
  (for i in {1..5}; do bun playwright test visual.spec.ts || exit 1; done) && echo "5/5 clean"
  ```

- The baseline files are committed under tests/end-to-end/visual.spec.ts-snapshots/ (verify with git ls-files tests/end-to-end/visual.spec.ts-snapshots/: all three PNGs should be listed).
- .gitignore does not ignore the snapshot PNGs (git check-ignore -v tests/end-to-end/visual.spec.ts-snapshots/*.png prints nothing).
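The file-level checks above can be scripted. A minimal sketch, assuming Bun (or Node with TypeScript support), git on the PATH, and the default snapshot naming where each file starts with the name passed to toHaveScreenshot. It asks git ls-files for tracked files, so a PNG that exists on disk but was never committed (or is ignored) counts as missing. The file name check-baselines.ts is hypothetical.

```typescript
// check-baselines.ts (hypothetical name): verify the three baselines are committed.
// Assumes it runs from the repo root with git on the PATH.
import { execFileSync } from "node:child_process";

const SNAPSHOT_DIR = "tests/end-to-end/visual.spec.ts-snapshots";
const EXPECTED = ["shelf-page", "design-system", "book-detail"];

// Return the expected baseline names that have no matching PNG in `files`.
// Matches by prefix because Playwright appends project/platform suffixes.
function missingBaselines(files: string[], expected: string[]): string[] {
  const pngs = files
    .map((f) => f.split("/").pop() ?? f) // keep only the file name
    .filter((f) => f.endsWith(".png"));
  return expected.filter(
    (name) => !pngs.some((f) => f === `${name}.png` || f.startsWith(`${name}-`)),
  );
}

// Files git tracks under `dir`; untracked or ignored PNGs do not appear.
function trackedFiles(dir: string): string[] {
  const out = execFileSync("git", ["ls-files", "--", dir], { encoding: "utf8" });
  return out.split("\n").filter(Boolean);
}

function main(): void {
  const missing = missingBaselines(trackedFiles(SNAPSHOT_DIR), EXPECTED);
  if (missing.length > 0) {
    console.error(`Missing committed baselines: ${missing.join(", ")}`);
    process.exit(1);
  }
  console.log(`All ${EXPECTED.length} baselines are tracked in ${SNAPSHOT_DIR}.`);
}

// Run only when executed directly, e.g. `bun check-baselines.ts`.
if (process.argv[1]?.endsWith("check-baselines.ts")) main();
```

Matching by prefix keeps the check working across machines, since the platform suffix differs between a laptop and CI.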
Part two: break something, and watch it fire
Now we’re going to simulate the feedback loop: break a component on purpose and let the screenshot gate catch it. Open src/lib/components/Button.svelte (or wherever Shelf’s button component lives) and change the padding:
```html
<!-- before -->
<button class="px-4 py-2 ...">
<!-- after -->
<button class="px-6 py-3 ...">
```

Run the screenshot test again:
```bash
bun playwright test visual.spec.ts
```

It fails. Open the HTML report:
```bash
bun playwright show-report
```

Find the failing test and look at its image comparison: the baseline (expected), the actual screenshot, and the diff. The diff should clearly show that the buttons changed size.
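For the manual option you need the diff image’s path on disk. Alongside the HTML report, Playwright writes the expected, actual, and diff images for each failing screenshot into that test’s folder under test-results/; the diff is the file ending in -diff.png (for shelf-page.png, that’s shelf-page-diff.png). A minimal sketch that lists them, assuming the default outputDir of test-results (the script name find-diffs.ts is hypothetical):

```typescript
// find-diffs.ts (hypothetical name): print every *-diff.png under test-results/.
// Assumes Playwright's default outputDir; change the path if your config sets another.
import { existsSync, readdirSync } from "node:fs";
import { join } from "node:path";

function findDiffImages(dir: string): string[] {
  if (!existsSync(dir)) return []; // no failing run yet
  const found: string[] = [];
  for (const entry of readdirSync(dir, { withFileTypes: true })) {
    const full = join(dir, entry.name);
    if (entry.isDirectory()) {
      found.push(...findDiffImages(full)); // recurse into per-test folders
    } else if (entry.isFile() && entry.name.endsWith("-diff.png")) {
      found.push(full);
    }
  }
  return found.sort();
}

for (const p of findDiffImages("test-results")) console.log(p);
```

Run it right after the failing run and paste the printed path into your agent conversation. Playwright cleans test-results/ at the start of each run, so grab the path before re-running.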
Now simulate the agent loop. You have two options depending on your setup:
Option A—manual. Copy the diff image path into your Claude Code conversation and ask: “The visual regression test failed. Here’s the diff. What changed, and is the change intentional?” Let the agent look at the image and describe what it sees.
Option B—hooked. If you’re using Claude Code with a hook that auto-attaches test failures, just run the test and let the hook fire. (We’ll wire this kind of hook properly in Module 9. For today, manual is fine.)
Either way, pay attention to what the agent says. It should identify that the buttons got bigger, judge correctly whether the change affected the surrounding layout (on Shelf, it probably pushed some cards wider), and propose either reverting the change or updating the baseline.
Part two acceptance criteria
- The button padding change produces a failing toHaveScreenshot assertion.
- playwright-report/ contains a diff image for the failing test.
- You showed the diff to your agent and got a response that correctly identifies what visually changed.
- You either (a) reverted the button change and re-ran to green, or (b) ran --update-snapshots as an intentional baseline update and committed the new baselines as a separate commit.
- Your git history shows the experiment as discrete commits, so the sequence is legible later.
Stretch goals
- If Shelf’s design-system route doesn’t already exercise every component in every state, build it out until it does. The single design-system screenshot test from part one then covers the whole component library. Consider this your poor-person’s Chromatic.
- Set up a second Playwright project for a different viewport (e.g., iPhone 13 from Playwright’s device presets) and regenerate baselines for it. That preset runs in WebKit, so you may need bun playwright install webkit first. Now your visual regression covers mobile too.
- Try the same flow with Chromatic instead of built-in snapshots, just to see the difference in review UX. Don’t commit the Chromatic setup permanently unless the whole team is on board; the built-in snapshots are the long-term pattern for this workshop.
- Intentionally introduce a subtle change (e.g., change a font weight from 500 to 600) and see whether the maxDiffPixelRatio: 0.01 tolerance catches it. Tune the tolerance if needed.
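To reason about whether 0.01 is tight enough, turn the ratio into a pixel budget. maxDiffPixelRatio is a fraction of the screenshot’s total pixels, so with fullPage: true the budget grows with page height. A rough sketch; the screenshot sizes are illustrative, and the exact comparison and rounding inside Playwright’s comparator are not modeled here.

```typescript
// Approximate number of differing pixels a screenshot may have and still pass.
function allowedDiffPixels(width: number, height: number, maxDiffPixelRatio: number): number {
  return width * height * maxDiffPixelRatio;
}

// Rough check: would a change touching `changedPixels` pixels slip under the tolerance?
function withinTolerance(changedPixels: number, width: number, height: number, ratio: number): boolean {
  return changedPixels <= allowedDiffPixels(width, height, ratio);
}

// Illustrative sizes: a 1280x720 viewport vs. a 1280x4000 full-page capture.
console.log("viewport budget:", Math.round(allowedDiffPixels(1280, 720, 0.01))); // 9216
console.log("full-page budget:", Math.round(allowedDiffPixels(1280, 4000, 0.01))); // 51200
```

A subtle change that touches fewer pixels than the budget passes silently, and a longer page makes that easier. If that happens, lower maxDiffPixelRatio or switch to an absolute maxDiffPixels; the separate threshold option controls how different a single pixel’s color must be before it counts at all.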
The one thing to remember
A failed screenshot test is a conversation opener. The diff image is the message. Once the agent can read the diff, the loop closes itself.