Steve Kinney

Post-Merge and Post-Deploy Validation

CI going green is a great feeling. It is also not the same thing as “the deployment is healthy.”

I’ve shipped enough “perfectly green” pull requests that still broke in the real environment to stop pretending otherwise. Wrong environment variables. Wrong redirect URL. Wrong cookie domain. Background job didn’t boot. CDN served the old asset manifest. Database migration passed and the route still exploded on first real traffic. You only need a few of these before “CI passed” stops sounding like a victory speech.

So this lesson is about the loop after the merge gate.

Prerequisite

This lesson assumes you’ve already wired CI as the Loop of Last Resort. CI is the last pre-merge gate. Post-deploy validation is the first post-merge one.

The shape of the loop

If you’re using GitHub Actions environments or an equivalent deployment system, the loop should look like this:

graph LR
  Merge["Merge to main"] --> Deploy["Deploy to preview or production"]
  Deploy --> Smoke["Run post-deploy smoke check"]
  Smoke -->|Pass| Monitor["Watch health window"]
  Smoke -->|Fail| Rollback["Rollback or stop the rollout"]
  Monitor -->|Healthy| Done["Done"]
  Monitor -->|Errors spike| Rollback

That’s the whole lesson in a diagram.

The key idea: deployment is not the end of automation. Deployment is the beginning of the next loop.

What post-deploy validation is actually proving

This layer is not rerunning your whole test suite for the drama of it. It is proving a smaller, sharper thing:

  • the deployed URL is reachable
  • the core route renders
  • authentication still works if the route depends on it
  • the primary read path and one primary write path still function
  • the environment-specific wiring is correct

That is a smoke check, not a second CI pipeline.
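The first item on that list, reachability, can be checked before Playwright even starts, which helps if your host reports a URL a few seconds before it serves traffic. Here is a minimal sketch, not something Shelf ships: waitForReachable, StatusProbe, and the default attempt counts are all hypothetical names and values chosen for illustration. The probe is passed in, so the retry logic stays independent of any particular HTTP client.

```typescript
// Hypothetical helper: none of these names come from Shelf or Playwright.
type StatusProbe = (url: string) => Promise<number>;
type Sleep = (ms: number) => Promise<void>;

const realSleep: Sleep = (ms) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls the URL until the probe returns a 2xx or 3xx status.
// Resolves with the attempt number that succeeded; rejects after the last attempt.
async function waitForReachable(
  url: string,
  probe: StatusProbe,
  options: { attempts?: number; delayMs?: number; sleep?: Sleep } = {},
): Promise<number> {
  const attempts = options.attempts ?? 10; // assumed default
  const delayMs = options.delayMs ?? 3000; // assumed default
  const sleep = options.sleep ?? realSleep;
  let lastProblem = 'no attempts made';

  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      const status = await probe(url);
      if (status >= 200 && status < 400) return attempt;
      lastProblem = `HTTP ${status}`;
    } catch (error) {
      // Network-level failures (DNS, refused connection) count as "not yet".
      lastProblem = error instanceof Error ? error.message : String(error);
    }
    if (attempt < attempts) await sleep(delayMs);
  }

  throw new Error(`${url} not reachable after ${attempts} attempts (${lastProblem})`);
}
```

On Node 18 or later, the probe could be as small as `async (url) => (await fetch(url)).status`. Keep the attempt budget short: this is a readiness gate in front of the smoke check, not a way to retry a broken deploy into passing.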

For Shelf, the post-deploy smoke check can stay tiny:

  • load the home page or shelf page
  • confirm the critical heading renders
  • confirm the seeded data or a known empty state shows up
  • if auth is required, prove the storage state or login bootstrap still works against the deployed URL

Small is a feature here. If the post-deploy check takes ten minutes, nobody trusts it as a stop-ship signal.

What the smoke spec actually looks like

A post-deploy smoke test is a regular Playwright spec that lives in a dedicated directory, so it does not run as part of the normal end-to-end suite. The only thing that makes it “post-deploy” is that it reads its target URL from an environment variable the deployment workflow injects, so the same file can run against a preview, a staging environment, or production without any code changes.

// tests/smoke/post-deploy.spec.ts
import { expect, test } from '@playwright/test';

const smokeBaseUrl = process.env.SMOKE_BASE_URL ?? 'http://127.0.0.1:4173';

test.use({ baseURL: smokeBaseUrl });

test('home page renders and exposes sign in', async ({ page }) => {
  await page.goto('/');

  await expect(page.getByRole('heading', { level: 1 })).toBeVisible();
  await expect(page.getByRole('main').getByRole('link', { name: 'Sign in' })).toBeVisible();
});

Two assertions. That’s it. The shipped Shelf version lives at tests/smoke/post-deploy.spec.ts and is invoked through a dedicated playwright.smoke.config.ts so the default test:e2e suite does not accidentally pick it up.
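For reference, here is roughly what a dedicated smoke config could look like. This is a sketch under assumptions, not the shipped Shelf file: the retry count, reporter list, and trace setting are my guesses at sensible defaults, and your real config may differ.

```typescript
// playwright.smoke.config.ts (hypothetical contents; the shipped Shelf config may differ)
import { defineConfig, devices } from '@playwright/test';

export default defineConfig({
  // Only pick up specs in the dedicated smoke directory.
  testDir: './tests/smoke',
  // Assumption: no retries, so a flaky deploy shows up as red instead of hiding behind a retry.
  retries: 0,
  // The HTML reporter writes to playwright-report/ by default,
  // which is the folder the deployment workflow uploads on failure.
  reporter: [['list'], ['html', { open: 'never' }]],
  use: {
    // Keep a trace only when the check fails, so the artifact stays small.
    trace: 'retain-on-failure',
  },
  // The smoke job installs only Chromium, so run only that project.
  projects: [{ name: 'chromium', use: { ...devices['Desktop Chrome'] } }],
});
```

With a config like this, the test:smoke script would plausibly be something like `playwright test --config=playwright.smoke.config.ts`. The other half of the isolation is that the default config must not include tests/smoke, either through its own testDir or a testIgnore pattern.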

The deployment workflow that runs the smoke check

The deployment workflow is where the smoke check gets its teeth. A minimal GitHub Actions shape:

# .github/workflows/deploy.yml
name: Deploy

on:
  push:
    branches: [main]
  workflow_dispatch:

jobs:
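  deploy:
    runs-on: ubuntu-latest
    # Expose the step output as a job output. Without this mapping,
    # needs.deploy.outputs.url is empty in the smoke job.
    outputs:
      url: ${{ steps.deploy.outputs.url }}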
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
      - run: npm ci --ignore-scripts
      - run: npm run build
      - name: Deploy
        id: deploy
        run: |
          # Replace with your real deploy command. The important part
          # is that this step prints the deployed URL so the smoke job
          # can read it from the job output.
          echo "url=https://shelf-preview.example.com" >> "$GITHUB_OUTPUT"

  smoke:
    needs: deploy
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
      - run: npm ci --ignore-scripts
      - run: npx playwright install --with-deps chromium
      - name: Run post-deploy smoke check
        env:
          SMOKE_BASE_URL: ${{ needs.deploy.outputs.url }}
        run: npm run test:smoke
      - name: Upload report on failure
        if: failure()
        uses: actions/upload-artifact@v4
        with:
          name: smoke-report
          path: playwright-report/
          retention-days: 7

Two jobs. The first job does whatever your host needs to deploy and writes the deployed URL into $GITHUB_OUTPUT so the smoke job can read it as needs.deploy.outputs.url. The second job installs Playwright, runs npm run test:smoke with SMOKE_BASE_URL pointing at that URL, and uploads the Playwright report if the check fails. A red smoke job is the stop-ship signal. This minimal workflow only fails the run; wiring that failure to an automatic rollback or a notification is the step you add for your host.
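One failure mode worth guarding against: if the URL never reaches the smoke job, SMOKE_BASE_URL arrives as an empty string, and the `??` fallback in the spec only covers null and undefined, so it never kicks in. A small resolver can fail loudly instead. This is a sketch, assuming a hypothetical resolveSmokeBaseUrl helper and relying on the CI=true variable that GitHub Actions sets; neither the helper nor its error messages come from Shelf.

```typescript
// Hypothetical helper: resolveSmokeBaseUrl is not part of Shelf or Playwright.
type Env = Record<string, string | undefined>;

// Same local preview address the spec falls back to.
const LOCAL_PREVIEW_URL = 'http://127.0.0.1:4173';

function resolveSmokeBaseUrl(env: Env): string {
  const raw = (env.SMOKE_BASE_URL ?? '').trim();
  const runningInCi = env.CI === 'true'; // GitHub Actions sets CI=true

  if (raw === '') {
    if (runningInCi) {
      // An empty value in CI almost always means the deploy job output was not wired up.
      throw new Error('SMOKE_BASE_URL is empty in CI; check the deploy job output.');
    }
    // Outside CI, fall back to the local preview server.
    return LOCAL_PREVIEW_URL;
  }

  if (!/^https?:\/\/\S+$/.test(raw)) {
    throw new Error(`SMOKE_BASE_URL is not an http(s) URL: "${raw}"`);
  }
  return raw;
}
```

In the spec, this would replace the inline fallback: `test.use({ baseURL: resolveSmokeBaseUrl(process.env) })`. The point of the CI branch is that a misconfigured workflow produces a message about the workflow, not a confusing navigation error.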

No deploy target yet?

If your Shelf clone doesn’t have a hosted deploy target, you can still practice the same loop locally: run npm run build && npm run preview -- --host 127.0.0.1 --port 4173 in one terminal and SMOKE_BASE_URL=http://127.0.0.1:4173 npm run test:smoke in another. Record the gap in your runbook and wire up the real deploy workflow when you have a target.

Preview targets count

You do not need a full production rollout to teach this loop well.

A deployment preview, a staging target, or even a build-and-preview job that exposes a stable URL is enough to practice the shape:

  • deploy candidate artifact
  • run smoke check against the deployed URL
  • upload artifacts on failure
  • decide whether to proceed

That is why I prefer teaching this on a preview target first. The loop is the point. The host is an implementation detail.

The health window after the smoke test

Passing the smoke test is necessary. It is still not the whole story.

There is usually a short window after deploy where you want at least one more signal:

  • error rate stayed flat
  • request latency did not jump
  • logs are not filling with obvious exceptions
  • jobs and background workers are still healthy

I am not asking you to build a full observability stack in this course. I am asking you to define what would trigger a rollback. If you do not define that rule ahead of time, every bad deploy becomes an argument instead of a decision.
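To make "define what would trigger a rollback" concrete, here is one way to turn the first two signals into a check. It is a sketch under assumptions: the WindowSample shape, the threshold names, and the idea of comparing against a pre-deploy baseline are all mine, and where the numbers come from (logs, a metrics API, a dashboard export) is left open.

```typescript
// Hypothetical types and names, chosen for illustration only.
interface WindowSample {
  requests: number;
  errors: number;
  p95LatencyMs: number;
}

interface HealthThresholds {
  maxErrorRateIncrease: number; // absolute increase, e.g. 0.01 = one percentage point
  maxLatencyRatio: number; // current p95 divided by baseline p95
}

function errorRate(sample: WindowSample): number {
  return sample.requests === 0 ? 0 : sample.errors / sample.requests;
}

// Compares the post-deploy window against a pre-deploy baseline and
// returns a human-readable list of violations (empty means healthy).
function findHealthViolations(
  baseline: WindowSample,
  current: WindowSample,
  thresholds: HealthThresholds,
): string[] {
  const violations: string[] = [];

  const rateIncrease = errorRate(current) - errorRate(baseline);
  if (rateIncrease > thresholds.maxErrorRateIncrease) {
    violations.push(`error rate rose by ${(rateIncrease * 100).toFixed(2)} percentage points`);
  }

  if (baseline.p95LatencyMs > 0) {
    const ratio = current.p95LatencyMs / baseline.p95LatencyMs;
    if (ratio > thresholds.maxLatencyRatio) {
      violations.push(`p95 latency is ${ratio.toFixed(2)}x the baseline`);
    }
  }

  return violations;
}
```

The thresholds are the part to agree on ahead of time. The function only makes sure everyone is comparing the same two numbers when the deploy looks shaky.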

Rollback rules should be written before you need them

This is one of those delightfully unglamorous engineering habits that saves an unreasonable amount of stress.

Write the rollback trigger down:

  • smoke check fails: rollback
  • critical route 500s: rollback
  • authentication broken: rollback
  • error spike above agreed threshold in first N minutes: rollback or pause

Make the rule specific enough that an agent or a human can follow it without theater.
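One way to keep the rule that specific is to write it as data. The sketch below encodes the four triggers above as a lookup table; the signal names, the Action type, and the choice of "pause" for an error spike are assumptions for illustration, since the list leaves "rollback or pause" for you to decide.

```typescript
// Hypothetical names: these identifiers are illustrative, not part of Shelf.
type Signal = 'smoke-failed' | 'critical-route-500' | 'auth-broken' | 'error-spike';
type Action = 'rollback' | 'pause' | 'proceed';

// The written-down policy. Changing a rule means changing this table in a reviewed commit.
const rollbackPolicy: Record<Signal, Action> = {
  'smoke-failed': 'rollback',
  'critical-route-500': 'rollback',
  'auth-broken': 'rollback',
  'error-spike': 'pause', // assumption: the list says "rollback or pause"; pick one and commit to it
};

// The most severe action among the observed signals wins.
function decideAction(observed: Signal[]): Action {
  if (observed.some((signal) => rollbackPolicy[signal] === 'rollback')) return 'rollback';
  if (observed.some((signal) => rollbackPolicy[signal] === 'pause')) return 'pause';
  return 'proceed';
}
```

For example, `decideAction(['error-spike', 'auth-broken'])` returns 'rollback' because any rollback-level signal outranks a pause. Because Record<Signal, Action> requires an entry for every signal, adding a new signal without deciding its action is a compile error, which is exactly the argument you want to have before the deploy.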

Ambiguous rollback rules

“If it looks bad, we should probably revert” is not a rollback policy. That is a future argument in a pull request thread while users are already feeling the problem.

What the agent needs here

The agent needs the exact same thing it has needed all day:

  • a named command to run
  • a target URL
  • artifacts when the check fails
  • a documented stop condition

If the deployment workflow says “run npm run test:smoke against SMOKE_BASE_URL and upload the Playwright report on failure,” the agent can work with that. If the workflow says “manually look at the deploy and decide if vibes are good,” congratulations, you have rebuilt yourself as the relay.

What goes in CLAUDE.md

## Post-merge and post-deploy

- A green pull request is not the end of the loop. After merge or after
  a deploy preview is available, run the post-deploy smoke check against
  the deployed URL.
- Use the named smoke-test command and the deployment URL provided by the
  workflow or environment.
- If the smoke check fails, treat that as a stop-ship signal. Do not
  wave the deploy through in the summary.
- If rollback conditions are met, recommend rollback explicitly instead
  of describing the failure passively.

Success state

You have this loop when:

  • a deployment or preview target exposes a stable URL
  • a named smoke test runs against that URL automatically
  • rollback triggers are written down before the deploy fails

The one thing to remember

CI proves the change was mergeable. Post-deploy validation proves the deployment itself is healthy. Those are different claims, and if you only automate the first one, the second one is still your problem.
