Writing
-
Thoughts on AI Safety
A cautious, nuanced case for AI optimism: why safety, interpretability, bias, and alignment matter as much as raw capability.
-
Using Codex from Claude: Getting a Second Opinion from a Different Model Family
I wired OpenAI's Codex into Claude Code as a consulting subagent—a different training lineage I can tap for architecture calls, stuck bugs, and security reviews. Here's how the `codex-advisor` agent works and why it isn't actually an MCP server anymore.
-
Multi-Model Agentic Coding: Letting the Other Model Do the Typing
The follow-up to my Codex-as-advisor setup. Same second model family, opposite stance: instead of asking Codex for an opinion, I hand it the implementation, let it work in an isolated worktree, and run an approval loop where I own the diff and Codex never gets to commit.
-
Sentinels: The Quiet Power of a Touched File
How I use sentinel files to gate the risky moves my coding agents make—exiting plan mode, opening a pull request, addressing review feedback, backing off a rate-limited model. The whole mechanism is a file on disk and a hook that checks for it.
-
Claude Ultraplan: Planning in the Cloud, Executing Wherever
Ultraplan hands the planning phase of a coding task off to a session of Claude Code on the web running in plan mode, then lets you review the plan in the browser and decide where to execute it. Here's what it actually changes about your workflow, what it costs, and where the sharp edges are.
-
Playwright vs. Chrome DevTools MCP: Driving vs. Debugging
Playwright and Chrome DevTools both ship official tools for letting AI agents drive a browser, but they're optimized for different jobs. Here's how Playwright CLI, Playwright MCP, and Chrome DevTools MCP actually fit together, and how to pick between them without guessing.
-
Entering the Mind of Ralph Wiggum
A while-true loop, a prompt file, and a clean context window on every iteration. The Ralph Loop is the dumbest-sounding technique that actually works—and the reason it works will change how you think about programming LLMs.
-
Memory Systems for AI Agents: What the Research Says and What You Can Actually Build
The old short-term/long-term taxonomy doesn't capture what modern agent memory systems actually do. A new three-axis framework—Forms, Functions, and Dynamics—maps the design space from flat vector stores to RL-driven memory management. Here's what the research says and what you can build today.
-
Temporal's Developer Skill Is a Promising First Draft
Temporal shipped one of the first agent skills from a major infrastructure vendor. The diagnosis is right and the architecture is sound. The execution has some fixable gaps.
-
The Anatomy of an Agent Loop
Every major AI agent runs the same core loop. The 6-line version is easy. The production-hardened version—with context compaction, loop detection, cost budgets, and graceful termination—is where things get interesting.
-
Agent Skills, Stripped of Hype
Agent skills are not a new capability—they're a context management strategy. Their value comes from routing and progressive disclosure, not from smarter prompts.
-
Designing a Build System That Runs Untrusted Code
A deep technical walkthrough of what it takes to design a build system that securely executes arbitrary customer repositories and turns build output into deployable artifacts—covering the pipeline, the security model, the architecture, and the operational realities.
-
Designing an AI Gateway and Durable Workflow System
A two-layer architecture for production AI systems: a gateway that abstracts providers, enforces policies, and tracks costs, paired with a durable workflow engine that makes long-running agentic tasks survive failures, pause for human approval, and replay deterministically.
-
MCP Apps and the Missing Middle of AI Tooling
MCP servers return data. MCP Apps let them ship a UI alongside that data—so the tool author, not the client, decides how results look.
-
My Ridiculous AI-Assisted Development Workflow
A walkthrough of the system I use to ship code with AI agents—from planning in Linear to worktrees, linting gauntlets, and a small army of code review bots.