Stop Babysitting Your AI Agents
Give agents a way to check their own work, run several in parallel, and let them loop in the background — the way the Claude Code team works internally.
TL;DR
If you spend half your day staring at Claude waiting for it to finish, or playing QA on whatever it just produced, you're using it wrong. Give the agent a way to check its own work, run several in parallel, and let it loop in the background. This is how the Claude Code team uses it internally.
Based on Sid Bidasaria's talk "Stop babysitting your agents" (Sid is a founding engineer on Claude Code).
The core idea
Most of our tooling (linters, type checkers, compilers, IDEs) was built to make humans faster. But humans aren't writing most of the code anymore. Agents are. So the question to keep asking is:
What does an agent need from your codebase that a human takes for granted?
The talk lays out three things that build on each other:
- Verification — teach Claude to check its own work.
- Multi-Claude — once Claude is reliable, run several at once.
- Background loops — take your keyboard out of the hot path entirely.
1. Verification — give Claude a feedback loop
Think about how you verify your work on a normal feature:
Write code → build it → run it → check the side effects (UI, logs, DB) → run unit tests → deploy
That exact playbook works for Claude too. The only thing missing is the tools and instructions to actually do each step. When Claude can run your build and tests, hit your dev server, tail your logs, and read your DB, it stops asking you whether it worked. It just finds out.
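One way to hand Claude that playbook is a single script it can run after every change. The sketch below is a minimal illustration, not something from the talk: `run_checks` and the placeholder commands are hypothetical, and you would substitute your project's real build and test commands.

```python
import subprocess
import sys


def run_checks(steps):
    """Run (name, argv) steps in order and stop at the first failure.

    Returns a list of (name, passed, output) tuples for the steps that ran.
    """
    results = []
    for name, argv in steps:
        proc = subprocess.run(argv, capture_output=True, text=True)
        passed = proc.returncode == 0
        results.append((name, passed, (proc.stdout + proc.stderr).strip()))
        if not passed:
            break  # later steps depend on earlier ones, so stop here
    return results


if __name__ == "__main__":
    # Hypothetical placeholder commands: swap in your real build and test steps.
    steps = [
        ("build", [sys.executable, "-c", "print('build ok')"]),
        ("tests", [sys.executable, "-c", "print('tests ok')"]),
    ]
    for name, passed, output in run_checks(steps):
        print(f"{name}: {'PASS' if passed else 'FAIL'} {output}")
```

Stopping at the first failure keeps the output short, which gives the agent one concrete thing to fix per run.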
Practical things to set up first (the table-stakes Sid mentions):
- A high-quality CLAUDE.md, the single highest-leverage thing you can do.
- Connect your real tools (Slack, Jira/Linear, BigQuery, Datadog, etc.): "if it's useful for you, it's useful for Claude."
Then put the whole loop in one prompt:
implement the validation function and run tests until they pass
Claude writes the code, runs the tests, reads the failures, fixes them, and runs again. No babysitting.
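As a mental model, "run tests until they pass" is a bounded check-and-fix loop. The sketch below shows only the shape of that loop and makes no claim about how Claude Code implements it. `run_until_pass`, `check`, and `fix` are hypothetical names, and the toy "bugs" list stands in for real test failures.

```python
def run_until_pass(check, fix, max_attempts=5):
    """Call check() until it reports success, calling fix(failures) between tries.

    check() returns a list of failure messages (an empty list means success).
    Returns the number of check() runs it took, or None if attempts ran out.
    """
    for attempt in range(1, max_attempts + 1):
        failures = check()
        if not failures:
            return attempt
        fix(failures)
    return None


if __name__ == "__main__":
    # Toy stand-ins: two "bugs", and each fix() call removes one of them.
    bugs = ["off-by-one in range check", "missing None guard"]
    attempts = run_until_pass(lambda: list(bugs), lambda failures: bugs.pop(0))
    print(attempts)  # prints 3: two failing runs, then one passing run
```

The `max_attempts` cap is a design choice worth mirroring in prompts: an unattended loop should have a point at which it stops and reports back instead of retrying forever.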
2. Multi-Claude — parallelize the boring parts
Once Claude can verify its own work, you can trust it to run unsupervised — which means you can run several in parallel on separate branches/worktrees. One Claude refactoring auth, another writing tests for the new endpoint, a third chasing a flaky CI failure. Worktrees are first-class in Claude Code for exactly this.
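To make the setup concrete, the sketch below builds one `git worktree add -b <branch> <path>` command per task, so each Claude session gets its own checkout. The branch names, the `../worktrees` location, and the `worktree_commands` helper are illustrative assumptions. The script only prints the commands and does not run them.

```python
import shlex


def worktree_commands(tasks, base_dir="../worktrees"):
    """Build one `git worktree add` command per task.

    tasks maps a new branch name to a short description (used only for display).
    """
    commands = []
    for branch in tasks:
        # Flatten slashes so each worktree gets a single directory name.
        path = f"{base_dir}/{branch.replace('/', '-')}"
        commands.append(["git", "worktree", "add", "-b", branch, path])
    return commands


if __name__ == "__main__":
    # Hypothetical branch names matching the three example tasks.
    tasks = {
        "refactor/auth": "refactor auth",
        "tests/new-endpoint": "write tests for the new endpoint",
        "fix/flaky-ci": "chase the flaky CI failure",
    }
    for cmd in worktree_commands(tasks):
        print(shlex.join(cmd))
```

After creating the worktrees, you would start a separate Claude Code session in each directory so the agents never touch each other's files.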
3. Background loops — /loop and Routines
This is the part most people haven't seen yet. /loop turns Claude Code from "thing I prompt" into "thing that's already working when I check back in."
/loop — quick polling in a session
Requires Claude Code v2.1.72+. Tasks are session-scoped (they run while your session is open and survive --resume).
The pattern is simple: interval + prompt.
# Address review comments and auto-rebase every 5 minutes
/loop 5m address any new review comments on my PR and rebase if needed
# Watch a deploy
/loop 5m check if the deployment finished and tell me what happened
# Let Claude pick the interval based on what it sees
/loop check whether CI passed and address any review comments
# Re-run a packaged workflow on a schedule
/loop 20m /review-pr 1234
# Bare /loop — runs the built-in maintenance prompt
# (continues unfinished work, tends the PR, runs cleanup passes)
/loop
You can also drop a loop.md file at .claude/loop.md (project) or ~/.claude/loop.md (user) to set a default prompt for bare /loop, handy for "keep this release branch healthy" type jobs.
Recurring loops auto-expire after 7 days. Press Esc to stop one that's waiting.
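The "interval + prompt" pattern and the 7-day expiry can be modeled in a few lines. This is a rough mental model only, not Claude Code's scheduler: the unit letters in `UNIT_SECONDS`, the exact fire times, and the names `parse_interval` and `RecurringLoop` are all assumptions made for illustration.

```python
from dataclasses import dataclass

UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}  # assumed unit set
EXPIRY_SECONDS = 7 * 86400  # recurring loops expire after 7 days


def parse_interval(text):
    """Turn an interval like '5m' into seconds; raise ValueError if malformed."""
    number, unit = text[:-1], text[-1:]
    if not number.isdigit() or unit not in UNIT_SECONDS:
        raise ValueError(f"bad interval: {text!r}")
    return int(number) * UNIT_SECONDS[unit]


@dataclass
class RecurringLoop:
    interval_seconds: int
    prompt: str
    created_at: float

    def due_times(self, until):
        """Yield each fire time up to `until`, stopping at the 7-day expiry."""
        deadline = min(until, self.created_at + EXPIRY_SECONDS)
        t = self.created_at + self.interval_seconds
        while t <= deadline:
            yield t
            t += self.interval_seconds


if __name__ == "__main__":
    loop = RecurringLoop(parse_interval("5m"), "check whether CI passed", created_at=0.0)
    print(len(list(loop.due_times(until=3600))))  # prints 12: one run every 5 minutes for an hour
```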
One-shot reminders
Just ask in plain English — no /loop needed:
remind me at 3pm to push the release branch
in 45 minutes, check whether the integration tests passed
Routines — when the loop should outlive your session
/loop is session-scoped. If you close Claude Code, it stops. For overnight or week-long automation that should run without your machine on, use Routines — they run on Anthropic's cloud infrastructure, no open session required, and can also trigger on GitHub events or via API.
Quick rule of thumb:
| Want it to… | Use |
|---|---|
| Poll while I work | /loop |
| Run when my laptop is on, but no session needed | Desktop scheduled tasks |
| Run reliably even if I'm offline | Routines |
What to try this week
Pick one. Don't try to do all of these at once.
- Audit your CLAUDE.md. If it's empty or stale, fix that first. Nothing else matters as much.
- Add a verification step to your next prompt. Instead of "implement X", say "implement X and run the tests until they pass". Notice how much less you have to follow up.
- Try /loop 10m /your-existing-command on a PR you'd normally babysit (CI checks, review comments, rebases). Let it run for an afternoon and see what it gets done.
- Set up one Routine for something recurring: a daily dependency check, a weekly stale-PR report, a nightly flaky-test hunt.
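For the first item in the list, a quick script can flag a CLAUDE.md that is missing, empty, or untouched for a long time. The sketch below is one possible check. The 90-day staleness threshold and the `audit_claude_md` name are arbitrary assumptions, and a recent modification time says nothing about whether the content is actually good.

```python
import time
from pathlib import Path


def audit_claude_md(path, max_age_days=90, now=None):
    """Return 'missing', 'empty', 'stale', or 'ok' for a CLAUDE.md file."""
    p = Path(path)
    if not p.is_file():
        return "missing"
    if not p.read_text(encoding="utf-8").strip():
        return "empty"
    now = time.time() if now is None else now
    age_days = (now - p.stat().st_mtime) / 86400
    # The 90-day default is an assumed threshold, not a recommendation from the talk.
    return "stale" if age_days > max_age_days else "ok"


if __name__ == "__main__":
    print(audit_claude_md("CLAUDE.md"))
```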
Why this matters
The bottleneck used to be how fast you could type. Then it was how fast the model could think. Now, for a lot of work, the bottleneck is you waiting on the model. Verification + parallelism + background loops is how we stop that being true.