Jeba Prince
4 min read · AI, Claude Code, Workflow

Stop Babysitting Your AI Agents

Give agents a way to check their own work, run several in parallel, and let them loop in the background — the way the Claude Code team works internally.

TL;DR

If you spend half your day staring at Claude waiting for it to finish, or playing QA on whatever it just produced, you're using it wrong. Give the agent a way to check its own work, run several in parallel, and let it loop in the background. This is how the Claude Code team uses it internally.

Based on Sid Bidasaria's talk "Stop babysitting your agents" (Sid is a founding engineer on Claude Code).

The core idea

Most of our tooling (linters, type checkers, compilers, IDEs) was built to make humans faster. But humans aren't writing most of the code anymore. Agents are. So the question to keep asking is:

What does an agent need from your codebase that a human takes for granted?

The talk lays out three things that build on each other:

  1. Verification — teach Claude to check its own work.
  2. Multi-Claude — once Claude is reliable, run several at once.
  3. Background loops — take your keyboard out of the hot path entirely.

1. Verification — give Claude a feedback loop

Think about how you verify your work on a normal feature:

Write code → build it → run it → check the side effects (UI, logs, DB) → run unit tests → deploy

That exact playbook works for Claude too. The only thing missing is the tools and instructions to actually do each step. When Claude can run your build and tests, hit your dev server, tail your logs, and read your DB, it stops asking you whether it worked. It just finds out.

Write code → Build → Run → Check → Run tests → Ship (✓ pass, ↻ fail → fix → run again)
The verification loop, running on its own. Claude writes the code, runs the tests, reads the failures, fixes them, and goes again — until they pass.
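One way to make that loop easy for an agent is to put every check behind a single entry point. Here is a minimal sketch, assuming a Python project; the step names and commands in `DEFAULT_STEPS` are hypothetical placeholders, not something from the talk, so swap in your own build and test commands.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class StepResult:
    name: str
    ok: bool
    output: str


def run_checks(steps):
    """Run (name, argv) steps in order and stop at the first failure."""
    results = []
    for name, argv in steps:
        proc = subprocess.run(argv, capture_output=True, text=True)
        ok = proc.returncode == 0
        results.append(StepResult(name, ok, proc.stdout + proc.stderr))
        if not ok:
            break  # later steps tell you little until this one passes
    return results


def summarize(results):
    """One line per step, plus the captured output of a failing step."""
    lines = []
    for r in results:
        lines.append(f"{'PASS' if r.ok else 'FAIL'} {r.name}")
        if not r.ok:
            lines.append(r.output.strip())
    return "\n".join(lines)


# Hypothetical steps: replace with your project's real build and test commands.
DEFAULT_STEPS = [
    ("build", [sys.executable, "-m", "compileall", "-q", "."]),
    ("tests", [sys.executable, "-m", "pytest", "-q"]),
]


def main(steps=DEFAULT_STEPS):
    """Print the summary and return a process exit code (0 means all passed)."""
    results = run_checks(steps)
    print(summarize(results))
    return 0 if all(r.ok for r in results) else 1
```

Saved as a script that ends with `sys.exit(main())`, this gives Claude one command to run after every change, and the failing step's output tells it what to fix next.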

Practical things to set up first (the table-stakes Sid mentions):

  • A high-quality CLAUDE.md — the single highest-leverage thing you can do.
  • Connect your real tools (Slack, Jira/Linear, BigQuery, Datadog, etc.) — "if it's useful for you, it's useful for Claude."

Then put the whole loop in one prompt:

implement the validation function and run tests until they pass

Claude writes the code, runs the tests, reads the failures, fixes them, runs again. No babysitting.

2. Multi-Claude — parallelize the boring parts

Once Claude can verify its own work, you can trust it to run unsupervised — which means you can run several in parallel on separate branches/worktrees. One Claude refactoring auth, another writing tests for the new endpoint, a third chasing a flaky CI failure. Worktrees are first-class in Claude Code for exactly this.

Refactor auth · auth-refactor · running…
Write endpoint tests · endpoint-tests · running…
Chase a flaky CI failure · flaky-ci · running…
Three Claudes, three worktrees, all working at once — an auth refactor, endpoint tests, and a flaky-CI hunt. You supervise none of them.
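If you prefer to set the worktrees up yourself, the plumbing is one `git worktree add` per task. The sketch below only builds the commands (and runs them when you opt in). The repo path `/work/app`, the base branch `main`, and the sibling-directory naming are assumptions for illustration; starting an agent inside each directory is left out.

```python
import subprocess
from pathlib import Path


def worktree_command(repo: Path, branch: str, base: str = "main"):
    """Build the `git worktree add` argv for a sibling checkout on a new branch."""
    path = repo.parent / f"{repo.name}-{branch}"
    return ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(path), base]


def create_worktrees(repo: Path, branches, base: str = "main", dry_run: bool = True):
    """Return the command for every branch; execute them only if dry_run is False."""
    commands = [worktree_command(repo, b, base) for b in branches]
    if not dry_run:
        for argv in commands:
            subprocess.run(argv, check=True)
    return commands


# Dry run with the three tasks from the example above (hypothetical repo path).
for argv in create_worktrees(Path("/work/app"), ["auth-refactor", "endpoint-tests", "flaky-ci"]):
    print(" ".join(argv))
```

Each task gets its own directory and branch, so parallel sessions never touch the same working tree.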

3. Background loops — /loop and Routines

This is the part most people haven't seen yet. /loop turns Claude Code from "thing I prompt" into "thing that's already working when I check back in."

You set it once: /loop, every 5m. Ready when you're back: CI passed · Comments addressed · Rebased onto main
/loop takes your keyboard out of the hot path. It runs on an interval in the background and has results waiting when you check back in.

/loop — quick polling in a session

Requires Claude Code v2.1.72+. Tasks are session-scoped (they run while your session is open and survive --resume).

The pattern is simple: interval + prompt.

# Address review comments and auto-rebase every 5 minutes
/loop 5m address any new review comments on my PR and rebase if needed

# Watch a deploy
/loop 5m check if the deployment finished and tell me what happened

# Let Claude pick the interval based on what it sees
/loop check whether CI passed and address any review comments

# Re-run a packaged workflow on a schedule
/loop 20m /review-pr 1234

# Bare /loop — runs the built-in maintenance prompt
# (continues unfinished work, tends the PR, runs cleanup passes)
/loop

You can also drop a loop.md file at .claude/loop.md (project) or ~/.claude/loop.md (user) to set a default prompt for bare /loop — handy for "keep this release branch healthy" type jobs.

Recurring loops auto-expire after 7 days. Press Esc to stop one that's waiting.
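As a mental model only (this is not how Claude Code implements /loop), an interval loop with an expiry can be sketched in a few lines. The `task` callable and its "return True when there is nothing left to do" convention are assumptions of this sketch.

```python
import time


def run_loop(task, interval_s, expires_after_s, clock=time.monotonic, sleep=time.sleep):
    """Call task() every interval_s seconds until it returns True or the loop expires.

    Returns the number of times task() ran.
    """
    started = clock()
    runs = 0
    while clock() - started < expires_after_s:
        runs += 1
        if task():  # assumption: True means the work is finished
            break
        sleep(interval_s)
    return runs
```

With `interval_s=300` and `expires_after_s=7 * 24 * 3600` this mirrors the "every 5 minutes, for at most 7 days" shape described above. The `clock` and `sleep` parameters exist so the loop can be tested without actually waiting.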

One-shot reminders

Just ask in plain English — no /loop needed:

remind me at 3pm to push the release branch
in 45 minutes, check whether the integration tests passed

Routines — when the loop should outlive your session

/loop is session-scoped. If you close Claude Code, it stops. For overnight or week-long automation that should run without your machine on, use Routines — they run on Anthropic's cloud infrastructure, no open session required, and can also trigger on GitHub events or via API.

Quick rule of thumb:

Want it to…                                        Use
Poll while I work                                  /loop
Run when my laptop is on, but no session needed    Desktop scheduled tasks
Run reliably even if I'm offline                   Routines
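The same rule of thumb, written as a tiny helper. The function and parameter names are made up for illustration; only the three returned options come from the rule of thumb above.

```python
def pick_scheduler(survives_session_close: bool, runs_with_machine_off: bool) -> str:
    """Map two requirements onto the three options in the rule of thumb."""
    if runs_with_machine_off:
        return "Routines"
    if survives_session_close:
        return "Desktop scheduled tasks"
    return "/loop"
```

The strictest requirement is checked first, so a job that must run with the machine off never falls through to a session-bound option.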

What to try this week

Pick one. Don't try to do all of these at once.

  1. Audit your CLAUDE.md. If it's empty or stale, fix that first. Nothing else matters as much.
  2. Add a verification step to your next prompt. Instead of "implement X", say "implement X and run the tests until they pass". Notice how much less you have to follow up.
  3. Try /loop 10m /your-existing-command on a PR you'd normally babysit (CI checks, review comments, rebases). Let it run for an afternoon and see what it gets done.
  4. Set up one Routine for something recurring — a daily dependency check, a weekly stale-PR report, a nightly flaky-test hunt.

Why this matters

The bottleneck used to be how fast you could type. Then it was how fast the model could think. Now, for a lot of work, the bottleneck is you waiting on the model. Verification + parallelism + background loops is how we stop that being true.