Loop Engineering

Loop Engineering Guide: build safe autonomous agent loops

Loop engineering is the layer above prompts, context, and harnesses: scheduled coding-agent loops that discover work, hand it to isolated agents, verify results independently, persist state, and stop at human gates.

Based on the Loop Engineering orange book
Generator + evaluator separated
Five moves, six parts, clear guardrails
loop.yml
Scheduled trigger
cron: 0 8 * * 1-5

Disposable worktree

Every run gets a clean branch boundary for review.

Implementer + verifier

Agents do the work and independently check the result.

No merge, deploy, or issue close happens without a human gate.

Definition

What is loop engineering?

Loop engineering is the practice of designing systems that prompt coding agents automatically. Instead of hand-feeding one prompt at a time, you build a loop that discovers work, assigns it to agents, verifies the output, saves state, and runs again on a schedule or trigger.

The core shift

The old workflow is: you prompt the agent, wait, inspect, then prompt again. The loop-engineered workflow is: you design the system that prompts the agent, gives it tools and context, checks its work, and asks you only where judgment or irreversible action is required.

The four-layer stack

1. Prompt engineering

Write one good instruction for one model response.

2. Context engineering

Decide what should be in the model window right now.

3. Harness engineering

Arm one agent run with tools, permissions, recovery, and a done condition.

4. Loop engineering

Schedule the harness, coordinate agents, preserve state, and make the work repeat safely.

Loop engineering sits one floor above the harness: the harness runs once; the loop makes it run itself over and over.

Loop anatomy

How loop engineering turns prompts into autonomous agent systems

Loop engineering turns a repeated coding workflow into a transparent system: trigger it, triage the work, preserve state, isolate changes, verify output, and route the final decision to a person.

1. Scheduled triggers

Run daily, hourly, on CI, or when a queue changes.

2. Triage skills

Classify the task and choose the right playbook.

3. STATE.md for persistent loop memory

Carry decisions, constraints, and prior attempts forward.

4. Worktrees for isolated agent execution

Give the agent an isolated branch and filesystem.

5. Implementer agents for code changes

Make the smallest useful change with local context.

6. Verifier agents for independent review

Run tests, inspect diffs, and challenge assumptions.

7. Human review gates

Open a PR, request review, or ask before acting.

Transparent by design

The loop leaves artifacts: branch names, logs, test output, summaries, PR links, and state updates a maintainer can audit.

Measurable readiness

A loop is production-ready when it can show success rate, review burden, rollback path, and the exact point where automation stops.

The loop cycle

The five moves of an agent loop

Every useful loop has the same skeleton. Remove one move and it either stops, spins uselessly, or creates unreviewed risk.

1. Discovery

The loop finds work on its own from CI, issues, commits, inboxes, analytics, docs, or queues instead of waiting for a human to list tasks.

2. Handoff

The loop packages one bounded task with context, constraints, and an isolated workspace so an agent can act without stepping on other work.

3. Verification

A separate verifier checks tests, diffs, requirements, and failure modes. The agent that generated the change should not grade its own homework.

4. Persistence

The result, decision, run log, and next state are written somewhere durable: a PR, issue, state file, board, or database.

5. Scheduling

A timer, webhook, CI event, or queue trigger makes the loop run again. Automation is what turns one agent run into a real loop.

A loop is not just repeated prompting. It is discovery plus handoff plus verification plus persistence plus scheduling, with evidence left behind for a maintainer to audit.
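
The five moves can be sketched as one function plus an outside trigger. This is a minimal illustration under stated assumptions, not the orange book's implementation: `Task`, `run_once`, and the four callables are hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Task:
    """One bounded unit of work (hypothetical shape for this sketch)."""
    id: str
    description: str


def run_once(
    discover: Callable[[], Iterable[Task]],
    implement: Callable[[Task], str],
    verify: Callable[[Task, str], bool],
    persist: Callable[[dict], None],
) -> List[dict]:
    """Run one pass: discovery, handoff, verification, persistence.

    Scheduling, the fifth move, lives outside this function: a cron job,
    webhook, or CI trigger calls run_once again later.
    """
    records = []
    for task in discover():                  # 1. discovery
        result = implement(task)             # 2. handoff to an implementer
        passed = verify(task, result)        # 3. independent verification
        record = {
            "task": task.id,
            "result": result,
            "verified": passed,
            # The loop never merges; it only records what a person should do next.
            "next_step": "human_review" if passed else "retry_or_escalate",
        }
        persist(record)                      # 4. persistence
        records.append(record)
    return records
```

Passing the verifier in as a separate callable keeps the implementer from grading its own work, and `persist` receives every record whether or not verification passed.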

Primitives

Six components of a production loop engineering system

The orange book maps loop engineering to six concrete parts: automations, worktrees, skills, connectors, sub-agents, and memory. Together they turn a clever prompt into an operating system for agent work.

Automations and scheduling

Cron jobs, webhooks, CI triggers, cloud routines, and queue events wake the loop without a human pressing go.
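
As a small illustration of the `0 8 * * 1-5` schedule from the loop.yml card, the sketch below checks whether a given moment matches that one expression. It is not a general cron parser, the function name is hypothetical, and it assumes the datetime is already in whatever time zone the scheduler uses.

```python
from datetime import datetime


def matches_weekday_8am(moment: datetime) -> bool:
    """Return True if `moment` matches the cron expression `0 8 * * 1-5`.

    The five fields are minute, hour, day of month, month, day of week.
    Cron counts Monday as 1 and Friday as 5, which lines up with
    datetime.isoweekday().
    """
    return (
        moment.minute == 0
        and moment.hour == 8
        and 1 <= moment.isoweekday() <= 5
    )
```

In plain words: the loop wakes at 08:00 on weekdays and stays asleep on weekends.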

Worktrees for isolated agent execution

Each agent gets its own branch and filesystem boundary so parallel fixes do not collide or hide each other's changes.
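
One way to give each run its own branch and directory is `git worktree add`. The sketch below only builds and runs that command; the `loop/<run_id>` branch name, the sibling `-worktrees` directory, and the `main` base branch are conventions assumed for this example.

```python
import subprocess
from pathlib import Path
from typing import List


def worktree_command(repo: Path, run_id: str, base: str = "main") -> List[str]:
    """Build the `git worktree add` command for one agent run.

    The branch name and directory layout are invented for this sketch;
    use whatever naming your team already has.
    """
    branch = f"loop/{run_id}"
    path = repo.parent / f"{repo.name}-worktrees" / run_id
    return ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(path), base]


def create_worktree(repo: Path, run_id: str, base: str = "main") -> None:
    """Create the worktree; raises CalledProcessError if git refuses."""
    subprocess.run(worktree_command(repo, run_id, base), check=True)
```

When a run is discarded, `git worktree remove <path>` deletes the directory and leaves the main checkout untouched.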

Skills as reusable engineering judgment

A skill stores the reusable triage and decision logic. Update the skill once instead of pasting a giant instruction block into every schedule.
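
A skill can be as simple as a named playbook plus a rule for choosing it. The skill names, instruction text, and keyword rules below are all hypothetical; a real triage step might use labels, file paths, or a model call instead.

```python
from typing import Dict

# Each skill is a named playbook: plain instructions the loop hands to an agent.
SKILLS: Dict[str, str] = {
    "flaky-test": "Re-run the failing test, compare logs, and report variance.",
    "dependency-bump": "Upgrade one package, run the test suite, summarize changes.",
    "needs-human": "Do not act. Summarize the task and ask a maintainer.",
}


def triage(title: str) -> str:
    """Pick a skill name from a task title using simple keyword rules."""
    lowered = title.lower()
    if "flaky" in lowered or "intermittent" in lowered:
        return "flaky-test"
    if "upgrade" in lowered or "bump" in lowered:
        return "dependency-bump"
    return "needs-human"  # anything unrecognized goes to the safest playbook
```

Because every schedule looks up the playbook by name, editing one entry in `SKILLS` changes the behavior everywhere that skill is used.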

Connectors and MCP integrations

Connectors give the loop reach: issues, GitHub, CI, docs, Slack, databases, calendars, and PR systems become part of the chain.

Generator and evaluator sub-agents

Split the writer from the critic. One agent creates; another agent, ideally with different instructions or a different model, tries to prove it wrong.
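
The writer and critic split can be sketched as two callables that never share code. In this hedged example, `generate` and `evaluate` stand in for separate agents or models, and the three-attempt cap is an arbitrary assumption.

```python
from typing import Callable, List, Optional, Tuple

Generator = Callable[[str, Optional[str]], str]      # (task, feedback) -> candidate
Evaluator = Callable[[str, str], Tuple[bool, str]]   # (task, candidate) -> (ok, reason)


def generate_and_evaluate(
    task: str,
    generate: Generator,
    evaluate: Evaluator,
    max_attempts: int = 3,
) -> Tuple[Optional[str], List[dict]]:
    """Alternate writer and critic until the critic accepts or attempts run out."""
    feedback: Optional[str] = None
    history: List[dict] = []
    for attempt in range(1, max_attempts + 1):
        candidate = generate(task, feedback)
        ok, reason = evaluate(task, candidate)
        history.append({"attempt": attempt, "ok": ok, "reason": reason})
        if ok:
            return candidate, history
        feedback = reason  # the critic's objection goes back to the writer
    return None, history   # nothing passed: escalate to a human
```

Returning `None` after the last attempt makes the failure explicit, so the loop escalates instead of shipping the least bad candidate.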

State and memory on disk

The agent forgets; the repo does not. Durable state files, logs, PRs, and boards carry context across turns and days.
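
A minimal sketch of on-disk memory follows. It swaps the Markdown STATE.md for an append-only JSON Lines file because that is easier to parse; the function names and record fields are assumptions made for illustration.

```python
import json
from pathlib import Path
from typing import List


def append_state(state_file: Path, entry: dict) -> None:
    """Append one run record as a JSON line so earlier history is never rewritten."""
    with state_file.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry, sort_keys=True) + "\n")


def load_state(state_file: Path) -> List[dict]:
    """Read every prior record; a missing file means this is the first run."""
    if not state_file.exists():
        return []
    lines = state_file.read_text(encoding="utf-8").splitlines()
    return [json.loads(line) for line in lines if line.strip()]
```

At the start of each run the loop would call `load_state` to see prior attempts, and at the end it would call `append_state` with the decision it made and why.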

Patterns

Loop engineering patterns for software teams

These are practical loops that fit Grok, Claude Code, Codex, GitHub Actions, or a custom runner because the architecture is tool-agnostic.

PR babysitter loop

Watch stale pull requests, summarize blockers, rerun checks, and nudge the right owner before context disappears.

Daily triage loop

Scan new issues each morning, label urgency, find duplicates, and prepare a short queue for human approval.

CI sweeper loop

Detect repeated failures, isolate flaky tests from real regressions, and open a minimal diagnostic PR.
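
One possible heuristic for separating flaky tests from regressions is to compare outcomes on the same commit. The sketch below assumes CI history is available as `(commit_sha, test_name, passed)` tuples; that input shape and the function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def classify_failures(runs: Iterable[Tuple[str, str, bool]]) -> Dict[str, List[str]]:
    """Split failing tests into flaky and regression candidates.

    A test that both passed and failed on the same commit is treated as
    flaky. A test that only ever failed on some commit is treated as a
    possible regression. This is a heuristic chosen for this sketch.
    """
    outcomes = defaultdict(set)
    for sha, test, passed in runs:
        outcomes[(sha, test)].add(passed)

    flaky, regression = set(), set()
    for (_, test), seen in outcomes.items():
        if seen == {True, False}:
            flaky.add(test)
        elif seen == {False}:
            regression.add(test)
    # A test seen flaking on any commit is reported as flaky, not as a regression.
    return {"flaky": sorted(flaky), "regression": sorted(regression - flaky)}
```

Only the `regression` list would justify a diagnostic PR; the `flaky` list is better suited to a quarantine or rerun report.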

Post-merge cleanup loop

After a merge, remove temporary flags, update docs, flag related tasks for closing, and verify that no follow-up failures remain.

Dependency sweeper loop

Group safe upgrades, run compatibility checks, and escalate only packages that need human tradeoff decisions.
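
A simple grouping rule, assuming packages follow semantic versioning, is to batch minor and patch bumps and escalate major ones. The package names in the usage note are made up, and the parser handles only plain `MAJOR.MINOR.PATCH` strings.

```python
from typing import Dict, List, Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse a plain MAJOR.MINOR.PATCH string; pre-release tags are not handled."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def group_upgrades(upgrades: Dict[str, Tuple[str, str]]) -> Dict[str, List[str]]:
    """Sort upgrades into a batch to try automatically and a list to escalate.

    A major bump may break callers, so it is escalated. Versions at 0.x are
    escalated too, since semver makes no compatibility promise before 1.0.
    """
    groups: Dict[str, List[str]] = {"batch": [], "escalate": []}
    for name, (current, target) in sorted(upgrades.items()):
        old, new = parse_version(current), parse_version(target)
        risky = new[0] != old[0] or old[0] == 0
        groups["escalate" if risky else "batch"].append(name)
    return groups
```

For example, `group_upgrades({"libalpha": ("2.3.0", "2.4.1"), "libbeta": ("4.2.0", "5.0.1")})` puts `libalpha` in `batch` and `libbeta` in `escalate`. The batch still has to pass the compatibility checks before anyone reviews it.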

Changelog drafter loop

Turn merged work into release notes with links, scope, risk, and migration notes ready for review.

Risks

Risks and costs of unattended agent loops

A loop that can work by itself can also make mistakes by itself. Real loop engineering puts those risks on the page and designs around them.

Verification debt

The loop produces more output than humans have verified. Errors can accumulate quietly behind a green-looking automation dashboard.

Guardrail: install an independent evaluator and keep a human review gate before merge, deploy, billing, deletion, or issue closure.

Comprehension rot

The codebase changes faster than your mental model. You still own the system, but you no longer understand what the loop shipped.

Guardrail: review summaries, read representative diffs, and require state updates that explain why the code changed.

Cognitive surrender

The loop becomes convenient enough that you stop having an opinion and accept whatever the agent hands back.

Guardrail: keep explicit decision points where a human must choose, reject, or redirect the loop.

Token blowout

Retries, fan-out, long context, and parallel agents can turn a small automation into an unpredictable bill.

Guardrail: set budget caps, retry limits, model tiers, timeout rules, and escalation thresholds before production.
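
Budget caps and retry limits can be enforced with a small counter that raises once a limit is passed. This is a sketch under assumptions: `RunBudget` and `BudgetExceeded` are hypothetical names, and the caller is expected to report token usage after each model call.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when a run passes its token or retry cap."""


@dataclass
class RunBudget:
    max_tokens: int
    max_retries: int
    tokens_used: int = 0
    retries_used: int = 0

    def charge(self, tokens: int) -> None:
        """Record token usage and stop the run once the cap is passed."""
        self.tokens_used += tokens
        if self.tokens_used > self.max_tokens:
            raise BudgetExceeded(f"token cap {self.max_tokens} exceeded")

    def retry(self) -> None:
        """Record one retry and stop the run once the limit is passed."""
        self.retries_used += 1
        if self.retries_used > self.max_retries:
            raise BudgetExceeded(f"retry limit {self.max_retries} exceeded")
```

Raising an exception rather than returning a flag means a forgotten check cannot let the run continue silently.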

Readiness

Loop engineering safety: observability, reversibility, and human gates

The goal is not an agent that acts mysteriously. The goal is a narrow automation loop with clear permissions, recorded state, independent verification, and a final human decision.

Safety rails

Scope tools, branch writes, and destructive actions. A loop should know exactly where it is allowed to operate.
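
Scoping can be checked mechanically before a run's changes are accepted. The allowed directories and blocked action names below are a hypothetical policy, not a recommendation; the point is that the boundary is written down and testable.

```python
from pathlib import PurePosixPath
from typing import Iterable, List

ALLOWED_DIRS = ("docs", "tests")  # hypothetical policy for this sketch
BLOCKED_ACTIONS = {"merge", "deploy", "delete_branch", "force_push"}


def out_of_scope_paths(changed: Iterable[str]) -> List[str]:
    """Return changed files that fall outside the allowed directories."""
    violations = []
    for name in changed:
        parts = PurePosixPath(name).parts
        # Reject empty paths, parent-directory escapes, and other top-level dirs.
        if not parts or ".." in parts or parts[0] not in ALLOWED_DIRS:
            violations.append(name)
    return violations


def action_allowed(action: str) -> bool:
    """Destructive actions are never taken by the loop itself."""
    return action not in BLOCKED_ACTIONS
```

A run whose diff returns any violation would be discarded or escalated rather than opened as a PR.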

Observability

Emit run summaries, test output, changed files, prompt inputs, and links to the artifacts reviewers need.
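
A run summary can be a small structured record written at the end of every run. The field names here are assumptions chosen to mirror the items listed above.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class RunSummary:
    run_id: str
    task: str
    branch: str
    prompt_inputs: List[str] = field(default_factory=list)
    changed_files: List[str] = field(default_factory=list)
    tests_passed: bool = False
    artifact_links: List[str] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize with sorted keys so summaries diff cleanly between runs."""
        return json.dumps(asdict(self), sort_keys=True, indent=2)
```

Storing the summary as JSON next to the test output gives reviewers one place to start and gives later runs something to parse.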

Readiness metrics

Track success rate, time saved, review edits, false positives, failure classes, and rollback frequency.
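
Some of these metrics can be computed directly from persisted run records. The sketch assumes each record carries three booleans named `verified`, `edited_in_review`, and `rolled_back`; those names are hypothetical.

```python
from typing import Dict, List


def readiness_report(runs: List[dict]) -> Dict[str, float]:
    """Summarize loop outcomes as rates between 0 and 1."""
    total = len(runs)
    if total == 0:
        # No runs yet: report zeros rather than dividing by zero.
        return {"success_rate": 0.0, "review_edit_rate": 0.0, "rollback_rate": 0.0}

    def rate(key: str) -> float:
        return sum(1 for run in runs if run.get(key)) / total

    return {
        "success_rate": rate("verified"),
        "review_edit_rate": rate("edited_in_review"),
        "rollback_rate": rate("rolled_back"),
    }
```

A high review edit rate alongside a high success rate suggests the verifier is passing work that humans still have to fix.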

Tool-agnostic architecture

Keep the loop design portable across Codex, Claude Code, Grok, GitHub Actions, and MCP-capable tools.

Human approval before merge
Independent verifier step
State updated after every run
Clear rollback or discard path

Start here

First loop engineering checklist

A beginner loop should be small, useful, and boring. Start read-only, add write access after it proves value, then tighten the evaluator before scaling.

Discovery source

What does the loop read on a timer: CI failures, issues, commits, support tickets, analytics, or docs?

State file

Where does cross-round memory live: STATE.md, a board, an issue comment, a database row, or a run log?

Independent evaluator

Who says no: a second agent, a test suite, a deterministic gate, or a human reviewer?

Isolation

Does each parallel agent get a worktree, branch, sandbox, or permission boundary?

Token and retry cap

What stops runaway loops, repeated failures, expensive model calls, or oversized context windows?

Human review point

Which exact step pauses for a person before merge, deploy, close, delete, charge, or publish?
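
The human review point can be made explicit in code by refusing gated actions unless a named person has approved them. `GATED_ACTIONS` mirrors the list in the question above; the `perform` function and its arguments are hypothetical.

```python
from typing import Callable, Optional

GATED_ACTIONS = {"merge", "deploy", "close", "delete", "charge", "publish"}


def perform(action: str, do_it: Callable[[], str], approved_by: Optional[str] = None) -> str:
    """Run `do_it` only if the action is ungated or a named person approved it."""
    if action in GATED_ACTIONS and not approved_by:
        return f"PAUSED: '{action}' is waiting for human approval"
    return do_it()
```

With this shape, `perform("merge", merge_fn)` pauses, while `perform("merge", merge_fn, approved_by="alice")` proceeds and leaves a name to record in the state file.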

Start building safe autonomous coding-agent loops

Use automation to remove repetitive prompting, not judgment. Start with one narrow recurring workflow, add state, isolation, verification, and a human gate, then expand only when the loop earns trust.