An interactive session has a human in the loop — watching, nudging, aborting. A headless run does not. The same agent that is pleasant and corrigible in a live terminal becomes a different animal when it executes at 3am against production branches with no one watching. This chapter is about that difference and the design patterns that make the difference survivable.

Representation

Every agent invocation has three axes: who authorizes it, who observes it, and what it can touch. In an interactive session these are collapsed — you are authorizing, observing, and bounding in real time. In a pipeline run, each axis is a separate design decision that must be specified in advance.

Interactive mode is the default mental model for most practitioners, which is why the first experience of moving an agent into CI is disorienting. The agent that paused for your approval on destructive actions now either fails closed (refuses) or — if misconfigured — fails open (runs unconfirmed destructive actions). The agent that asked clarifying questions now has no one to ask. The agent that self-corrected based on your feedback has no feedback signal.

The practitioner’s instinct — this is just the agent I already use, with one less window — is wrong in the same way that treating ssh as “just a terminal that’s far away” is wrong. The distance changes the failure modes. A mistake in an interactive session is visible and cheap; a mistake in a headless run may ship to production before anyone sees it.

Two shifts follow from the interactive/headless distinction.

The first: permissions move from dynamic consent to static policy. The interactive agent asks “may I run this bash command?” and the human answers in the moment. The headless agent either has permission declared in advance or does not have it. The expressive middle ground — “yes, but with this modification” — disappears.
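Here is a minimal sketch of what "declared in advance" looks like as code. The policy format, the tool names, and the `decide` function are hypothetical, not any agent's real configuration. The sketch also assumes the runner executes commands as an argv list without a shell, so `;` and `&&` are not interpreted. The point is structural: the decision has only two outcomes, and anything not explicitly listed falls through to deny. That is the fail-closed behavior described above.

```python
import shlex
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    # Deliberately no ASK: in a headless run nobody is there to answer.


# Hypothetical static policy; tool names and commands are illustrative only.
POLICY = {
    "read_file": True,   # always allowed
    "edit_file": True,
    "bash": [("git", "status"), ("git", "diff"), ("pytest",)],  # allowed argv prefixes
}


def decide(tool: str, command: str = "", policy: dict = POLICY) -> Decision:
    """Resolve a tool call against a policy fixed before the run started."""
    rule = policy.get(tool)
    if rule is True:
        return Decision.ALLOW
    if isinstance(rule, list):
        try:
            argv = tuple(shlex.split(command))
        except ValueError:  # e.g. unbalanced quotes: deny rather than guess
            return Decision.DENY
        if any(argv[: len(prefix)] == prefix for prefix in rule):
            return Decision.ALLOW
    # Unknown tools and unlisted commands fall through to deny (fail closed).
    return Decision.DENY
```

The "yes, but with this modification" answer has no slot here. If a run needs a slightly different permission, someone has to change the policy before the next run.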

The second: observability moves from synchronous to asynchronous. The interactive agent’s reasoning is visible as it happens; a mistaken turn can be interrupted mid-sentence. The headless agent’s reasoning is only visible in the log it emits, read later, after the action has already landed. Retrospective logs have to be structured for this; free-text traces that are fine to skim live are nearly unreadable in a CI artifact viewer.

Operation

Three deployment shapes cover nearly all practical automation: one-shot batch runs, CI-triggered agents, and scheduled or event-driven agents. Each has a distinct permission and observability profile.

Shape 1: one-shot batch runs

The simplest case: a script or Makefile target invokes the agent non-interactively to do one bounded task, then exits. Typical uses: scripted refactors across many files, generating boilerplate from a spec, updating a set of docs. There is no CI and no schedule; a human runs it on demand and reads the output.

The three tools converge on a -p / print / prompt flag for non-interactive mode and diverge on how permissions are granted. The underlying question each tool answers in its own vocabulary: what is the agent allowed to do without asking, given no one is here to ask?
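Whatever the flag is called, a batch wrapper makes the same two choices regardless of tool. This is a sketch under stated assumptions: `run_batch` is a hypothetical helper, and `agent_cmd` is a placeholder for your tool's real non-interactive invocation, with each task prompt appended as the final argument. Standard input is closed, so any attempt to ask a question gets end-of-file instead of hanging. A timeout turns a stuck run into a recorded failure.

```python
import subprocess


def run_batch(agent_cmd: list[str], tasks: list[str], timeout_s: float = 600.0) -> dict[str, int]:
    """Run one non-interactive agent invocation per task and record each exit status.

    agent_cmd is a placeholder for your tool's non-interactive invocation;
    each task prompt is appended as the final argument.
    """
    statuses: dict[str, int] = {}
    for task in tasks:
        try:
            proc = subprocess.run(
                agent_cmd + [task],
                stdin=subprocess.DEVNULL,  # nobody is here to answer: any read gets EOF
                timeout=timeout_s,         # a hung run counts as a failure, not a wait
            )
            statuses[task] = proc.returncode
        except subprocess.TimeoutExpired:
            statuses[task] = -1            # sentinel for "killed after timeout"
    return statuses
```

Output goes straight to the terminal, which suits this shape: the human who launched the run reads it. The returned map of exit statuses is what the script should check before declaring the batch done.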

Shape 2: CI-triggered agents

The agent runs inside CI (GitHub Actions, GitLab pipelines, Jenkins) in response to a repository event: a PR opened, an issue mentioned, a label applied. It has no UI; its output is posted back as a PR comment, a review, or a new commit.

CI-triggered agents surface two failure modes that rarely appear interactively:

Context starvation. The CI runner does not know what you know. It has the repo, it has the PR diff, it has the comment thread — it does not have your memory of last week’s discussion about why that file looks weird. The briefing doc (see Ch 7) is the primary answer to this: the same file that bootstraps an interactive agent bootstraps the CI agent.

Credential leakage. The agent in CI runs with real credentials — a GitHub token with write access to the repo, possibly deploy tokens, possibly cloud keys. If the agent can be persuaded to dump those credentials into a log, a PR comment, or a generated file, the leak is permanent. The mitigation is a scoped-token discipline: CI runs use tokens with the narrowest possible permissions, logs are scrubbed, and prompts from external contributors are treated as untrusted input.
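The "logs are scrubbed" step can be sketched as a filter applied to anything the agent produces before it reaches the output layer. CI log masking typically covers only the job log, not text the agent posts as a PR comment or writes into a file. The sketch first redacts exact secret values the job knows about, then applies a pattern backstop. The pattern targets GitHub-style token prefixes (`ghp_`, `gho_`, `ghu_`, `ghs_`, `ghr_`, `github_pat_`) and is illustrative, not exhaustive. `scrub` is a hypothetical helper.

```python
import re

# Illustrative backstop for GitHub-style token shapes; not a complete secret detector.
TOKEN_PATTERN = re.compile(r"\b(?:gh[pousr]_[A-Za-z0-9]{20,}|github_pat_[A-Za-z0-9_]{20,})")


def scrub(text: str, known_secrets: list[str]) -> str:
    """Redact exact secret values first, then anything shaped like a token."""
    # Skip empty strings (replacing "" would corrupt the text) and mask the
    # longest values first so a secret containing another is fully covered.
    for secret in sorted(filter(None, known_secrets), key=len, reverse=True):
        text = text.replace(secret, "***")
    return TOKEN_PATTERN.sub("***", text)
```

Exact-value redaction is the primary defense because the job knows its own secrets. The regex only catches tokens the job was never told about, such as one the agent found in a file.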

Shape 3: scheduled or event-driven agents

The agent runs on a cron schedule or in response to an external event (webhook, queue message, file drop). Nothing in the repo triggered it; the agent wakes up, reads some state, decides what to do, and acts.

This is the most powerful shape and the one with the highest variance in outcomes. Examples that work well: nightly stale-branch cleanup, weekly dependency-update PRs, monitoring-alert triage. Examples that go wrong: agents that rewrite arbitrary files on every run (drift), agents that retry a failing task forever (runaway cost), agents that notice their own prior runs and modify them recursively (meta-chaos).
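A pre-flight filter that runs before the agent sees any work can blunt all three failure modes. This sketch assumes the agent stamps its own PRs with a marker label and that failed attempts are tracked between runs. The label name, the `Candidate` type, and the limits are all hypothetical. Excluding the agent's own output stops recursion, and the attempt cap stops infinite retries. The per-run cap does not prevent drift, but it bounds how much any single run can change.

```python
from dataclasses import dataclass, field

AGENT_MARKER = "automated-by-agent"  # hypothetical label the agent puts on its own PRs


@dataclass
class Candidate:
    id: str
    labels: set[str] = field(default_factory=set)
    failed_attempts: int = 0  # assumed to be persisted between scheduled runs


def select_work(candidates: list[Candidate], max_items: int = 5, max_attempts: int = 3) -> list[Candidate]:
    """Choose this run's work items before the agent is invoked."""
    selected = []
    for c in candidates:
        if AGENT_MARKER in c.labels:       # never act on the agent's own prior output
            continue
        if c.failed_attempts >= max_attempts:  # stop retrying; leave it for a human
            continue
        selected.append(c)
        if len(selected) == max_items:     # bound how much one run can touch
            break
    return selected
```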

Structured logging for headless runs

Interactive sessions let you skim the trace in real time and abort on anything weird. Headless runs require structured logs that survive into CI artifact viewers and can be queried after the fact. The minimum viable log has four things per run: the prompt, the final output (and all intermediate actions), the exit status, and the resource usage (tokens, wall-clock, tool calls). A human can reconstruct what happened from those four.
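Here is a sketch of that four-part record written as JSON Lines, one object per run. The field names and the `RunRecord` type are assumptions, not a format any of the tools emits. One line per run keeps a CI artifact greppable and loadable with any JSON parser. `json.dumps` escapes embedded newlines, so a multi-line prompt or output never splits a record across lines.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class RunRecord:
    run_id: str
    prompt: str                                           # 1. the prompt
    actions: list[dict] = field(default_factory=list)     # 2a. every intermediate action
    final_output: str = ""                                # 2b. the final output
    exit_status: Optional[int] = None                     # 3. the exit status
    tokens: int = 0                                       # 4. resource usage...
    tool_calls: int = 0
    wall_clock_s: float = 0.0

    def log_action(self, tool: str, argument: str, result: str) -> None:
        """Append one tool call and keep the tool-call count in sync."""
        self.actions.append({"tool": tool, "argument": argument, "result": result})
        self.tool_calls += 1


def to_line(record: RunRecord) -> str:
    """Serialize one run as a single JSON Lines entry."""
    return json.dumps(asdict(record)) + "\n"


def append_record(record: RunRecord, path: str) -> None:
    with open(path, "a", encoding="utf-8") as f:
        f.write(to_line(record))
```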

The observability stack

A production automation setup needs three layers beyond the agent itself: a trigger layer (CI event, schedule, webhook), an execution layer (the headless agent run), and an output layer (PR comment, commit, Slack message, dashboard). Each layer can fail independently. The trigger may fire twice; the execution may partially succeed; the output may be malformed. Treat the stack the way you’d treat any distributed system: idempotent handlers, structured logs, bounded retries.
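Two of those disciplines fit in one small wrapper around the execution layer: an idempotent handler keyed on the trigger's event ID, and bounded retries with exponential backoff. This is a sketch, not a prescribed design. `Handler` is hypothetical, `execute` stands in for the headless agent run, and the in-memory dict stands in for durable storage. Retrying after a partial success is only safe if `execute` is itself safe to re-run, for example by checking whether its PR comment already exists before posting.

```python
import time


class Handler:
    """Idempotent trigger handler with a bounded number of attempts."""

    def __init__(self, execute, max_attempts: int = 3, base_delay_s: float = 1.0, sleep=time.sleep):
        if max_attempts < 1:
            raise ValueError("max_attempts must be at least 1")
        self.execute = execute            # the headless run; raises on failure
        self.max_attempts = max_attempts
        self.base_delay_s = base_delay_s
        self.sleep = sleep                # injectable so tests need not wait
        self.completed = {}               # event_id -> result; use durable storage in practice

    def handle(self, event_id: str, payload):
        if event_id in self.completed:    # the trigger fired twice: reuse the first result
            return self.completed[event_id]
        for attempt in range(1, self.max_attempts + 1):
            try:
                result = self.execute(payload)
            except Exception:
                if attempt == self.max_attempts:
                    raise                 # give up loudly instead of retrying forever
                self.sleep(self.base_delay_s * 2 ** (attempt - 1))
            else:
                self.completed[event_id] = result
                return result
```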

Evolution

Automation is where the field is changing fastest. Three axes worth tracking.

Emerging: agent-as-service deployment. Several practitioners are running agents as long-lived services rather than one-shot invocations: a daemon that receives queued tasks, executes them, and posts results. This blurs the line between “automation pipeline” and “internal platform.” Expect purpose-built deployment patterns (containerized agents, IAM-scoped service accounts, per-agent telemetry) to crystallize over the next 12 months. The current DIY patterns work; a convergent shape has not yet formed.

Emerging: policy engines for agents. A recurring theme in enterprise deployments (see Ch 14) is the move from per-tool allow/deny lists to declarative policy engines: OPA-style rules that express “agents from repo X can touch paths matching Y but not Z.” None of the CLI agents ships this natively in 2026; early adopters roll their own wrappers. First-class vendor support is likely within a release cycle or two.
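A homegrown wrapper of that kind can start very small. This sketch expresses the “repo X, paths Y but not Z” rule in plain Python rather than in OPA's Rego language. The rule format, repo names, and globs are hypothetical. Deny overrides allow, and a repo without a rule gets nothing. Paths are normalized first so that `src/../.github/...` cannot slip past a `src/*` allow. Note that `*` in `fnmatch` also matches `/`.

```python
import posixpath
from fnmatch import fnmatchcase

# Hypothetical declarative rules; repo names and globs are illustrative.
RULES = [
    {"repo": "acme/web", "allow": ["src/*", "docs/*"], "deny": ["src/secrets/*"]},
    {"repo": "acme/infra", "allow": ["docs/*"], "deny": []},
]


def may_touch(repo: str, path: str, rules: list[dict] = RULES) -> bool:
    """Return True only if some rule for this repo allows the path and none denies it."""
    path = posixpath.normpath(path)  # collapse "..", so traversal can't dodge the globs
    for rule in rules:
        if rule["repo"] != repo:
            continue
        if any(fnmatchcase(path, pat) for pat in rule["deny"]):
            return False             # deny always wins
        return any(fnmatchcase(path, pat) for pat in rule["allow"])
    return False                     # no rule for this repo: default deny
```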

Emerging: differential trust for prompts. In CI-triggered pipelines, some prompts come from trusted sources (your team’s commits, your own comments) and some come from untrusted sources (external PR authors, forked repos). In 2026 the tools treat both the same; a prompt is a prompt. Expect differentiation here: tagged prompts, source-aware refusal heuristics, separate tool permission sets per trust tier. The credential-leakage failure mode described under Shape 2 above is the forcing function.
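Until the tools differentiate, a pipeline can approximate per-tier permission sets itself. This sketch assumes each input carries GitHub's `author_association` value (a field on issue, comment, and pull request payloads) plus a flag for whether it came from a fork. The tier names, the tool names, and `tools_for_run` are hypothetical. The run gets the tier of its least-trusted input, because a trusted comment on an untrusted PR still puts the untrusted diff in front of the agent.

```python
from enum import Enum


class Trust(Enum):
    TRUSTED = "trusted"
    UNTRUSTED = "untrusted"


# Hypothetical tool sets per tier; names are illustrative.
TOOLS_BY_TIER = {
    Trust.TRUSTED: frozenset({"read_file", "edit_file", "run_tests", "post_comment"}),
    Trust.UNTRUSTED: frozenset({"read_file", "post_comment"}),
}

TRUSTED_ASSOCIATIONS = {"OWNER", "MEMBER", "COLLABORATOR"}


def is_trusted(author_association: str, from_fork: bool) -> bool:
    # Conservative: anything from a fork is untrusted, whoever opened it.
    return author_association in TRUSTED_ASSOCIATIONS and not from_fork


def tools_for_run(sources: list[tuple[str, bool]]) -> frozenset:
    """The run's tier is the lowest tier among every input it will read."""
    if sources and all(is_trusted(assoc, fork) for assoc, fork in sources):
        return TOOLS_BY_TIER[Trust.TRUSTED]
    return TOOLS_BY_TIER[Trust.UNTRUSTED]  # includes the empty case: default to untrusted
```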

Quick reference