Engineer Your Agents Like Systems


I have been watching teams run into the same failure mode with agentic operations. They adopt an agent, point it at something in the reliability loop, and then treat it as a black box they trust. The reasoning is understandable: if you picked the model carefully and wired it up to the right tools, you did the work. Let it run.

That is not how you would treat anything else in production. And it is not how this ends well.

The scaffolding post I wrote a few weeks ago made the case that the model is a runtime, not the tool. The tool is everything you build around it. If the scaffolding is what matters, then the scaffolding is what you have to engineer. That means the agent itself gets the same treatment you would give any production system.

SLOs on the agent’s behavior. You can define what success looks like: correct action rate, false escalation rate, triage latency. If you cannot measure it, you are not running operations; you are running on hope. Pick two or three signals and track them the same way you track your services.
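
To make that concrete, here is a minimal sketch of what tracking those three signals could look like. Everything in it is an assumption for illustration: the `AgentDecision` record, the field names, and the target values in `SLO_TARGETS` are hypothetical, and the "correct" and "escalation needed" labels are assumed to come from human review after the fact.

```python
import math
from dataclasses import dataclass


@dataclass
class AgentDecision:
    """One reviewed agent decision (hypothetical record shape)."""
    correct: bool             # reviewer judged the action correct
    escalated: bool           # the agent escalated to a human
    escalation_needed: bool   # reviewer judged escalation was warranted
    triage_seconds: float     # time from alert to the agent's decision


def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile: smallest value covering pct% of samples."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def agent_slis(decisions):
    """Compute the three indicators over a window of reviewed decisions."""
    if not decisions:
        raise ValueError("no decisions in window")
    escalations = [d for d in decisions if d.escalated]
    false_escalations = [d for d in escalations if not d.escalation_needed]
    return {
        "correct_action_rate": sum(d.correct for d in decisions) / len(decisions),
        "false_escalation_rate": (
            len(false_escalations) / len(escalations) if escalations else 0.0
        ),
        "p95_triage_seconds": nearest_rank_percentile(
            [d.triage_seconds for d in decisions], 95
        ),
    }


# Illustrative targets only; pick numbers that fit your own operation.
SLO_TARGETS = {
    "correct_action_rate": ("min", 0.95),
    "false_escalation_rate": ("max", 0.10),
    "p95_triage_seconds": ("max", 60.0),
}


def breached(slis, targets=SLO_TARGETS):
    """Return the names of indicators that are outside their target."""
    out = []
    for name, (kind, bound) in targets.items():
        value = slis[name]
        if (kind == "min" and value < bound) or (kind == "max" and value > bound):
            out.append(name)
    return out
```

Feeding `breached(agent_slis(window))` into the same alerting path as your service SLOs is the point: the agent's numbers sit next to everything else you already watch.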

Gated autonomy you can roll back. You would not push a code change to every host at once; you would stage the rollout. The same logic applies to how much you let an agent act on its own. Start with read-only. Add alerting actions. Add remediation in a narrow scope. Expand only when the previous stage has shown stable behavior. Every level of autonomy should have a way to pull it back.
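
The staged progression above can be sketched as a small gate that every agent action has to pass through. This is one possible shape, not a prescribed design: the `Autonomy` levels mirror the stages named in the paragraph, while `AutonomyGate` and the `stage_is_stable` flag are hypothetical names, and deciding what counts as "stable" is left to whatever signals you track.

```python
from enum import IntEnum


class Autonomy(IntEnum):
    """Autonomy stages, ordered from least to most permissive."""
    READ_ONLY = 0
    ALERTING = 1
    SCOPED_REMEDIATION = 2


class AutonomyGate:
    """Tracks the agent's current level and remembers how to undo promotions."""

    def __init__(self, level=Autonomy.READ_ONLY):
        self.level = level
        self._history = []  # previous levels, most recent last

    def permits(self, required):
        """An action is allowed only if its required level is within the gate."""
        return required <= self.level

    def promote(self, stage_is_stable):
        """Move up exactly one stage, and only on evidence of stability."""
        if not stage_is_stable or self.level == max(Autonomy):
            return False
        self._history.append(self.level)
        self.level = Autonomy(self.level + 1)
        return True

    def roll_back(self):
        """Return to the level held before the most recent promotion."""
        if self._history:
            self.level = self._history.pop()
        return self.level


gate = AutonomyGate()
gate.promote(stage_is_stable=True)            # READ_ONLY -> ALERTING
if not gate.permits(Autonomy.SCOPED_REMEDIATION):
    pass  # the agent may alert, but remediation is still blocked
gate.roll_back()                              # back to READ_ONLY
```

Keeping the level in one place, outside the prompt, is the design choice that matters here: rolling back becomes a single state change rather than a redeploy.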

Reasoning failures as first-class events. When a service fails, you write a postmortem. When an agent takes a wrong action or misclassifies a situation, most teams write it off as a model limitation and move on. That is the wrong frame. The wrong action is a failure. It has a cause, and that cause is findable. You review the inputs it saw, the tools it called, and the decision it made. You treat it like a bug, not a surprise.
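
If a wrong action is going to be reviewed like a bug, it needs to be recorded like one. Below is a rough sketch of a structured failure record that captures the three things named above: the inputs the agent saw, the tools it called, and the decision it made. The `ReasoningFailure` and `ToolCall` types and all of their fields are hypothetical; adapt them to whatever your agent framework actually logs.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Dict, List


@dataclass
class ToolCall:
    """One tool invocation made by the agent (hypothetical shape)."""
    name: str
    arguments: Dict[str, str]
    result_summary: str


@dataclass
class ReasoningFailure:
    """A wrong action or misclassification, recorded for review."""
    incident_id: str
    inputs_seen: List[str]        # what the agent had in front of it
    tool_calls: List[ToolCall]    # what it did to gather more
    decision: str                 # what it concluded or did
    expected_decision: str        # what a reviewer says it should have done
    suspected_cause: str = "unknown"  # filled in during review
    recorded_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_event(self):
        """Serialize to a JSON string suitable for an event or log pipeline."""
        return json.dumps(asdict(self), sort_keys=True)


failure = ReasoningFailure(
    incident_id="example-001",  # placeholder identifier
    inputs_seen=["alert: elevated 5xx rate on checkout"],
    tool_calls=[
        ToolCall(
            name="query_metrics",  # hypothetical tool name
            arguments={"service": "checkout", "window": "15m"},
            result_summary="error rate elevated, latency normal",
        )
    ],
    decision="classified as transient, no action",
    expected_decision="escalate to on-call",
)
print(failure.to_event())
```

Once these records exist, the review looks like any other postmortem: group them by `suspected_cause` and fix the scaffolding that let the cause through.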

None of this is exotic. It is the SRE discipline that teams already know, applied one layer up. The agent is not magic. It is a system. Engineer it accordingly.