title	Devin auto-triage and the always-on oncall
summary	Devin's auto-triage monitors Slack, Sentry, and PagerDuty around the clock and investigates incidents on its own. Here's what the architecture reveals.
date	2026-05-19
lastModified	2026-05-19
accent	rose
dropcap	true

Cognition announced auto-triage yesterday, and it's the most complete production example of a proactive agent I've seen built specifically for incident response. Not a chatbot you ask about errors. Not a cron job that summarizes your alerts every morning. An always-on agent that watches your Slack channels, your Sentry dashboard, your PagerDuty incidents, and your Linear backlog simultaneously, investigates what it finds, and either opens a pull request or posts an investigation summary with next steps.

I've been writing about what it takes to make agents proactive for weeks now. Most of the products I've covered have one or two of the three primitives (clock, listener, inbox) wired up. Pulse has a clock but barely a listener. CodeRabbit has a strong listener surface but runs event detection on a thirty-minute cron. Auto-triage claims all three, and the architecture underneath is worth taking apart.

The broadest listener surface on a coding agent

The signal surface is what caught my attention first. Auto-triage connects to Slack, Linear, GitHub, Sentry, Datadog, PagerDuty, and custom webhooks. When a bug report lands in a Slack channel, or a Sentry alert fires, or a Linear ticket transitions to a "Bug" label, auto-triage picks it up and starts investigating.

That's seven input sources, each with its own payload format, authentication model, and delivery semantics. We wrote about the webhook tax earlier in this series: adding one webhook provider to a proactive agent takes roughly a sprint, and four take most of a quarter. Seven is a serious infrastructure investment, and it explains why most agent products settle for one or two integrations at launch.
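To make the "own payload format" point concrete, here's a minimal sketch of the adapter layer an agent needs before it can reason about anything. Everything in it is an assumption for illustration: the `Signal` type, the adapter names, and especially the payload shapes, which are invented and are not the real schemas of Slack, Sentry, or any other provider. It also skips authentication and delivery semantics entirely, which is where much of the real cost lives.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Signal:
    """Hypothetical common shape every incoming event is reduced to."""
    source: str
    summary: str
    link: str


def from_chat(payload: dict) -> Signal:
    # Invented payload shape: a flat message with a permalink.
    return Signal("chat", payload["text"], payload["permalink"])


def from_error_tracker(payload: dict) -> Signal:
    # Invented payload shape: the useful fields are nested under "event".
    event = payload["event"]
    return Signal("error_tracker", event["title"], event["url"])


# One adapter per provider. Each new provider means a new entry here,
# plus its own auth and retry handling, which this sketch leaves out.
ADAPTERS: Dict[str, Callable[[dict], Signal]] = {
    "chat": from_chat,
    "error_tracker": from_error_tracker,
}


def normalize(source: str, payload: dict) -> Signal:
    """Route a raw payload to its adapter, or fail loudly if none exists."""
    try:
        adapter = ADAPTERS[source]
    except KeyError:
        raise ValueError(f"no adapter for source {source!r}") from None
    return adapter(payload)
```

Two providers fit in thirty lines because the sketch ignores everything hard. Seven real ones, each with signature verification, retries, and schema drift, is the quarter of work the webhook tax describes.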

The investigation flow, based on Cognition's documentation, works something like this: a bug report comes in ("Pro users seeing 'undefined' on the billing page since Friday's deploy"), and auto-triage pulls error logs from Datadog, queries the read replica to verify data state, traces the breaking commit in git history, writes a fix with a regression test, and opens a PR. When the fix isn't obvious, it posts an investigation summary instead.

The hardest part of monitoring seven sources isn't connecting them. It's recognizing that a Slack thread, a Sentry alert, and a Linear ticket are all describing the same incident. Auto-triage claims to deduplicate across sources and link related threads to earlier investigations. If that works reliably, it solves a problem that most alerting systems still punt to the human oncall.
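Cognition hasn't published how the deduplication works, so what follows is only a toy sketch of the general idea: group signals that arrive close together in time and share enough vocabulary. The `Signal` and `Incident` types, the stopword list, the one-hour window, and the 0.3 Jaccard threshold are all assumptions I picked for illustration. A production system would more plausibly lean on stack-trace fingerprints, service ownership, or embeddings than on word overlap.

```python
import re
from dataclasses import dataclass, field
from typing import List, Set

STOPWORDS = {"the", "and", "for", "with", "since"}  # illustrative, not exhaustive


@dataclass
class Signal:
    source: str
    text: str
    timestamp: float  # seconds


@dataclass
class Incident:
    signals: List[Signal] = field(default_factory=list)
    tokens: Set[str] = field(default_factory=set)
    last_seen: float = 0.0


def tokenize(text: str) -> Set[str]:
    """Lowercase words longer than two characters, minus stopwords."""
    words = re.findall(r"[a-z0-9_]+", text.lower())
    return {w for w in words if len(w) > 2 and w not in STOPWORDS}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def correlate(signals: List[Signal], window_s: float = 3600,
              threshold: float = 0.3) -> List[Incident]:
    """Attach each signal to the first recent, similar incident, else open a new one."""
    incidents: List[Incident] = []
    for sig in sorted(signals, key=lambda s: s.timestamp):
        toks = tokenize(sig.text)
        match = None
        for inc in incidents:
            recent = sig.timestamp - inc.last_seen <= window_s
            if recent and jaccard(toks, inc.tokens) >= threshold:
                match = inc
                break
        if match is None:
            match = Incident()
            incidents.append(match)
        match.signals.append(sig)
        match.tokens |= toks
        match.last_seen = sig.timestamp
    return incidents
```

Even this toy version shows why the problem is hard: the threshold is a guess, and a Slack complaint written in user language can share almost no words with the exception it describes.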

The listener here is genuinely always-on, not scheduled. Cognition explicitly contrasts this with webhook-triggered automation (an implicit comparison to Cursor's recently launched automations), positioning auto-triage as a persistent agent that maintains context rather than a stateless function that fires and forgets.

The manager and the fleet

The architecture underneath auto-triage uses what Cognition calls a manager-agent pattern. One agent maintains long-running context across all investigations: what incidents have been seen before, which ones are related, how the team typically handles certain categories of bugs, who owns which services. When a new signal arrives, the manager agent can spin up sub-Devins to investigate in parallel, each working in its own sandboxed environment.

This is a different model from a single-agent loop. A single agent handling triage would have to serialize investigations, context-switching between incidents and holding everything in one increasingly long conversation. The manager pattern separates coordination from execution. The manager remembers; the sub-agents do the digging.
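The split between coordination and execution can be sketched in a few lines. This is not Cognition's implementation, just an illustration of the shape: a hypothetical `Manager` owns the memory and fans investigations out to workers, with a thread pool standing in for sandboxed sub-agents and the `investigate` function standing in for the actual digging.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple


def investigate(incident_id: str, hints: List[str]) -> dict:
    """Stand-in for a sub-agent: a real one would read logs and code."""
    return {
        "incident": incident_id,
        "finding": f"checked {len(hints)} prior hint(s)",
    }


class Manager:
    """Holds long-running memory; workers only see a snapshot of it."""

    def __init__(self, max_workers: int = 4) -> None:
        self.memory: Dict[str, List[str]] = {}  # category -> past findings
        self.max_workers = max_workers

    def handle(self, incidents: List[Tuple[str, str]]) -> List[dict]:
        """Investigate (incident_id, category) pairs in parallel."""
        results = []
        with ThreadPoolExecutor(max_workers=self.max_workers) as pool:
            # Each worker gets a copy of the relevant memory at submit time.
            futures = [
                (cat, pool.submit(investigate, iid, list(self.memory.get(cat, []))))
                for iid, cat in incidents
            ]
            # Only the manager writes to memory, after each worker reports back.
            for cat, fut in futures:
                result = fut.result()
                self.memory.setdefault(cat, []).append(result["finding"])
                results.append(result)
        return results
```

The design choice worth noticing is that workers never touch the shared memory. They receive hints, return findings, and the manager decides what to keep, which is what lets investigations run in parallel without one long conversation holding everything.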

The long-term memory is the piece that interests me most. Auto-triage remembers prior findings, recurring issues, and how your team handled specific bug categories. When the same class of error appears again, the agent already knows where to look and who to assign. Over time, the routing should get better as the system learns team preferences.

This memory architecture addresses something I wrote about in what makes proactive agents hard: the cold-start problem across runs. Most agents forget everything between sessions. They wake up, do their work, and lose all context. An agent that accumulates institutional knowledge about your incidents, your codebase, and your team's preferences is qualitatively different from one that cold-starts every time a Sentry alert fires.

Whether the memory actually works well at scale is a different question. The gap between "remembers prior findings" in a product description and "reliably correlates a Tuesday Slack thread with a Thursday Sentry alert about the same root cause" in production is wide. But the architectural commitment is right.

Playbooks as a judgment mechanism

The detail that surprised me is how teams teach auto-triage what to do. In the Linear integration, you don't just point the agent at a bug label and say "fix it." You write a triage playbook: a concrete sequence of investigation steps that describes how a human engineer would approach the problem. Search the codebase for relevant files, check git history for recent changes, look at error patterns in the monitoring tool, verify the data state, and so on.

The playbook isn't a system prompt. It's an operational runbook, the kind of document that experienced oncall engineers keep in their heads or buried in a Confluence page. By making it explicit and machine-readable, teams are encoding their institutional triage knowledge into a format the agent can follow.

From the documentation, the workflow uses edge detection on Linear label transitions, so it only fires on newly triaged tickets rather than retroactively processing the entire backlog. Playbooks can be chained: a "Clear Fix" label can trigger a separate fix playbook. The investigation results sync back to the Linear ticket as structured output.
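Edge detection and chaining are easy to sketch, assuming you can snapshot each ticket's labels between polls. The snapshot format, the function names, and the label-to-playbook mapping below are mine, not Linear's API or Cognition's configuration; only the "Bug" and "Clear Fix" labels come from the description above.

```python
from typing import Dict, List, Set, Tuple

Snapshot = Dict[str, Set[str]]  # ticket id -> labels currently on the ticket

# Hypothetical mapping: which playbook a newly applied label triggers.
PLAYBOOKS = {"Bug": "triage", "Clear Fix": "fix"}


def label_edges(previous: Snapshot, current: Snapshot, label: str) -> List[str]:
    """Ticket ids where `label` is present now but was absent last time."""
    return sorted(
        tid
        for tid, labels in current.items()
        if label in labels and label not in previous.get(tid, set())
    )


def playbooks_to_run(previous: Snapshot, current: Snapshot) -> List[Tuple[str, str]]:
    """(ticket id, playbook) pairs for every label that just transitioned on."""
    runs = []
    for label, playbook in PLAYBOOKS.items():
        for tid in label_edges(previous, current, label):
            runs.append((tid, playbook))
    return runs
```

A ticket that already carried "Bug" in the previous snapshot produces no triage run, which is the "no retroactive backlog processing" behavior. Note that a ticket missing from `previous` counts as new, so a first run would need `previous` seeded with the existing backlog. Chaining falls out for free: when the triage playbook applies "Clear Fix", the next poll sees that edge and queues the fix playbook.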

We've been writing about [the judgment problem](/posts/why-proactive-is-hard/) as an open research question. The [PARE benchmark](/posts/forty-two-percent/) showed that proactive agents succeed only 42% of the time, and the models that score highest are the ones that stay quiet when uncertain. Auto-triage takes a different approach to judgment: rather than relying on the model to figure out when and how to investigate, the playbook tells it exactly what steps to follow. The agent's judgment is bounded by human-authored process. Whether that's a feature or a limitation depends on how predictable your incidents are.

The playbook pattern also explains why Cognition positions auto-triage as learning over time. The playbooks are a starting point, but the memory layer accumulates which steps actually led to successful resolutions and which were dead ends. In theory, the investigation gets sharper with each incident.

The cost question

Always-on monitoring means always spending. Devin's pricing starts at $20/month for the Pro tier, which includes a usage quota measured in ACUs (Agent Compute Units, roughly 15 minutes of active autonomous work per unit). Pay-as-you-go rates run $2.00–2.25 per ACU depending on plan. Multiple reviewers report that actual monthly costs climb to $300–500 once the agent is actively investigating incidents.

We covered the economics of always-on agents in what proactive agents actually cost. The pattern is consistent: the headline price gets you in the door, and the metered usage is where the real bill lives. For auto-triage specifically, every investigation spins up compute, model inference, and networking. A team with high alert volume could burn through the included quota in a few days.
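A back-of-the-envelope calculation shows how quickly metered usage adds up. The sketch below uses only the figures quoted above (about 15 minutes per ACU, $2.00–2.25 per ACU). The workload of 50 investigations a month at 45 minutes each is a made-up example, and the function ignores the base subscription and any included quota.

```python
ACU_MINUTES = 15  # rough figure quoted above: one ACU is about 15 minutes of work


def monthly_metered_cost(investigations: int, minutes_each: float,
                         rate_per_acu: float) -> float:
    """Pay-as-you-go cost in dollars, ignoring base price and included quota."""
    acus = investigations * minutes_each / ACU_MINUTES
    return acus * rate_per_acu


# Hypothetical workload: 50 investigations a month, 45 active minutes each.
low = monthly_metered_cost(50, 45, 2.00)   # 150 ACUs at $2.00
high = monthly_metered_cost(50, 45, 2.25)  # 150 ACUs at $2.25
print(f"${low:.2f} to ${high:.2f} per month")  # prints "$300.00 to $337.50 per month"
```

Under those assumptions, fewer than two investigations a day already lands in the $300–500 range reviewers report. Double the alert volume and the bill doubles with it.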

There's also what developers are calling the "babysitting tax." Reviews of Devin generally report 10–20 minutes of overhead per task for prompt crafting, session monitoring, and reviewing output. The code Devin generates has a reported defect rate 1.5–2x higher than senior-developer-authored code. Automation Atlas rates Devin 7.5 out of 10, strong for well-scoped repetitive tasks but weaker for ambiguous product work.

For triage specifically, the economics could work differently than for general coding. A well-scoped investigation is closer to Devin's sweet spot than an open-ended feature build. If the playbook bounds the investigation steps and the agent consistently surfaces useful summaries, the cost of automated first-response could compare favorably to the cost of pulling a senior engineer off their current work to look at every alert.

[Hari Subbaraj at Modal](https://modal.com) offered a customer perspective: "Devin Automations feels like a step forward from other auto-triage tools we've tried. It monitors our channel, works with our codebase and observability stack, and comes back with useful investigation." The key qualifier: "useful investigation." If the investigations are useful even when they don't produce a fix, the agent earns its cost as a first-responder, not as an autonomous fixer.

Where auto-triage sits in the landscape

Viewed through the three primitives, plus memory and scope:

| | Auto-triage | CodeRabbit Agent | Junior (Sentry) |
| --- | --- | --- | --- |
| Clock | Always-on, continuous | 30-min cron | On @-mention |
| Listener | Slack, Linear, GitHub, Sentry, Datadog, PagerDuty, webhooks | Slack + 12 tool integrations | Slack + MCP plugins |
| Inbox | Slack threads, Linear tickets, GitHub PRs | Slack threads, PR comments | Slack threads |
| Memory | Persistent across investigations | Per-repo learnings | Markdown skills (no persistent memory) |
| Scope | Incident triage + fix | Code review + expanding | General-purpose |

Auto-triage has the most complete primitive coverage of any coding agent I've analyzed in this series. The combination of continuous monitoring, broad signal surface, persistent memory, and bounded investigation through playbooks is architecturally sound. CodeRabbit's agent has breadth across tools but still runs detection on a scheduled loop. Junior is composable and open-source but reactive by design, responding to @-mentions rather than watching for incidents on its own.

The real test is whether the memory and deduplication hold up under production alert volume. An oncall engineer processing fifty alerts a week builds up contextual knowledge that's difficult to replicate in a language model: which services are flaky, which alerts are noise, which error patterns indicate a real regression versus a transient blip. Auto-triage's long-term memory is an attempt to accumulate that knowledge programmatically. The architecture is pointed in the right direction. Whether it gets there is something we'll only know from teams running it for months, not days.

Russell Kaplan, Cognition's co-founder, framed auto-triage as part of a broader shift: "Having to manually prompt your coding agent to do work will soon feel like a UX bug." I think he's right about the direction. The harder question is whether any specific agent has accumulated enough context about your systems, your team, and your incident patterns to earn that always-on responsibility. Auto-triage is the most serious attempt I've seen so far. Ask me again after a team has run it through a full quarter of production alerts.
