Agentic Healing
in Production 🩺
Jack McNicol • Founding Engineer @ SuperIT
Small team building our Sparky agent, FastAPI/React
The dream
Autonomously fixing issues at scale.
Flow state
Update the code so the agent will flow through the code well.
What is the new state of the world?
Read the sessions
Where did it go off track — then find its way back?
Show me recommendations to help the agent reach its goal faster. No -> Human review better
Back pressure
Belt and suspenders
PostTool checks, type checks, property, e2e, failure states, not coverage
Warnings = errors
A noisy build teaches the agent that warnings don’t matter.
Make every warning fail the build — a clean signal for the agent, and for you.
How we use it
Backwards?
Events, Schedules, Humans → Agents
Managing change.
Events
Sentry, Logfire, Linear bug, Slack message
Workflows to manage context, investigate, build RCA, RCA to solutions, RCA to proof
Event workflow
The agent’s hypothesis is only as good as the telemetry it has.
Triage → Discovery → RCA → Solution → Proof
Recurring
Cron-actions. Lowest urgency, widest blast radius.
Dependabot, Security scans, Pick up bug ticket. Schedules allow token cost planning.
The developer
Product growth done here, less bug fixing.
Review with skills
A skill is a reusable procedure: input → checklist → output.
Cheap deterministic rules first. ast-grep/custom. Once trusted, add to scheduler.
The PR is the guardrail
Let agents run wild behind it. Human review? Merge Queues.
Greptile · Cursor BugBot · Claude Code Review. Stack reviewers, don’t replace them. Tag a human required
Heal on trigger
Production event fires → agent investigates → posts findings in 1–2 minutes.
Logfire / PostHog webhook → parallel searcher sub-agents → report in Slack.
How we use it
It's Fixed!
Said Claude...
Reality: it changed some stuff.
Make it prove its work
Validating the agent’s work is the bottleneck. Compress it.
“Have the agent prove its work in the format that’s fastest for you to validate.” - A love letter to Pi | Lucas Meijer
Are you ready for the mythos wave?
Time to exploit is now hours.
The trap
Agents multiply what you already have — foundations or debt.
Skip the foundation and you get the automation confidently making things worse.
What’s the point?
Stop doing the work AI is now better at — so you can do the work it can’t.
Allows us to move to the important parts. Where should we as engineers be focused?
Thanks 🩺