
What I Learned Building 15 AI Agents for a Storytelling Company

Fifteen agents in, here is what actually held up — and the three architectural mistakes I had to unlearn before any of it became reliable enough to run a business on.

Rovonn Russell · 4 min read
[Image: A workspace with multiple terminal windows running AI agents]

Direct answer: After shipping 15 production AI agents at Impact Loop, the single most important lesson is this: the agent should be the dumbest part of the system, not the smartest. Push every decision you can into deterministic scripts, and let the LLM only do the parts that genuinely require judgment. That one shift took my agents from "impressive demo, fragile in production" to "boring, reliable, runs every morning at 6 a.m. without me."

This post is the plain version of what I'd tell myself if I could go back to agent #1.

The math nobody talks about

Here's the math that quietly kills most agent projects: if a single LLM call is right 90% of the time, and your workflow chains five of them together, your end-to-end success rate is 0.9⁵ ≈ 59%. Two out of five times, your agent fails — and worse, it fails creatively, in a different place every time, which makes debugging brutal.
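The compounding above is just exponentiation, but seeing it as code makes the takeaway concrete. A minimal sketch (the 90%-per-call figure is the example from the text, not a measured number):

```python
def chained_success_rate(per_step_accuracy: float, steps: int) -> float:
    """End-to-end success rate when every chained step must succeed."""
    return per_step_accuracy ** steps

# Five chained calls at 90% each succeed end-to-end only ~59% of the time.
print(round(chained_success_rate(0.9, 5), 2))  # 0.59

# Cutting the chain from five LLM decisions to one changes everything.
print(round(chained_success_rate(0.9, 1), 2))  # 0.9
```

Note that raising per-call accuracy helps far less than shortening the chain, which is why the fix below is architectural rather than a model upgrade.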

The fix isn't a smarter model. The fix is fewer LLM decisions per workflow.

The three mistakes I had to unlearn

Mistake 1: One big mega-prompt that did everything. My first agent tried to scrape leads, classify them, enrich them, draft an email, and send it — all inside one monster system prompt. It "worked" maybe 40% of the time. I rebuilt it as five tiny scripts where the LLM only made one judgment call per script, and the success rate jumped past 95%.
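The "five tiny scripts" shape can be sketched roughly like this. Everything here is illustrative: the stub data, the field names, and `llm_classify()` (a placeholder for the one narrow LLM judgment, not a real API client) are all assumptions, not the actual Impact Loop pipeline.

```python
def llm_classify(lead: dict) -> str:
    # Placeholder for the single LLM judgment in this step: in production
    # this would be one narrow call with a fixed set of allowed labels.
    return "qualified" if lead.get("employees", 0) >= 50 else "unqualified"

def scrape() -> list[dict]:
    # Deterministic step (stub data standing in for a real scraper).
    return [{"name": "Acme Co", "employees": 120}]

def enrich(lead: dict) -> dict:
    # Deterministic step: no LLM involved, so it never fails creatively.
    return {**lead, "domain": lead["name"].lower().replace(" ", "") + ".com"}

def run_pipeline() -> list[dict]:
    results = []
    for lead in scrape():
        lead = enrich(lead)
        lead["label"] = llm_classify(lead)  # the only LLM decision here
        results.append(lead)
    return results
```

The point of the shape: four of the five steps are plain Python that either works or raises a traceback you can actually debug.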

Mistake 2: Letting the agent format its own output. Free-text outputs are how you get downstream parsing errors. Every modern agent should be calling tools with structured JSON, not "please respond in this format." The day I stopped trusting prompts to format output correctly was the day my agents started shipping.
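One way to enforce that discipline, sketched below: treat the model's reply as untrusted input and validate it against a fixed schema before anything downstream runs. The schema, field names, and `validate_reply()` helper are illustrative assumptions, not a library API.

```python
import json

# Fields the model is required to emit, with their expected types.
REQUIRED_FIELDS = {"category": str, "priority": int}

def validate_reply(raw: str) -> dict:
    """Parse and validate a model reply; raise loudly instead of
    letting malformed output leak into the rest of the pipeline."""
    data = json.loads(raw)  # raises ValueError on free-text output
    for field, ftype in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), ftype):
            raise ValueError(f"bad or missing field: {field}")
    return data

# A well-formed reply passes; free text fails at the boundary, not downstream.
reply = validate_reply('{"category": "billing", "priority": 2}')
```

Failing at the boundary is the feature: a bad reply becomes a retry, not a silent downstream parsing error.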

Mistake 3: No self-annealing loop. When an agent failed, I used to debug it once, fix the script, and move on. The breakthrough was treating every failure as a chance to update the instructions file the agent reads, so the next run is permanently smarter. The system gets stronger with use instead of weaker.
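The self-annealing loop is mechanically simple: every debugged failure gets written back into the instructions file the agent reads on its next run. A minimal sketch, where the filename and lesson format are illustrative conventions, not a fixed standard:

```python
from pathlib import Path

# The instructions file the agent reads at startup (illustrative name).
INSTRUCTIONS = Path("instructions.md")

def record_lesson(failure: str, fix: str) -> None:
    """Append a lesson learned from a failure, so the next run starts smarter."""
    with INSTRUCTIONS.open("a", encoding="utf-8") as f:
        f.write(f"\n- When you see: {failure}\n  Do this instead: {fix}\n")

def load_instructions() -> str:
    """What the agent actually reads on its next run."""
    return INSTRUCTIONS.read_text(encoding="utf-8") if INSTRUCTIONS.exists() else ""
```

The effect is cumulative: every production failure permanently narrows the space of ways the agent can fail again.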

What an actually-reliable agent looks like

The agents that survived contact with real clients all share the same shape. They have a tiny instructions file that tells the LLM what to do. They have a scripts/ folder full of deterministic Python that handles how to do it. The LLM picks which script to run, the scripts do the work, and any decision the scripts can't make gets handed back up to the LLM as a single, narrow question.
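That shape reduces to a small dispatcher. A sketch under stated assumptions: the script names and `choose_script()` (standing in for the LLM's single "which script?" judgment) are hypothetical, and real scripts would be files under scripts/ rather than lambdas.

```python
# The fixed menu of deterministic scripts the agent is allowed to run.
SCRIPTS = {
    "triage_inbox": lambda: "inbox triaged",
    "research_lead": lambda: "brief written",
    "publish_post": lambda: "post shipped",
}

def choose_script(task: str) -> str:
    # Placeholder for the LLM's one judgment: map a task description
    # to a known script name from the menu above.
    return "triage_inbox" if "email" in task else "publish_post"

def run(task: str) -> str:
    name = choose_script(task)
    if name not in SCRIPTS:                # guard against hallucinated names
        raise ValueError(f"unknown script: {name}")
    return SCRIPTS[name]()                 # deterministic work happens here
```

The guard clause matters: even the LLM's one decision gets validated against a closed set before anything executes.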

That separation is the entire game. Once I started building this way, the work stopped feeling like wrestling and started feeling like snapping Lego together.

The agents I'd build first if I were starting over

If I were starting over today and could only build three agents, they'd be: an inbox-triager that classifies and routes incoming email, a lead-research agent that turns a name into a one-page brief, and a content-publishing agent that takes a draft and ships it through the full SEO + distribution pipeline. Those three give you back roughly 8–12 hours a week, which is enough leverage to fund building everything else.

The honest part

Most AI agent demos you see online don't run twice. Mine didn't either, for a long time. The shift from "demos well" to "boots up at 6 a.m. for 90 days straight without me touching it" wasn't a model upgrade — it was a discipline upgrade. Smaller scopes. Deterministic glue. Structured outputs. Self-annealing instructions.

If you're building your first agent and it feels fragile, the answer probably isn't a better prompt. The answer is less prompt.

Want help thinking through your AI build?

If you're trying to figure out where AI agents actually fit in your business — and where they're going to waste your money — let's talk.

Book a Strategy Session

Frequently Asked Questions

Should you build AI agents with one big prompt or many small steps?

Many small deterministic steps. A single agent making 5 decisions in a row at 90% accuracy each only succeeds 59% of the time. Pushing decisions out into deterministic scripts is the difference between a demo and a production system.

What is the biggest mistake people make when building their first AI agent?

Trying to make the agent do everything — research, decide, execute, verify, and report — in one long context. The agents that actually work are the ones where the LLM is doing only the parts a script can't do, and everything else is hard-coded.
