When AI goes down a rabbit hole, use it to your advantage

You know when you have one of those coding sessions with AI.

The LLM happily dives down a rabbit hole, digging itself in deeper with attempt after attempt at a fix, until finally it concedes that you were right all along and yes, it had been trying to fix the wrong problem.

These sessions are frustrating, but they can also be gold when you use them to improve the scaffolding for your LLM.

The key is to have the LLM reflect on what went wrong, and turn that messy session into useful data.

A concrete example

The example here came from a real session building a small personal Mac app called ClipCapture (used for screen capture/recording and clip handling for videos).

The issue was simple on paper: no audio was being captured, so transcription was empty.

But AI very quickly veered onto the wrong path.

It decided the issue was audio detection and audio levels, and set about fixing that.

I, on the other hand, suspected that OBS (which ClipCapture interacts with) was selecting the wrong microphone, ignoring what I’d set in ClipCapture.

Spoiler: I was right. The OBS source and the source selected in the app were out of sync.

This was a classic case of the AI trying hard to fix the problem while looking in all the wrong places.

What Reflect does (and what it avoids)

After sessions like this, I’ve taken to running a “Reflect” skill for the messy session.

The skill asks the model to inspect the session and look for a few specific things: corrections, loops, repeated failure patterns, and explicit user preferences.

The first phase is analysis, not fixing.

That separation matters. If you immediately ask the model to rewrite docs or patch files, it can skip the thinking and jump straight into “helpful mode”.

Reflect works better when it reports what happened first, then proposes guidance updates as a second step.

There’s also a documentation pass that checks existing guidance (for example CLAUDE.md, agent files, skills, and related docs), then suggests what to add so the same pattern is less likely next time.

First result: useful diagnosis, but too specific

On this session, Reflect quickly surfaced the right narrative: the model had chased audio detection and audio levels, while the real fault was that the OBS source and the source selected in the app were out of sync.

That’s already useful.

But it’s also very specific, so I nudged it to find the more general issue we could mitigate regardless of project.

The principle worth keeping: check the evidence first

From there, the guidance became more practical, and a short version was adopted:

Check the evidence.

Essentially, when a bug spans multiple modules or systems, don’t assume where it lives; prove it.

The Reflect skill then went on to create guidance for future sessions, with specific examples and guardrails for checking evidence (logs etc.) before forming theories on the root cause.

Second result: ownership boundaries

Reflect also flagged another issue that’s easy to miss during a long session: unclear system ownership boundaries.

In this case, ClipCapture and OBS each held their own idea of which audio source was selected, and nothing made clear which system owned that choice.

That boundary clarity helps with bug fixing, but it also prevents quiet scope creep.

If every bug becomes an excuse to absorb responsibility from neighbouring systems, complexity rises quickly. You end up building features you don’t need, in places you don’t need them.

The practical rule here is simple: for any feature or bug spanning multiple tools/services, state ownership first. Then implement.

A simple workflow you can run after any “off the rails” session

Putting aside the specific findings in this example, the Reflect skill is useful any time a session goes off the rails. This is enough to start:

1) Run a reflection pass immediately

Use the actual conversation while it’s fresh in your mind.

2) Ask for evidence

Get the LLM to look for corrections, loops, repeated failure patterns, and explicit user preferences.
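A cheap pre-pass can help here: flag likely corrections mechanically before handing the transcript to the LLM. A sketch, where the marker phrases are assumptions you'd tune to how you actually talk to your model:

```python
import re

# Rough heuristic: flag transcript lines that look like user corrections
# or signs of a loop, so the reflection prompt can focus on them.
CORRECTION_MARKERS = re.compile(
    r"\b(no|actually|wrong|that's not|you were right|still broken|again)\b",
    re.IGNORECASE,
)


def flag_corrections(transcript_lines):
    """Return (line_number, line) pairs that look like corrections or loops."""
    return [
        (i, line)
        for i, line in enumerate(transcript_lines, start=1)
        if CORRECTION_MARKERS.search(line)
    ]
```

This won't catch everything, but it gives the reflection pass concrete anchors instead of asking the model to re-read the whole session cold.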

3) Keep analysis separate from fixes

Get a diagnosis first. Then decide what should become persistent guidance.

4) Push from specific incident to general principle

If output is too project-specific, ask: “What is the more general issue here, regardless of stack?”

5) Convert only the strongest insights into guidance

Short, testable rules win.
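The final step can be as mechanical as appending each surviving rule to your guidance file. A sketch, assuming a CLAUDE.md-style markdown file and a section header of my own invention:

```python
from pathlib import Path

# Append short rules to a guidance file under one section, skipping
# duplicates so repeated reflection passes stay idempotent.
HEADER = "## Lessons from past sessions"


def add_guidance_rules(path: Path, rules: list[str]) -> None:
    """Add each rule as a bullet under HEADER, once only."""
    existing = path.read_text() if path.exists() else ""
    out = [existing.rstrip()] if existing else []
    if HEADER not in existing:
        out.append(HEADER)
    out += [f"- {rule}" for rule in rules if rule not in existing]
    path.write_text("\n".join(out) + "\n")
```

Keeping the rules short and deduplicated matters: a guidance file that grows without bound is its own rabbit hole.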

Examples from this session: “Check the evidence before forming theories” and “State ownership first, then implement.”

6) Iterate the reflector itself

If the reflection itself drifts (too specific, too verbose, wrong abstraction level), refine the reflect prompt too (it’s iteration all the way down).

The reflect skill/prompt is part of your system, so it needs tuning like anything else.

Why this matters

The point isn’t to make every session perfect.

You’ll still have sessions where the model gets stuck, overcomplicates things, or needs nudging.

The point is to spot repeating patterns in how the LLM works in your system, and build guidance to stop it from making the same mistakes over and over again.

A short reflection pass turns frustrating sessions into reusable constraints.

In short: messy sessions are data. Reflect on them, extract the general lesson, and turn it into a short rule your LLM sees every time.

Here’s a version of a reflect skill I’ve been experimenting with:

Reflect Skill
