When AI goes down a rabbit hole, use it to your advantage
You know when you have one of those coding sessions with AI.
The LLM happily dives down a rabbit hole, digging itself in deeper with each attempted fix, until finally it concedes that you were right all along and yes, it had been trying to fix the wrong problem.
These sessions are frustrating, but they can also be gold when you use them to improve the scaffolding for your LLM.
The key is to have the LLM reflect on what went wrong, and turn that messy session into useful data.
A concrete example
The example here came from a real session building a small personal Mac app called Clip Capture (used for screen capture/recording and clip handling for videos).
The issue was simple on paper: no audio was being captured, so transcription was empty.
But AI very quickly veered onto the wrong path.
It decided the issue was audio detection and audio levels, and set about fixing that.
I, on the other hand, suspected that OBS (which Clip Capture interacts with) was selecting the wrong microphone, ignoring what I’d set in Clip Capture.
Spoiler, it turns out I was correct - the OBS source and the source selected in the app were out of sync.
This was a classic case of the AI trying to fix the problem, and looking in all the wrong places.
What Reflect does (and what it avoids)
After sessions like this, I’ve taken to running a “Reflect” skill for the messy session.
The skill asks the model to inspect the session and look for a few specific things:
- Where I corrected the AI
- Where the AI looped, spun, or repeated failed approaches
- Where I explicitly set preferences (“do it this way”, “always do this”)
- Where missing context likely caused bad decisions
The first phase is analysis, not fixing.
That separation matters. If you immediately ask the model to rewrite docs or patch files, it can skip the thinking and jump straight into “helpful mode”.
Reflect works better when it reports what happened first, then proposes guidance updates as a second step.
There’s also a documentation pass that checks existing guidance (for example CLAUDE.md, agent files, skills, and related docs), then suggests what to add so the same pattern is less likely next time.
First result: useful diagnosis, but too specific
On this session, Reflect quickly surfaced the right narrative:
- Significant effort went into Clip Capture audio processing
- The decisive issue was OBS mic routing
- Time was spent fixing the wrong layer before proving the source path
That’s already useful.
But it’s also very specific, so I nudged it to find the more general issue we could mitigate regardless of project.
The principle worth keeping: check the evidence first
From there, the guidance became more practical:
- Verify the actual signal path before fixing
- Identify source of truth at each step
- Confirm outputs with direct evidence
A short version was adopted:
Check the evidence.
Essentially, when a bug spans multiple modules or systems, don’t assume where it lives; prove it.
The Reflect skill then went on to create guidance for future sessions, with specific examples and guardrails for checking evidence (logs etc.) before forming theories about the root cause.
Second result: ownership boundaries
Reflect also flagged another issue that’s easy to miss during a long session: unclear system ownership boundaries.
In this case:
- What does Clip Capture own?
- What does OBS own?
- Where should fixes happen?
- What should never be rebuilt in the app because the external tool already handles it?
That boundary clarity helps with bug fixing, but it also prevents quiet scope creep.
If every bug becomes an excuse to absorb responsibility from neighbouring systems, complexity rises quickly. You end up building features you don’t need, in places you don’t need them.
The practical rule here is simple: for any feature or bug spanning multiple tools/services, state ownership first. Then implement.
A simple workflow you can run after any “off the rails” session
Putting aside the specific findings in this example, the Reflect skill is useful any time a session goes off the rails. This is enough to start:
1) Run a reflection pass immediately
Use the actual conversation while it’s fresh in your mind.
2) Ask for evidence
Get the LLM to look for corrections, loops, repeated failure patterns, and explicit user preferences.
3) Keep analysis separate from fixes
Get a diagnosis first. Then decide what should become persistent guidance.
4) Push from specific incident to general principle
If the output is too project-specific, ask: “What is the more general issue here, regardless of stack?”
5) Convert only the strongest insights into guidance
Short, testable rules win.
Examples from this session:
- Check the evidence first
- State ownership boundaries before implementing
- Close tracking loops when work is done
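To make rules like these persistent, they can land in a guidance file. For illustration only (the wording and headings here are mine, not from the session), entries in a CLAUDE.md-style file might look like:

```markdown
## Debugging across systems

- Check the evidence first: when a bug spans modules or tools, prove where it
  lives (logs, direct output, source-of-truth checks) before writing a fix.
- State ownership boundaries before implementing: name which tool owns the
  behaviour, and don't rebuild what an external tool already handles.
- Close tracking loops: when a fix lands, verify the original symptom is gone.
```

Short entries like these are easy for the model to apply, and easy for you to prune when they stop earning their place.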
6) Iterate the reflector itself
If the reflection itself drifts (too specific, too verbose, wrong abstraction level), refine the reflect prompt too (it’s iteration all the way down).
The reflect skill/prompt is part of your system, so it needs tuning like anything else.
Why this matters#
The point isn’t to make every session perfect.
You’ll still have sessions where the model gets stuck, overcomplicates things, or needs nudging.
The point is to spot repeating patterns in how the LLM works in your system, and build guidance to stop it from making the same mistakes over and over again.
A short reflection pass turns frustrating sessions into reusable constraints.
In short:
- AI coding sessions often fail in a repeatable way: debugging the wrong layer first, then looping.
- A two-phase Reflect flow works well: analyse session behaviour first, then update guidance.
- The most useful output is usually a general principle that emerges from the project-specific post-mortem.
- Treat your guidance system as iterative: tune the reflector, refine docs structure, and keep improving session by session.
Here’s a version of a reflect skill I’ve been experimenting with:
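A minimal sketch of the shape such a skill file can take; the exact wording will vary, and this just captures the two-phase flow described above rather than a definitive prompt:

```markdown
# Reflect

When asked to reflect on a session, work in two phases.

## Phase 1: analyse (do not fix anything yet)

Review the session and report:
- Where the user corrected you
- Where you looped, spun, or repeated failed approaches
- Where the user stated explicit preferences ("do it this way", "always do this")
- Where missing context likely caused bad decisions

## Phase 2: propose guidance updates

Only after reporting the analysis:
- Check existing guidance (CLAUDE.md, agent files, skills, related docs)
- Suggest short, testable rules that would have prevented the pattern
- Prefer general principles over project-specific fixes
```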