
How I debug AI-generated code: a systematic approach

AI-generated code fails in specific, predictable ways. Once you recognize the patterns, debugging becomes much faster.

AI-generated code has a characteristic failure mode that's different from the bugs you find in human-written code. Human bugs tend to be logic errors — an off-by-one, a wrong condition, a missed edge case. AI bugs tend to be contextual errors — code that's correct in isolation but wrong for the specific system it's being integrated into.

Understanding this difference changes how you debug.

The categories of AI bugs

After two years of using AI coding tools in production, I've found that AI-generated code failures fall into roughly four categories:

1. Dependency mismatch. The model generates code using an API or import that's slightly wrong for your version. It uses createClient from @supabase/supabase-js v2 when you're on v1. It uses a function that exists in the latest React types but not in the version pinned in your package.json. The code looks right but the types don't match.

2. Pattern violation. The model doesn't know your codebase conventions. It generates a React component using useEffect + useState for data fetching when your codebase uses RTK Query. It adds inline styles when your project uses Tailwind exclusively. The code works but doesn't fit (there's a sketch of this after the list).

3. Scope overreach. The model generates more than you asked for — extra error handling that contradicts your existing error boundary, additional state management that duplicates something that already exists, configuration options for features you don't use. This isn't a bug per se, but the extras introduce complexity that needs to be unwound.

4. Silent assumption. The model makes an assumption about your environment that it doesn't document — that process.env.API_URL is defined, that a particular table exists in the database, that the user is always authenticated. The code runs fine in the model's mental environment, and fails in yours.
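
To make category 2 concrete, here's a rough sketch of the RTK Query example above. The useGetOrdersQuery hook and the Order type are stand-ins for whatever your existing api slice actually exposes; the point is that the generated version runs fine but bypasses the convention.

```tsx
import { useEffect, useState } from "react";
// Hypothetical hook exported by the codebase's existing RTK Query api slice.
import { useGetOrdersQuery } from "./api";

type Order = { id: string };

// What the model tends to generate: manual fetching with useEffect + useState.
// It works, but it duplicates the caching and loading logic the api slice already owns.
export function OrdersBadgeGenerated() {
  const [orders, setOrders] = useState<Order[]>([]);
  useEffect(() => {
    fetch("/api/orders")
      .then((res) => res.json())
      .then(setOrders);
  }, []);
  return <span>{orders.length}</span>;
}

// What the codebase convention expects: the existing RTK Query hook.
export function OrdersBadge() {
  const { data: orders = [] } = useGetOrdersQuery();
  return <span>{orders.length}</span>;
}
```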
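
Category 4 is even easier to miss, because nothing looks wrong. A minimal sketch, with an illustrative env var and endpoint: the generated version only works if API_URL happens to be defined in your environment.

```ts
type Order = { id: string; total: number };

// Generated version: silently assumes process.env.API_URL is defined.
// If it isn't, the request goes to "undefined/orders" and fails at runtime.
export async function fetchOrders(): Promise<Order[]> {
  const res = await fetch(`${process.env.API_URL}/orders`);
  return res.json();
}

// Reviewed version: the assumption is surfaced and fails loudly at startup.
const API_URL = process.env.API_URL;
if (!API_URL) {
  throw new Error("API_URL is not set; check the environment config");
}

export async function fetchOrdersChecked(): Promise<Order[]> {
  const res = await fetch(`${API_URL}/orders`);
  if (!res.ok) throw new Error(`Orders request failed with ${res.status}`);
  return res.json();
}
```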

The debugging approach

Step 1: Read it before running it

This sounds obvious but it's easy to skip. Before copy-pasting generated code into your editor, read it as you'd read a PR. Ask:

  • What imports does this add? Are they compatible with your current versions?
  • What does it assume about the environment? What env vars, globals, or context does it expect?
  • What did it add that you didn't ask for?

The goal is to catch bugs in categories 2, 3, and 4 before they cause a runtime error.

Step 2: Run the TypeScript compiler first

TypeScript will catch category 1 bugs (dependency mismatch) faster than any other technique. Before running the app, run tsc --noEmit and read the errors. A type mismatch on an import often reveals which version the model assumed.
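
As an example, here's the Supabase case from category 1, written the way I remember the v1 and v2 auth APIs differing. With v1 pinned, tsc --noEmit rejects the v2-style call before you ever run the app, and the error tells you which version the model assumed.

```ts
import { createClient } from "@supabase/supabase-js";

const supabase = createClient(
  process.env.SUPABASE_URL!,
  process.env.SUPABASE_ANON_KEY!
);

// Model-generated login, written against the v2 auth API.
export async function login(email: string, password: string) {
  const { data, error } = await supabase.auth.signInWithPassword({
    email,
    password,
  });
  if (error) throw error;
  return data.user;
}

// With v1 pinned in package.json, tsc --noEmit rejects the call above,
// because the v1 auth client has no signInWithPassword. The v1 equivalent
// is supabase.auth.signIn({ email, password }), which returns { user, error }.
```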

Step 3: Isolate before integrating

If the generated code is a function or a hook, test it in isolation before integrating it. A simple unit test with the expected inputs confirms the core logic much faster than exercising it through the surrounding application.

For UI components, render them in Storybook or a test renderer before dropping them into the page.
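
Here's a sketch of what that looks like for the function case, using Vitest and a hypothetical generated helper called groupByStatus. Swap in Jest if that's what your project runs; the assertions are the same.

```ts
import { describe, expect, it } from "vitest";

// Hypothetical generated helper under test.
type Order = { id: string; status: "open" | "shipped" };

function groupByStatus(orders: Order[]): Record<string, Order[]> {
  return orders.reduce<Record<string, Order[]>>((acc, order) => {
    (acc[order.status] ??= []).push(order);
    return acc;
  }, {});
}

describe("groupByStatus", () => {
  it("groups orders by their status", () => {
    const result = groupByStatus([
      { id: "a", status: "open" },
      { id: "b", status: "shipped" },
      { id: "c", status: "open" },
    ]);
    expect(result.open).toHaveLength(2);
    expect(result.shipped).toHaveLength(1);
  });

  it("returns an empty object for no orders", () => {
    expect(groupByStatus([])).toEqual({});
  });
});
```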

Step 4: When something is silently wrong

This is the hardest case — code that runs but doesn't produce the right output. The debugging approach here is the same as for any bug: binary search the problem space. Comment out half the generated code and see if the problem remains. Narrow until you've isolated the incorrect block.

AI bugs in this category are often in the data transformation layer — the model assumed the input had a different shape than it actually has. Add a console.log at the point of transformation and compare the actual input to the one the model assumed.
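
A quick sketch of that comparison. The response shapes here are made up; the technique is logging the real payload at the transformation boundary and checking it against the shape the generated code indexes into.

```ts
type Row = { id: string; label: string };

// Generated transformation: the model assumed the API returns { items: [...] }.
// In reality (per the log below), the rows live under data.results.
export function toRows(response: unknown): Row[] {
  // Log the actual shape at the boundary and compare it to the code's assumption.
  console.log("actual response:", JSON.stringify(response, null, 2));

  const body = response as {
    items?: { id: string; name: string }[]; // model's assumption
    data?: { results?: { id: string; name: string }[] }; // actual shape
  };

  const rows = body.data?.results ?? body.items ?? [];
  return rows.map((r) => ({ id: r.id, label: r.name }));
}
```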

Step 5: When to regenerate vs. when to fix manually

Regenerating is faster when the bug is in a foundational assumption (category 4) that would require significant rework. Give the model the missing context explicitly: "The user is not always authenticated. Handle the unauthenticated case by returning null and showing a login prompt."

Fixing manually is faster when the bug is small and localized — a wrong variable name, a missing null check, an incorrect condition. Don't regenerate a 100-line function for a 2-line fix.

The meta-skill

The underlying skill is knowing what "correct for your system" looks like, not just "correct in general." This is why code review remains valuable even for AI-generated code. The model doesn't know your system. You do.

Debugging AI-generated code is mostly a matter of rapidly answering: is this correct for my system, or just plausible in general? The bugs are rarely subtle logic errors. They're almost always mismatches between the model's assumed context and your actual context.

Once you see this pattern, you start providing that context upfront in the prompt — which is why spec-first prompting leads to fewer debugging sessions in the first place.