A Code Review Checklist for AI-Generated Code
AI writes plausible code that passes your tests and fails your principal engineer's review. This is the practical checklist for catching what the model misses before it ships.
AI writes plausible code. Plausible isn't the same as correct, and it's especially not the same as maintainable in your codebase. The failure modes are specific and repeatable. This checklist covers the things you should check on any AI-generated diff before you let it merge.
Correctness
- Does it handle the empty / null / zero case?
- Does it handle the case where the array has one element vs many?
- Are off-by-one errors hiding in the loop bounds?
- Are edge cases tested, not just the happy path?
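To make that concrete, here's a minimal sketch of the kind of edge-case bug those questions catch. The function and its names (movingAverage, window) are invented for illustration, not taken from any particular codebase.

```ts
// Plausible AI output: correct for the happy path, wrong at the edges.
function movingAverageBuggy(values: number[], window: number): number[] {
  const out: number[] = [];
  // Off-by-one: `<` should be `<=`, so the final window is silently dropped,
  // and a single-element array with window 1 returns [] instead of [values[0]].
  for (let i = 0; i + window < values.length; i++) {
    const slice = values.slice(i, i + window);
    out.push(slice.reduce((a, b) => a + b, 0) / window);
  }
  return out;
}

// After review: explicit empty / too-short handling and correct loop bounds.
function movingAverage(values: number[], window: number): number[] {
  if (window <= 0 || values.length < window) return [];
  const out: number[] = [];
  for (let i = 0; i + window <= values.length; i++) {
    const slice = values.slice(i, i + window);
    out.push(slice.reduce((a, b) => a + b, 0) / window);
  }
  return out;
}
```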
Security
AI is decent at OWASP basics and worse at subtle, codebase-specific threats. Always re-check:
- Authentication and authorization. Did it copy a permission check from the right place?
- Input validation. Is everything from the network parsed against a schema?
- SQL safety. Parameterized queries, not string concatenation.
- Secrets. Did it accidentally log a token or write one to a file?
- SSRF/XSS surfaces. Any new endpoint that takes a user-supplied URL or HTML?
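For the SQL item, the check looks like this in practice. The sketch assumes a node-postgres-style client where db.query(text, params) takes numbered placeholders; the table and column names are made up.

```ts
type Db = { query: (text: string, params?: unknown[]) => Promise<unknown> };

// Reject in review: user input concatenated into the query string.
async function findUserUnsafe(db: Db, email: string) {
  return db.query(`SELECT id, email FROM users WHERE email = '${email}'`);
}

// Ask for instead: a parameterized query, with the value passed separately.
async function findUser(db: Db, email: string) {
  return db.query("SELECT id, email FROM users WHERE email = $1", [email]);
}
```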
Codebase fit
- Does it use the right utility from the right module, or did it reinvent one?
- Does it match naming conventions in adjacent files?
- Is the error handling consistent with the rest of the project?
- Did it import from internal packages instead of duplicating code?
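A tiny illustration of the first question; the shared module path and helper name are hypothetical stand-ins for whatever your project already provides.

```ts
// AI-generated diff: a fresh, slightly different re-implementation.
function formatMoney(cents: number): string {
  return "$" + (cents / 100).toFixed(2); // ignores locale, negatives, currency
}

// What review should ask for: the helper the rest of the codebase already uses.
// import { formatCurrency } from "@acme/shared/format"; // hypothetical path
```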
Maintainability
- Are there comments explaining the obvious that should be deleted?
- Is a comment missing where a non-obvious decision needs one?
- Are abstractions sized for the one caller that exists, not a hypothetical second one?
- Is the public API as small as it can be?
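Both comment smells, side by side in one invented snippet; the retry constants and the reasoning behind them are hypothetical.

```ts
// Delete comments like this: they restate what the code already says.
// Set the retry count to three.
const MAX_RETRIES = 3;

// Keep (or request) comments like this: they record a decision the code
// cannot express on its own.
// Three retries with a 250 ms base keeps the total wait under the gateway's
// 1 s upstream timeout; a fourth attempt would land after the caller gave up.
const BASE_DELAY_MS = 250;
```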
Performance
- Any O(n²) loops on collections that could grow?
- Database queries inside loops? (A classic AI mistake; see the sketch after this list.)
- Blocking I/O on the request path that could be async?
- Caching where appropriate — but not where it adds complexity?
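Here's the query-in-a-loop item as a sketch, again assuming a node-postgres-style db.query(text, params); the table and columns are hypothetical.

```ts
type Db = { query: (text: string, params?: unknown[]) => Promise<unknown> };

// Flag in review: one round trip per order id (the N+1 pattern).
async function loadCustomersSlow(db: Db, orderIds: number[]) {
  const rows: unknown[] = [];
  for (const id of orderIds) {
    rows.push(await db.query("SELECT * FROM customers WHERE order_id = $1", [id]));
  }
  return rows;
}

// Ask for instead: a single query over the whole set.
async function loadCustomers(db: Db, orderIds: number[]) {
  return db.query("SELECT * FROM customers WHERE order_id = ANY($1)", [orderIds]);
}
```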
Tests
AI-generated tests have a specific failure mode: they confirm the implementation it just wrote, not the spec the implementation should satisfy. Read every test and ask, "would this still pass if the code were broken in a realistic way?"
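A minimal sketch of the difference, in Vitest-style syntax; applyDiscount and its spec are invented for illustration.

```ts
import { test, expect } from "vitest";

// Hypothetical function under test: throws on a rate outside [0, 1].
function applyDiscount(price: number, rate: number): number {
  if (rate < 0 || rate > 1) throw new RangeError("rate must be in [0, 1]");
  return price * (1 - rate);
}

// Implementation-shaped test: it replays whatever the code happens to do.
// If applyDiscount silently clamped a bad rate instead of rejecting it,
// this would still pass.
test("applyDiscount returns a number", () => {
  expect(typeof applyDiscount(100, 0.2)).toBe("number");
});

// Spec-shaped tests: they fail if the code is broken in a realistic way.
test("applies a 20% discount", () => {
  expect(applyDiscount(100, 0.2)).toBeCloseTo(80);
});

test("rejects a rate outside [0, 1]", () => {
  expect(() => applyDiscount(100, 1.5)).toThrow();
});
```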
Specific failure modes
- Confidently wrong APIs. The library has been renamed, the method moved, or the signature changed. Always check imports.
- Hallucinated dependencies. A package that doesn't exist, or one that does but doesn't do what the model thinks.
- Quietly removed code. The model rewrites a function and drops a critical branch.
- Wrong type assumptions. The model assumes a value is a string when your codebase has it as `string | null`.
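That last bullet, as code. User and displayName are hypothetical names; the point is that the bad assumption only shows up if you check the declared type, not the happy-path call site.

```ts
interface User {
  displayName: string | null; // the type your codebase actually declares
}

// AI-generated: assumes displayName is always a string. The non-null
// assertion (!) silences the type checker but crashes at runtime on null.
function greetingBuggy(user: User): string {
  return "Hi, " + user.displayName!.toUpperCase();
}

// Matches the real type: handles null explicitly.
function greeting(user: User): string {
  return user.displayName === null ? "Hi there" : `Hi, ${user.displayName.toUpperCase()}`;
}
```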
Reading habit
Tab-accept and merge is not review. Read every diff line by line, run the tests yourself, and ask "why this and not the alternative?" at least once per change. That's the difference between "AI made me faster" and "AI gave me twice the bugs in half the time."