A Code Review Checklist for AI-Generated Code
AI writes plausible code that passes your tests and fails your principal engineer's review. This is the practical checklist for catching what the model misses before it ships.
AI writes plausible code. Plausible isn't the same as correct, and it's especially not the same as maintainable in your codebase. The failure modes are specific and repeatable. This checklist covers the things you should check on any AI-generated diff before you let it merge.
Correctness
- Does it handle the empty / null / zero case?
- Does it handle the case where the array has one element vs many?
- Are off-by-one errors hiding in the loop bounds?
- Are edge cases tested, not just the happy path?
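To make that concrete, here's a minimal sketch of the kind of edge-case bug those questions catch. The function and its names (movingAverage, window) are invented for illustration, not taken from any particular codebase.

```ts
// Plausible AI output: correct for the happy path, wrong at the edges.
function movingAverageBuggy(values: number[], window: number): number[] {
  const out: number[] = [];
  // Off-by-one: `<` should be `<=`, so the final window is silently dropped,
  // and a single-element array with window 1 returns [] instead of [values[0]].
  for (let i = 0; i + window < values.length; i++) {
    const slice = values.slice(i, i + window);
    out.push(slice.reduce((a, b) => a + b, 0) / window);
  }
  return out;
}

// After review: explicit empty / too-short handling and correct loop bounds.
function movingAverage(values: number[], window: number): number[] {
  if (window <= 0 || values.length < window) return [];
  const out: number[] = [];
  for (let i = 0; i + window <= values.length; i++) {
    const slice = values.slice(i, i + window);
    out.push(slice.reduce((a, b) => a + b, 0) / window);
  }
  return out;
}
```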
Security
AI is decent at OWASP basics and worse at subtle, codebase-specific threats. Always re-check:
- Authentication and authorization. Did it copy a permission check from the right place?
- Input validation. Is everything from the network parsed against a schema?
- SQL safety. Parameterized queries, not string concatenation.
- Secrets. Did it accidentally log a token or write one to a file?
- SSRF/XSS surfaces. Any new endpoint that takes a user-supplied URL or HTML?
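For the SQL item, the check looks like this in practice. The sketch assumes a node-postgres-style client where db.query(text, params) takes numbered placeholders; the table and column names are made up.

```ts
type Db = { query: (text: string, params?: unknown[]) => Promise<unknown> };

// Reject in review: user input concatenated into the query string.
async function findUserUnsafe(db: Db, email: string) {
  return db.query(`SELECT id, email FROM users WHERE email = '${email}'`);
}

// Ask for instead: a parameterized query, with the value passed separately.
async function findUser(db: Db, email: string) {
  return db.query("SELECT id, email FROM users WHERE email = $1", [email]);
}
```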
Codebase fit
- Does it use the right utility from the right module, or did it reinvent one?
- Does it match naming conventions in adjacent files?
- Is the error handling consistent with the rest of the project?
- Did it import from internal packages instead of duplicating code?
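A tiny illustration of the first question; the shared module path and helper name are hypothetical stand-ins for whatever your project already provides.

```ts
// AI-generated diff: a fresh, slightly different re-implementation.
function formatMoney(cents: number): string {
  return "$" + (cents / 100).toFixed(2); // ignores locale, negatives, currency
}

// What review should ask for: the helper the rest of the codebase already uses.
// import { formatCurrency } from "@acme/shared/format"; // hypothetical path
```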
Maintainability
- Are there comments explaining the obvious that should be deleted?
- Is a comment missing where a non-obvious decision needs one?
- Are abstractions sized for the one caller that exists, not a hypothetical second one?
- Is the public API as small as it can be?
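Both comment smells, side by side in one invented snippet; the retry constants and the reasoning behind them are hypothetical.

```ts
// Delete comments like this: they restate what the code already says.
// Set the retry count to three.
const MAX_RETRIES = 3;

// Keep (or request) comments like this: they record a decision the code
// cannot express on its own.
// Three retries with a 250 ms base keeps the total wait under the gateway's
// 1 s upstream timeout; a fourth attempt would land after the caller gave up.
const BASE_DELAY_MS = 250;
```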
Performance
- Any O(n²) loops on collections that could grow?
- Database queries inside loops? (A classic AI mistake; see the sketch after this list.)
- Blocking I/O on the request path that could be async?
- Caching where appropriate — but not where it adds complexity?
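Here's the query-in-a-loop item as a sketch, again assuming a node-postgres-style db.query(text, params); the table and columns are hypothetical.

```ts
type Db = { query: (text: string, params?: unknown[]) => Promise<unknown> };

// Flag in review: one round trip per order id (the N+1 pattern).
async function loadCustomersSlow(db: Db, orderIds: number[]) {
  const rows: unknown[] = [];
  for (const id of orderIds) {
    rows.push(await db.query("SELECT * FROM customers WHERE order_id = $1", [id]));
  }
  return rows;
}

// Ask for instead: a single query over the whole set.
async function loadCustomers(db: Db, orderIds: number[]) {
  return db.query("SELECT * FROM customers WHERE order_id = ANY($1)", [orderIds]);
}
```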
Tests
AI-generated tests have a specific failure mode: they confirm the implementation it just wrote, not the spec the implementation should satisfy. Read every test and ask, "would this still pass if the code were broken in a realistic way?"
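A minimal sketch of the difference, in Vitest-style syntax; applyDiscount and its spec are invented for illustration.

```ts
import { test, expect } from "vitest";

// Hypothetical function under test: throws on a rate outside [0, 1].
function applyDiscount(price: number, rate: number): number {
  if (rate < 0 || rate > 1) throw new RangeError("rate must be in [0, 1]");
  return price * (1 - rate);
}

// Implementation-shaped test: it replays whatever the code happens to do.
// If applyDiscount silently clamped a bad rate instead of rejecting it,
// this would still pass.
test("applyDiscount returns a number", () => {
  expect(typeof applyDiscount(100, 0.2)).toBe("number");
});

// Spec-shaped tests: they fail if the code is broken in a realistic way.
test("applies a 20% discount", () => {
  expect(applyDiscount(100, 0.2)).toBeCloseTo(80);
});

test("rejects a rate outside [0, 1]", () => {
  expect(() => applyDiscount(100, 1.5)).toThrow();
});
```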
Specific failure modes
- Confidently wrong APIs. The library has been renamed, the method moved, or the signature changed. Always check imports.
- Hallucinated dependencies. A package that doesn't exist, or one that does but doesn't do what the model thinks.
- Quietly removed code. The model rewrites a function and drops a critical branch.
- Wrong type assumptions. The model assumes a value is a string when your codebase has it as `string | null`.
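That last bullet, as code. User and displayName are hypothetical names; the point is that the bad assumption only shows up if you check the declared type, not the happy-path call site.

```ts
interface User {
  displayName: string | null; // the type your codebase actually declares
}

// AI-generated: assumes displayName is always a string. The non-null
// assertion (!) silences the type checker but crashes at runtime on null.
function greetingBuggy(user: User): string {
  return "Hi, " + user.displayName!.toUpperCase();
}

// Matches the real type: handles null explicitly.
function greeting(user: User): string {
  return user.displayName === null ? "Hi there" : `Hi, ${user.displayName.toUpperCase()}`;
}
```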
Reading habit
Tab-accept and merge is not review. Read every diff line by line, run the tests yourself, and ask "why this and not the alternative?" at least once per change. That's the difference between "AI made me faster" and "AI gave me twice the bugs in half the time."