Code review in the age of AI: why it matters more, not less

There's a quiet shift in how a lot of teams write software now. The blank-page part (the first draft of a function, a test, a migration) increasingly comes from an assistant. That's genuinely useful. But it changes where the hard part lives. When generating code gets cheap, the scarce, valuable work moves downstream: deciding what's worth merging, and making sure it's correct, clear, and consistent with everything around it.

In other words, review stops being a checkbox at the end of the pipeline and becomes the place where most of the engineering judgement actually gets applied. Here's why we think that's true, and what good practice looks like when the first draft isn't written by a human.

The 60-second version

AI shifts the bottleneck from writing code to deciding what's good. Review is where that decision happens.
Generated code is confident and plausible, which makes the failure modes subtler, not rarer.
The author still owns the diff. "The model wrote it" is not an answer in review.
Small PRs, clear intent, and fast feedback loops matter more when volume goes up.
Review is also how knowledge spreads through a team. Don't automate that away.

1. The bottleneck moved, and most teams haven't noticed

For most of software's history, the slow, expensive step was producing code. Review was a tax on top: valuable, but secondary. Cheap generation inverts that. If a feature's worth of code lands in an afternoon, the question is no longer "can we build it?" but "should we, and is this version of it right?" Those are review questions.

The risk is treating review the way we always did (a quick skim before approving) while the volume and the stakes have both gone up. A reviewer who rubber-stamped a teammate's careful 80-line PR could mostly get away with it. A reviewer rubber-stamping a 400-line generated diff is approving code that no human has actually read yet. That's a new failure mode, and it's worth naming out loud on your team.

2. Generated code fails differently

Human-written code tends to fail in human ways: a typo, a forgotten edge case, an off-by-one. Generated code fails in its own way, and the patterns are worth knowing because they're easy to miss:

Confident and plausible, but wrong. The code looks idiomatic, names things well, and is subtly incorrect. Polish is no longer a signal of correctness.
Locally fine, globally off. It solves the prompt but ignores the convention three files over, reimplements a helper you already have, or adds an abstraction the codebase doesn't need.
Defensive sludge. Nil checks for things that can't be nil, error handling for cases that can't happen, validation at the wrong boundary. It compiles and the tests pass, but it quietly makes the code harder to read.
Plausible-looking tests that don't test much. A test that asserts the mock was called, not that the behavior is correct. Green checkmark, no real coverage.
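To make "defensive sludge" concrete, here is a small sketch in Python. The LineItem type and both functions are hypothetical, invented for illustration, and Python's None stands in for nil. The two functions return the same result for every input the types allow; only one of them is easy to review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineItem:
    # Hypothetical type, used only for this illustration.
    price_cents: int
    quantity: int


def order_total_sludge(items: list[LineItem]) -> int:
    """The over-guarded shape a generated first draft often takes."""
    if items is None:  # the signature says this is always a list
        return 0
    total = 0
    for item in items:
        if item is None:  # the list is typed to hold LineItem only
            continue
        try:
            total += item.price_cents * item.quantity
        except Exception:  # int * int has nothing to raise here
            continue
    return total


def order_total(items: list[LineItem]) -> int:
    """Same result for every valid input, readable at a glance."""
    return sum(item.price_cents * item.quantity for item in items)
```

The guards in the first version are not free. Each one tells the next reader that None or a bad multiplication is possible somewhere upstream, and the broad except would silently hide a real bug if one ever appeared.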

None of this means "don't use the tools." It means the reviewer's job is now partly to catch the specific things models are bad at: cross-file consistency, whether the change should exist at all, and whether the tests would actually fail if the code broke.
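One way to ask "would this test fail if the code broke?" is to run it against a deliberately broken version. The sketch below does that in Python with the standard library's unittest.mock. Every name in it (apply_discount, the audit log, the helper functions) is hypothetical and exists only to show the pattern.

```python
from unittest.mock import Mock


def apply_discount(price_cents, percent, audit_log):
    """Return the discounted price and record the change."""
    discounted = price_cents - (price_cents * percent) // 100
    audit_log.record(price_cents, discounted)
    return discounted


def apply_discount_broken(price_cents, percent, audit_log):
    """Deliberately wrong: the discount is never applied."""
    audit_log.record(price_cents, price_cents)
    return price_cents


def weak_test(discount_fn):
    # Only checks that a collaborator was called.
    audit_log = Mock()
    discount_fn(1000, 20, audit_log)
    audit_log.record.assert_called_once()


def behaviour_test(discount_fn):
    # Pins the observable result of the call.
    audit_log = Mock()
    assert discount_fn(1000, 20, audit_log) == 800
    audit_log.record.assert_called_once_with(1000, 800)


def catches_breakage(test, broken_fn):
    """Return True if the test fails when run against broken code."""
    try:
        test(broken_fn)
    except AssertionError:
        return True
    return False
```

Both tests pass against apply_discount. Against apply_discount_broken, catches_breakage(weak_test, ...) returns False and catches_breakage(behaviour_test, ...) returns True: the mock-only test stays green while the price is wrong. In review, the same question can be asked by reading: which line of the implementation could change without this test noticing?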

3. The author still owns the diff

The single most important practice, and the easiest one to let slip: whoever opens the PR is accountable for every line in it, regardless of who or what typed it. "The assistant generated that" is not a defense in review, the same way "I copied it from Stack Overflow" never was. If you can't explain why a line is there, it shouldn't be in your PR yet.

Practically, that means reading your own generated diff before you ask anyone else to. Self-review first. It's faster than it sounds, it catches the obvious stuff before it wastes a reviewer's attention, and it keeps you the author rather than a courier passing the model's output to your team.

Where we sit in this: we build PRFlow, which puts merge requests and their review threads into Slack, with CI/CD status and comments synced in line. We care about review latency because a PR that sits for a day is a PR nobody fully remembers by the time it gets looked at. Faster, more visible review is the whole point, not more notifications.

4. Good engineering practice is the same; it just matters more

The advice here isn't new. It's the boring stuff that good teams already did, now with higher stakes because the volume went up:

Keep PRs small. A reviewer's attention is finite and roughly fixed. If generation lets you produce four times the code, that's four times the diff competing for the same eyeballs. Smaller PRs are the only thing that scales.
Write the intent down. A good PR description (what changed, why, what you considered and rejected) is worth more than ever, because the reviewer can no longer assume the author reasoned through every line. State the reasoning explicitly.
Prefer deletion to addition. Models are biased toward adding code: more checks, more helpers, more abstraction. The best review comment is still often "do we need this at all?"
Let machines catch machine problems. Linters, type checkers, formatters, and tests should catch the mechanical stuff so human review can spend its budget on design, intent, and the things a tool can't judge.
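"Keep PRs small" is itself something a machine can check. As a minimal sketch, the Python below sums changed lines from the text that `git diff --numstat` prints (added count, deleted count, and path, separated by tabs, with "-" in place of the counts for binary files). The 400-line limit and the function names are assumptions for illustration, not a recommendation.

```python
MAX_CHANGED_LINES = 400  # hypothetical budget; pick what fits your team


def changed_lines(numstat: str) -> int:
    """Sum added and deleted lines from `git diff --numstat` output."""
    total = 0
    for line in numstat.splitlines():
        if not line.strip():
            continue
        added, deleted, _path = line.split("\t", 2)
        if added == "-" or deleted == "-":  # binary files report "-"
            continue
        total += int(added) + int(deleted)
    return total


def size_verdict(numstat: str, limit: int = MAX_CHANGED_LINES) -> str:
    """Describe whether a diff fits within the line budget."""
    changed = changed_lines(numstat)
    if changed > limit:
        return f"too large: {changed} changed lines (limit {limit})"
    return f"ok: {changed} changed lines"


# Sample input in numstat format; the paths are made up.
sample = (
    "120\t30\tapp/billing.py\n"
    "-\t-\tdocs/diagram.png\n"
    "300\t10\tapp/invoices.py\n"
)
print(size_verdict(sample))  # too large: 460 changed lines (limit 400)
```

In CI, the input could come from a command such as `git diff --numstat main...HEAD`, assuming the target branch is called main. A check like this is best treated as a nudge to split the change rather than a hard gate, since some large diffs (renames, generated files) are legitimately big.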

5. Review is how a team thinks, not just how it ships

Here's the part that's easy to forget when the conversation is all about velocity. Code review was never only a quality gate. It's how a junior engineer learns the codebase, how a convention spreads without a meeting, how two people end up with the same mental model of a system. It's one of the few places where real, specific, in-context teaching happens at work.

If you let AI write the code and then point an AI reviewer at it and merge on green, you might ship faster for a while. But you've quietly removed the step where humans build shared understanding of their own system. That debt doesn't show up in the diff. It shows up six months later when nobody on the team can confidently explain how a core path works, because no human ever really read it.

Automated review tools are genuinely useful, so use them. They're great at the first pass: style, obvious bugs, missing tests, security smells. Let them clear the noise so the human review can be about the things that need a human. The mistake is treating the automated pass as the whole review rather than the warm-up to it.

6. What "good developer experience" means now

Developer experience used to be mostly about how fast you could go from idea to working code. That part is largely solved, or at least dramatically cheaper. The frontier of DX has moved to the review loop: how fast a PR gets a real look, how clearly feedback comes back, how little friction there is between "I have a thought" and "the author sees it."

A team where review happens in hours, in context, with clear feedback, will outpace a team drowning in a backlog of generated PRs that nobody has the attention to read carefully. The bottleneck is human attention now. Good DX is whatever respects it: small changes, clear intent, fast and visible feedback, and tooling that surfaces the review when it's fresh instead of letting it rot in a tab.

Bottom line

AI didn't make code review less important. It moved the hard part of engineering into review and then handed everyone a firehose. The teams that do well from here are the ones that take review seriously as the place where judgement lives: small PRs, honest descriptions, authors who own their diffs, machines for the mechanical pass, and humans for everything that actually requires a human. The writing got cheap. The thinking didn't.