Reading code in the era of generated code

I reject most of the LLM diffs that land in my review queue. The reasons barely change from one diff to the next: the code is "too functional," or it does not scale.

The diff compiles, the tests pass, the endpoints respond. The function works today, and it'll keep working tomorrow. It won't survive the third feature that needs the same logic, because nothing was abstracted.

The advice on reading generated code shows up as checklists now. Requirement fidelity, edge cases, API integrity, security, secrets. Every new tool ships a longer one. The assumption: if you check enough items, you've read the diff. The diffs I reject pass the checklists.

Peter Naur argued in "Programming as Theory Building" (1985) that a program is, above all, a theory held in the heads of the people who wrote it. Generated code has no such theory behind it. Reading it means building the theory yourself, from the diff. You used to read a diff to figure out what your colleague meant. Now you read it to ask whether the diff means anything at all.

I've worked across a lot of stacks in my career: C#, then vanilla JS, then Angular, Android in Java and later Kotlin, React, Python on Flask, Node, Java on the backend, Next.js. Each new one took less time to pick up than the one before. Not because I was getting smarter, but because the only thing new each time was the syntax. A for-loop is a for-loop, layering is layering, contracts are contracts. The principles transfer; the syntax does not.

An LLM is just another author whose code usually looks right at first scroll. What matters is whether the principles hold underneath. The four moves below are what I run in my head in the seconds before reading a diff line by line.

Read for contracts

Scan the diff for the structures that lock the rules down:

Interfaces.
Enums paired with a sibling config map: one record per enum value, so a single lookup replaces an if-else chain.
Schemas declared once and used everywhere they're needed.

If a contract exists, the system has one source of truth. If it doesn't, every place that uses the logic has to know the same rules on its own. The first one to drift breaks the rest.
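A minimal sketch of the enum-plus-config-map contract, in Python. The names here (Plan, PlanConfig, PLAN_CONFIG, can_create_project) and the numbers are hypothetical, made up for illustration; the point is only the shape: one record per enum value, and one lookup where an if-else chain would otherwise be copied into every caller.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Plan(Enum):
    # Hypothetical subscription tiers, for illustration only.
    FREE = "free"
    PRO = "pro"
    TEAM = "team"


@dataclass(frozen=True)
class PlanConfig:
    monthly_price_cents: int
    max_projects: int


# One record per enum value: the single source of truth for plan rules.
PLAN_CONFIG: dict[Plan, PlanConfig] = {
    Plan.FREE: PlanConfig(monthly_price_cents=0, max_projects=1),
    Plan.PRO: PlanConfig(monthly_price_cents=1200, max_projects=10),
    Plan.TEAM: PlanConfig(monthly_price_cents=4900, max_projects=100),
}

# Fail at import time if an enum value ships without a config record.
_missing = set(Plan) - set(PLAN_CONFIG)
assert not _missing, f"missing config for: {_missing}"


def can_create_project(plan: Plan, current_projects: int) -> bool:
    # A lookup replaces the if/elif chain a "too functional" diff would inline.
    return current_projects < PLAN_CONFIG[plan].max_projects
```

Adding a fourth plan then means adding one enum value and one record. A caller that inlined `if plan == "free": ...` would have to be found and edited by hand.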

The smell is "too functional." The code solves the task right in front of it with inline conditionals, repeated branches, copy-pasted regex, when one shared abstraction would have replaced all of it. LLMs default to this.

DRY and KISS both depend on this. DRY says don't repeat yourself; the contract is why you don't have to. KISS says keep it simple; the contract is what keeps it simple as the system grows.

This move over-fires. Not every piece of code needs a contract. A helper the codebase will only call once, that no one will need to extend later, is allowed to stay a helper. The question to ask: does anyone else depend on this logic staying the same? If yes, contract. If no, inline.

Read for data flow through layers

Now look at how the diff flows through your system's layers:

Frontend: user interaction, then components, then network.
Backend: controller or blueprint, then service layer (business logic), then data abstraction such as an ORM or query layer, then the connector to the database or cache.

Each layer has a job. Each transition between layers is where one layer's contract meets the next.

The smell is layer violation: business logic that has leaked into a controller, a SQL query inside a service, a fetch call inside a UI component that should be receiving its data, not asking for it. This is what "does not scale" looks like in a diff. The code runs and the endpoints respond. The first time someone needs to change the data source, or test the business logic in isolation, or move the UI behind a different framework, the cost of skipping layering shows up.
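For contrast, here is roughly what respected layers can look like on the backend, sketched in Python. Everything in it is an assumption for illustration: the Order type, the OrderRepository protocol, the in-memory repository standing in for a real database connector, and the 8 percent tax rate. The controller is a plain function rather than a real framework route, so the sketch runs on its own.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Order:
    order_id: str
    subtotal_cents: int


class OrderRepository(Protocol):
    """Data layer contract: the service never sees SQL or a connection."""

    def get(self, order_id: str) -> Order: ...


class InMemoryOrderRepository:
    """Stand-in for a real ORM or query layer (hypothetical)."""

    def __init__(self, orders: dict[str, Order]) -> None:
        self._orders = orders

    def get(self, order_id: str) -> Order:
        return self._orders[order_id]


class OrderService:
    """Service layer: business rules only, no HTTP and no queries."""

    TAX_RATE_PERCENT = 8  # hypothetical rate, for illustration

    def __init__(self, repo: OrderRepository) -> None:
        self._repo = repo

    def total_cents(self, order_id: str) -> int:
        order = self._repo.get(order_id)
        tax = order.subtotal_cents * self.TAX_RATE_PERCENT // 100
        return order.subtotal_cents + tax


def get_order_total(service: OrderService, order_id: str) -> dict[str, object]:
    """Controller: turn a request into a service call and shape the response."""
    return {"order_id": order_id, "total_cents": service.total_cents(order_id)}
```

Because the service depends only on the repository contract, the tax rule can be tested with the in-memory repository, and swapping the data source touches one class. A diff that computes the tax inside the controller, or runs the query inside the service, gives up both.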

Contracts and data flow are related but not the same. Contracts are the joints between layers; data flow is the direction the work moves across them. Move 1 asks whether the joint exists. Move 2 asks whether the joint gets respected.

This move over-fires too. A small script that fetches a config and prints a number doesn't need three layers. The move applies to systems that already have, or will soon need, layers. Apply it to a 40-line helper and you get architecture astronauts on a problem that didn't ask for them.

Read for folder structure

Look at where the change lives in the tree. Does this file sit in a sensible module or feature? Does the diff scatter edits across folders that have no business changing together? Did the LLM create a new file next to the call site instead of putting it where the codebase usually puts that kind of code? Did common code get duplicated into a feature folder when it should have moved into a shared module?

Folder structure is where Moves 1 and 2 show up visibly. Most placement errors are actually contract or layering errors, just showing in the file tree. When generated code drops a new utility next to its first caller, that's the LLM defaulting to the shortest path. The real problem is the principle being violated. The placement is the easiest tell.
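If you want a rough number to go with the eyeball check, the scatter question can be approximated by grouping the changed paths by module. This is a hypothetical helper, not part of any review tool; it assumes POSIX-style paths and that a "module" is the first two path segments, which will not match every repository layout.

```python
from __future__ import annotations

from collections import Counter
from pathlib import PurePosixPath


def modules_touched(changed_paths: list[str], depth: int = 2) -> dict[str, int]:
    """Count changed files per module, where a module is the first `depth` path parts."""
    counts: Counter = Counter()
    for path in changed_paths:
        parts = PurePosixPath(path).parts
        if len(parts) > depth:
            module = "/".join(parts[:depth])
        else:
            # Shallow files belong to their parent directory, or "." at the root.
            module = "/".join(parts[:-1]) or "."
        counts[module] += 1
    return dict(counts)


# Hypothetical diff: a billing change that also drops files into two other modules.
changed = [
    "src/billing/service.py",
    "src/billing/models.py",
    "src/auth/utils.py",
    "src/reports/format_date.py",
]
print(modules_touched(changed))
```

The count is a prompt, not a verdict. Three modules in one diff is a reason to ask why they changed together; the answer still comes from reading.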

This move applies hardest in 0-to-1 projects that are growing. I've run enough of them to know that structure starts to matter earlier than most teams expect. The decisions worth making early are the ones that force you to think about modules, features, what is common, what is extensible.

This move over-fires on a project that's too young to know its shape. If the codebase is two weeks old, the right move isn't folder hygiene; it's keeping the surface area small until the structure becomes obvious. Rearranging folders before then is procrastination dressed as taste.

Read for what shouldn't exist

The first three moves check whether the right thing is there. This one checks whether something is there that shouldn't be. Generated code is biased toward more code: more files, more wrappers. What I flag in every read:

A try-catch around a synchronous helper that can't throw.
A defensive null check on a value the type system guarantees.
A new abstraction over a one-line operation the codebase already calls inline.
An interface for a class with one implementation that will never have another.

I find more code than the problem needs in nearly every LLM diff I read. Your job is to ask the negative question: does this need to exist? This helper has one caller and should be inlined; this enum has two values and will never grow; this defensive check guards a case that can't happen. Prune.
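A before-and-after sketch of that pruning, in Python. The User type and both functions are hypothetical. It also assumes the codebase runs a static type checker, since Python does not enforce annotations at runtime; without one, the None check is less obviously dead.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    first_name: str
    last_name: str


def _join_names(first: str, last: str) -> str:
    # An abstraction over a one-line operation, with a single caller.
    return f"{first} {last}"


def full_name_generated(user: User) -> str:
    # The shape a generated diff often takes: every line runs, none is needed.
    try:  # formatting two str fields does not raise here
        if user is None:  # the annotation already says this is a User
            return ""
        return _join_names(user.first_name, user.last_name)
    except Exception:
        return ""


def full_name(user: User) -> str:
    # After pruning: same behavior for every valid User, one line to read.
    return f"{user.first_name} {user.last_name}"
```

The pruned version is not just shorter. The blanket `except` in the first version would turn a real bug into an empty string, which is the kind of failure nobody notices until a customer does.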

The opposite mistake: pruning unfamiliar-but-correct abstractions because you mistook them for bloat. If the codebase has chosen a pattern you don't personally like, that isn't the same as the diff being wrong. Move 4 prunes what the codebase didn't need. It doesn't prune what the codebase has already decided.

When the playbook doesn't apply

Each move only fires when the codebase is in the right shape for it:

Contracts apply when more than one place will use the logic.
Layering applies when the system already has, or will need, a layered architecture.
Folder hygiene applies when the project has lived long enough to know its modules.
Prune applies when the diff has added more than the problem required.

The over-fire cases matter as much as the others:

Apply Move 1 to a single-use helper and you get a pointless interface.
Apply Move 2 to a one-file utility and you get three layers of nothing.
Apply Move 3 to a two-week-old project and you waste a week.
Apply Move 4 to a codebase whose conventions you haven't learned and you knife working architecture you don't yet understand.

The moves are a scan. You run them in the seconds before the line-by-line read. They tell you whether the diff is worth reading line by line, and what to look for when you do.

Closing

Be skeptical by default. What an LLM produces on its own has a ceiling, and the ceiling shows up fast. Anything past a small MVP runs into scale and correctness problems the model couldn't have known to expect.

The market's answer is more automation: LLM reviewers on top of LLM authors, layered bots, policy as code, a category growing from roughly two billion dollars to five billion by 2028. Tools can support the read. They can't do it for you, because the read means building the theory, and theory doesn't live in any tool.

What the market doesn't sell is internalized principles. Four moves you carry in your hands beat a forty-item list every time. One engineer fields many PRs a week now, the review queue is the bottleneck, and scan-speed is the only speed that matters.

After the read, you've either built the theory the diff implies or refused to. The refusal is what ships back to the author, human or otherwise.

Starter kit

Scan card for the next plausible-looking PR

Before line-by-line on the next PR, run the four moves in this order:

Contracts. Does the diff inline logic that one interface, enum-with-config-map, or schema would have collapsed into a single source of truth? Skip if the helper has one caller and no one will need to extend it.
Layers. Does the diff put business logic in a controller, a SQL query in a service, or a fetch call in a UI component that should be receiving its data? Skip if the file isn't part of a layered system.
Folders. Does the diff scatter edits across unrelated places, drop a new utility next to its first caller, or duplicate common code into a feature folder? Skip if the project is too young to know its modules.
What shouldn't exist. Does the diff add a try-catch around code that can't throw, a defensive check on a value the type system already guarantees, an abstraction over a one-liner, or an interface with one implementation? Skip if the only objection is that the codebase has chosen a pattern you wouldn't have chosen yourself.

Run this when: A diff lands in your review queue and the first scroll-through looks plausible.
