Reserving capacity for the unknown · Engineering Playbook

A team I once ran shipped on a three-week cadence. One Wednesday in week one, three things landed in the same morning. A leadership P0 about a pricing change the go-to-market team needed shipped before quarter end. A production issue customer support had triaged for two days, now escalated to us. And the head of growth dropping into our channel to ask why the experiment we owed them was sliding. The team was already at full sprint capacity; there was no decision that didn't break a commitment.

That morning is the failure mode every new EM eventually runs into, and the root cause is structural. In a fast-moving company, priorities come from many places:

Engineering owns the roadmap.
Customer support drives incident urgency.
Growth defines its own deadlines.
Leadership reserves the right to a true P0.

The clean version of this is one senior engineering manager whose call is full and final, with every priority routing through them. Larger, slower orgs sometimes have it because there's enough stability for one person to hold the full priority map in their head. In fast-moving companies, the role is rare and brittle. Either the person can't keep up with the pace and becomes a bottleneck, or the org grows around them and starts routing decisions through PMs, growth leads, and skip-level execs anyway. Most teams end up with four people who all believe they have veto power, and your job as the EM is no longer to wait for the arbiter.

If you accept that surprises are a property of the environment, the question becomes how much room your plan leaves to absorb one.

The first move most teams reach for is wrong: the per-task buffer, where you pad every estimate by 10 to 20 percent and trust the cushion will be there when you need it. It's defensive and invisible, and it fails twice:

Inward. Cushion buried inside an estimate disappears into the work. Pad a three-day task to four and it takes four. The cushion doesn't survive as recoverable slack at the end; it gets eaten during execution, and by the time you go looking for it, there is nothing left to pull.
Outward. The cushion isn't a thing you can point to and protect. It's buried across every task estimate, and PM can't see any of it. PM doesn't believe in it; growth doesn't believe in it; leadership definitely doesn't.

The mental model that fixed this for me is a bucket. Capacity is the bucket; work is the water. A planned sprint is what we've already poured in, and any incoming surprise is more water you're trying to add. People often think you can swap water out, pull a task to make room, but in practice that swap is expensive. You're mid-flight on the bumped task, dependencies are already moving, and the team has to re-plan in the middle of executing. A bucket that isn't full can absorb the new water; a bucket that is full forces a tradeoff every time.

The framework I run leaves a named line in the bucket that everyone, including PM and growth and leadership, can see. Reserve is a category of work in the plan rather than a margin on estimates. Martin Fowler called the underlying idea "slack" years ago; what follows is two ways I've found to operationalize it so it doesn't quietly dissolve.

The first is a sprint-level buffer with the breakdown written out. On a three-week sprint, each engineer has about fifteen working days, split:

Nine days for planned task work.
Two to three carved out as named reserve.
Three to four absorbed by standups and small coordination tasks that aren't worth estimating.

I'll name where the reserve goes: roughly one day for ad hoc leaves (each person on the team takes about one a month, and across the team that's a near-certainty in any given sprint), and roughly two days for ad hoc work pulled in mid-sprint. Once you start naming the drains, the list grows:

Interview loads when hiring is on.
Onboarding cost for a recent hire.
On-call carryover from last week.
The code review queue.
Cross-team unblocks.

You don't have to model these formally. Naming them is most of the work, because the buffer stops being a vibe and starts being a line item PM can argue with. A rolling average over a few sprints will tighten the number. Do that later, not first.
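When you do get to the rolling average, it can be as small as the sketch below. It assumes you record, per sprint, how many reserve days per engineer were actually consumed; the function name, the window of three sprints, and the sample history are all hypothetical.

```python
from collections import deque


def rolling_reserve(days_used_per_sprint, window=3):
    """Return the rolling mean of reserve days consumed, one value per sprint.

    Each value averages the current sprint and up to `window - 1` sprints before it.
    """
    recent = deque(maxlen=window)  # old sprints fall off automatically
    averages = []
    for days in days_used_per_sprint:
        recent.append(days)
        averages.append(sum(recent) / len(recent))
    return averages


# Hypothetical history: reserve days consumed per engineer over five sprints.
history = [3.0, 2.0, 4.0, 2.5, 1.5]
print(rolling_reserve(history))
```

If the latest average sits well above or below the two to three days you reserved, that is the number to bring to the next planning conversation.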

The second is structural rather than per-sprint. We run a one-week mid-release after every three-week main sprint. The full cycle is three weeks of planned work followed by one week of calendar reserve. The week absorbs:

Spillover from the last release.
Production issues that don't warrant a hotfix.
Internal bugs.
Tech debt.
Small tech initiatives.

It's also the natural home for sprint retro, 1-1s, and next-sprint planning, which would otherwise eat into "real" sprint capacity and be first to drop under pressure. Every stakeholder knows the week exists; no one has to negotiate it back into the schedule each cycle, which is the part that matters.

Both of these will have weeks where the reserve isn't consumed. That's the point. But you do want a staged backlog ready for those weeks: low-priority production issues, bugs, crashes, well-scoped debt items. If a developer frees up early and the reserve isn't being eaten, pull from there. Tech changes fast enough that finding good filler work is rarely the bottleneck. The discipline is keeping the backlog triaged and shallow rather than letting it accumulate into a junk drawer that nobody trusts.

Negotiate the reserve before the surprise lands. Mid-incident is the worst time to argue with PM about whether slack is real. The smallest version of this contract is a pre-agreed swap rule. If a P0 comes in, the lowest-ranked item in the current sprint bumps to next sprint, automatically, without a meeting. You don't have to get every stakeholder to love the rule; you have to get them to acknowledge it once, in the calm.
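The swap rule is mechanical enough to write down. A minimal sketch, assuming the sprint is kept as a ranked list with the highest priority first and that items are roughly comparable in size; the `Sprint` class, its method, and the item names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Sprint:
    # Items ordered by rank: index 0 is the highest priority.
    items: list = field(default_factory=list)
    # Items bumped to next sprint by the swap rule.
    deferred: list = field(default_factory=list)

    def admit_p0(self, p0_item):
        """Apply the pre-agreed swap: bump the lowest-ranked item, put the P0 on top."""
        bumped = None
        if self.items:
            bumped = self.items.pop()  # the lowest-ranked item sits last
            self.deferred.append(bumped)
        self.items.insert(0, p0_item)
        return bumped


sprint = Sprint(items=["checkout redesign", "growth experiment", "log cleanup"])
bumped = sprint.admit_p0("pricing change")
print("Bumped to next sprint:", bumped)
print("Current sprint:", sprint.items)
```

Nothing in the function asks for approval. That conversation happened once, in the calm, when the ranking and the rule were agreed.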

A team running at 100 percent planned capacity has decided to pay for every surprise with chaos; reserve is what you write into the plan so the decision is already made.

Starter kit

The sprint capacity worksheet I run today

Category	L	M	S	Total devs
Workstream: Product	3	0	0	3
Workstream: Engineering	2	0	0	2
Workstream: Platform	0	2	0	1
Workstream: Growth	1	0	0	1
Buffer				1
Total				8

For each row, count initiatives by size. Sizes carry weights: L = 1 dev, M = 0.5, S = 0.25. The total-devs column is weight times count, summed across the row. The sum down that column is the sprint's committed devs, which you compare against actual headcount.
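The same arithmetic as a short sketch, using the counts from the table above. The function names are hypothetical, and the headcount of 8 is an assumption; the worksheet itself only states the committed total.

```python
# Size weights from the worksheet: L = 1 dev, M = 0.5, S = 0.25.
WEIGHTS = {"L": 1.0, "M": 0.5, "S": 0.25}


def row_devs(counts):
    """Weight times count, summed across one row."""
    return sum(WEIGHTS[size] * n for size, n in counts.items())


def committed_devs(workstreams, buffer_devs):
    """Sum the total-devs column, buffer line included."""
    return sum(row_devs(counts) for counts in workstreams.values()) + buffer_devs


workstreams = {
    "Product": {"L": 3, "M": 0, "S": 0},
    "Engineering": {"L": 2, "M": 0, "S": 0},
    "Platform": {"L": 0, "M": 2, "S": 0},
    "Growth": {"L": 1, "M": 0, "S": 0},
}

total = committed_devs(workstreams, buffer_devs=1)
headcount = 8  # assumed team size, not stated in the worksheet
print(total, "committed vs", headcount, "available")
if total > headcount:
    print("Overcommitted by", total - headcount, "devs")
```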

Workstreams. One row per workstream the team owns, each sized with the same L/M/S weights. Common types I run: product (roadmap features PM owns), engineering (tech debt, refactoring, internal tools, performance), platform (on-call, dependency upgrades, infra babysitting, the work you cannot say no to), growth (experiments and surface-area expansion). Add rows for your specific product surfaces. This keeps the conversation about which workstream is overcommitted, not whether the team is abstractly "full".

Buffer. The named reserve. Per engineer, per sprint: ~1 day for ad hoc leaves, ~2 days for ad hoc work pulled in mid-sprint, plus headroom for interview loads, onboarding ramp, on-call carryover, and code review queue surges. Multiply across the team to get the buffer line, which usually lands at 10 to 15 percent of the non-buffer total. Listed as its own line so every stakeholder can see it. This is the line you defend.

The table is in devs (one dev is roughly one engineer for one sprint, about 15 working days). The buffer row is the per-engineer reserve from earlier in the essay, multiplied by team size. Same idea, team-level view.
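Converting the per-engineer reserve into the buffer row is one division. A rough sketch, assuming a 15-day sprint and a hypothetical team of 8; the function name is illustrative, and how you round the result into the worksheet is a judgment call.

```python
SPRINT_DAYS = 15  # working days per engineer in a three-week sprint


def buffer_in_devs(reserve_days_per_engineer, team_size, sprint_days=SPRINT_DAYS):
    """Convert a per-engineer reserve in days into a team-level buffer in devs."""
    return reserve_days_per_engineer * team_size / sprint_days


# Try the low and high end of the two-to-three-day reserve.
for days in (2, 3):
    devs = buffer_in_devs(days, team_size=8)
    print(days, "reserve days each ->", round(devs, 2), "devs of buffer")
```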

This sheet is the sprint-level reserve only. The one-week mid-release described earlier is the calendar-level reserve. Run both.

If the total at the bottom exceeds the team's actual headcount, something has to come out. Cut elsewhere before the buffer, even though the buffer is the easiest line to cut.