Framing bets so they can fail cheaply

Picture the typical EM's quarter. It opens with ten bets in flight; eight get built, two slip, but none of the eight produces a clear answer. The PM is already lining up ten more ideas for next quarter, also small and unconnected, and the retro is adding a new check because last sprint's spec arrived late. The team is moving, busy, exhausted, and going nowhere in particular.

The lean startup method was supposed to fix this: build, measure, learn, with cheap experiments that kill what doesn't work. The framing has been around for over a decade, and most teams quote it back fluently. Somewhere on the way down from the slide deck to the sprint, the essence got sanded off. What survived was "be fast." What got lost was the part about pointing the bets at a goal and using failure as information.

A useful picture: think of each bet as an arrow with a direction and a size. One enormous arrow means the whole quarter rides on a single shot, and a hundred small arrows pointing every which way means motion without progress. What actually compounds is a small number of small arrows all pointed at the same goal, with the direction nudged based on what each one returned.

Cheap failure is what lets you nudge. If a bet that doesn't work also wrecks the quarter, you cannot nudge; you can only flinch. So cheap failure is design work, done before the bet starts, but the slogan never specifies what that work is. The job is to design the bet so that a no answer is survivable and clear, and so that the team walks out with sharper direction even if the bet itself did not work.

What follows is the playbook I run for that: seven steps for designing and running a bet, then a closing section on how this playbook gets misused.

Set the goal before you size bets

Pick the goal first, decide how many bets the quarter can carry, then write them out.

Most teams do this in reverse. A PM walks in with ten product ideas they have conviction in, and the team gets excited and builds them. None of the ten are aimed at a stated goal, and nothing learned in one bet feeds the next. The quarter ends with a lot shipped and the business number still flat. The team is told to be more product-led, and the cycle repeats.

The fix is unglamorous: name the goal in one sentence. Say it is average revenue per user, or activation in the first week, or deflecting a specific kind of support ticket. Then cut the bet list to five things that could plausibly change that number, ranked by which has the strongest hypothesis rather than by who is most excited. The slots you save are not free capacity; they are what the later doubling-down step will need.

Portfolio thinking, which is about mixing safer work with riskier bets across the quarter, comes after this step. All this step does is point each uncertain bet at the same goal, so that the learning from one feeds the design of the next.

Write the hypothesis, including how it dies

Most specs read like contracts: we are building X to achieve Y, ship date Z. Success is implicit, and the team works as if the feature is sure to work. When it does not, the failure feels like betrayal instead of information.

A bet meant to fail cheaply is written differently; before the team starts, four things go on paper:

The change being made.
The specific ways it can fail.
The reason it is worth trying anyway.
The thing the team hopes to learn whether it works or not.

The fourth one is the real test. If you cannot name what you will learn from a no answer, the bet is not designed for cheap failure; it is designed to succeed or be embarrassing.

Here is the number that makes this concrete. The success rate on individual feature bets in startups is somewhere around one in four, while most specs are written as if it were four in four. That single mismatch is responsible for an enormous amount of the demoralisation and process-bloat that follows a failed bet: the team is shocked by the no answer, and the answer is shocking only because the spec lied about the odds.
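A rough back-of-the-envelope sketch shows what those odds imply for a quarter. It assumes, purely for illustration, that bets succeed independently of each other, that each succeeds with probability 0.25 (the rough one-in-four figure above), and that the quarter carries five bets. None of these inputs is measured data, and the function names are made up for this example.

```python
from math import comb


def outcome_distribution(n_bets: int, p: float) -> list[float]:
    """Return P(exactly k bets succeed) for k = 0..n_bets.

    Assumption: bets are independent and share the same success
    probability p, so the count of wins is binomial.
    """
    return [
        comb(n_bets, k) * p**k * (1 - p) ** (n_bets - k)
        for k in range(n_bets + 1)
    ]


def expected_wins(n_bets: int, p: float) -> float:
    """Expected number of bets that work under the same assumption."""
    return n_bets * p


if __name__ == "__main__":
    n_bets, p = 5, 0.25  # illustrative inputs, not measured data
    dist = outcome_distribution(n_bets, p)
    print(f"expected wins:    {expected_wins(n_bets, p):.2f}")
    print(f"P(no bet works):  {dist[0]:.3f}")
    print(f"P(all bets work): {dist[n_bets]:.4f}")
```

Under those assumptions the quarter yields about 1.25 wins on average, there is roughly a 24% chance that nothing works at all, and about a 0.1% chance that every bet works. A spec written as if success were certain is planning for the 0.1% case.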

Failure rehearsals are how teams come up with the named ways a bet can fail. Run the rehearsal before you write the hypothesis, whenever the bet is big enough to be worth the time.

Cap the count and double down on what works

The first instinct, given a long list of plausible bets, is to run them all in parallel. Resist it, and halve the count you first wanted.

Here is why: with fewer bets running, you can actually see what each one is doing. With ten bets, weak results all look the same; with five, you can spot what is moving and act on it early. Kill the duds inside the first two weeks, then take the freed capacity and double down on the one or two that worked, by tightening the scope, expanding the rollout, or starting a second bet that builds on what the first one taught you. Compounding lives in the doubling-down.

Most teams treat doubling down as an accident. They start the quarter committed to ten bets and finish it committed to the same ten bets; the only variation is which ones got the most love. Plan for the reallocation from week one, so that moving capacity into the winners becomes the default rather than the exception. Per-bet stopping rules belong in time-boxing and kill criteria. Here, the discipline is team-wide: cap the count, and move capacity to the winners.

Strip the MVP to must-haves

The MVP, or minimum viable product, exists to test the hypothesis and nothing else.

When a team builds a "minimum viable" version that is somehow polished and respects the designer's full vision, they are not building an MVP; they are building the second version of a product they have already decided is going to win. Good-to-haves in the first cut are a tell: the builder is trying to validate their bias rather than the hypothesis.

The discipline is to strip the bet to the smallest version that could plausibly test the hypothesis you wrote earlier. If the hypothesis is that users will pay for X if you offer it, the first cut is a paywall and a charge, with the beautifully designed checkout coming later. If the answer comes back yes, you build the rest; if it comes back no, you have not sunk three weeks of design into something you are about to throw away.

Set the rhythm: checkpoints in, mid-sprint changes out

Once the bets are running, two rules keep the work on track without freezing it.

The first is a scheduled checkpoint, where the team gathers on a fixed rhythm, looks at what each bet has produced so far, and makes small corrections. Adjustments stay small, because the point of the checkpoint is to catch drift early rather than to redirect the bet.

The second is a freeze on changes to the plan mid-sprint, unless the issue is a true emergency. This rule looks restrictive but is actually protective: without it, every weak early result becomes a debate, every debate becomes a scope change, and no bet survives long enough to produce a clean answer. With it, the team trusts that the plan will hold until the next checkpoint, when they actually evaluate the bet.

A worked example. I moved a team off the standard engineering-and-product split into a builder-mode shape, where engineers carried more end-to-end ownership. Part of the reason was the AI era itself: models now fill more of the context-gathering work the spec used to do. Before we started, I told the team explicitly that the first few sprints would surface issues; that framed the risk as intentional. Inside the sprints, I held the no-changes-mid-sprint line and did not adjust the shape on every wobble. At the checkpoints, we looked at the inputs together, picked the small adjustments worth making, and protected the team from thrashing on early results in between. The rhythm is what made the bet possible to evaluate at all.

After failure, classify before correcting

The most expensive thing an EM can do after a failed bet is to add a new process check.

The end-of-first-sprint retro for that builder-mode transition produced many inputs, one of them sharp. Someone said specs arrived late and were half-baked, and proposed adding a check to ensure they were early and complete from now on. The team's instinct was to add the check to the working agreement and call it done.

I held off. The first sprint of a new working shape is exactly when first-attempt failures are expected, so the risk was intentional and the failure was anticipated; the input was a known cost of running the bet rather than a flaw it revealed. The right response was to log the learning and revisit only if the pattern kept showing up after the team should have settled in. I took the same posture on several similar inputs from that retro.

This is the classification step, with two questions before any process change. Did this fail because the risk was intentional and the answer was no? Log the learning and resist changing anything. Did this fail because the design was bad (scope too large, no ways to fail named, vision built instead of MVP)? Fix the design rather than the process.
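The two questions can be sketched as a small decision function. This is a minimal illustration rather than a prescribed tool: the names `FailureKind`, `RESPONSES`, and `classify_failure` are hypothetical, and two behaviours are assumptions the text does not state: a named design flaw takes precedence when both questions could be answered yes, and there is a fallback for the case where neither applies.

```python
from enum import Enum


class FailureKind(Enum):
    INTENTIONAL_RISK = "intentional risk, answer was no"
    BAD_DESIGN = "bad design"
    UNCLEAR = "neither question answered yes"


# The first two responses come from the two questions in the text.
# The UNCLEAR response is an assumption added for completeness.
RESPONSES = {
    FailureKind.INTENTIONAL_RISK: "Log the learning; no process change.",
    FailureKind.BAD_DESIGN: "Fix the design, not the process.",
    FailureKind.UNCLEAR: "Keep investigating before changing anything.",
}


def classify_failure(risk_was_intentional: bool, design_flaws: list[str]) -> FailureKind:
    """Classify a failed bet before anyone proposes a new process check.

    design_flaws lists named flaws such as "scope too large",
    "no ways to fail named", or "vision built instead of MVP".
    Assumption: a named design flaw wins over intentional risk.
    """
    if design_flaws:
        return FailureKind.BAD_DESIGN
    if risk_was_intentional:
        return FailureKind.INTENTIONAL_RISK
    return FailureKind.UNCLEAR


if __name__ == "__main__":
    # The late-specs input from the first builder-mode sprint:
    # the risk was intentional and no design flaw was named.
    kind = classify_failure(risk_was_intentional=True, design_flaws=[])
    print(kind.value, "->", RESPONSES[kind])
```

Note that none of the three responses is "add a process check". In this sketch a new check has to be argued for separately, after the classification is done.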

The hidden cost of skipping that step is severe: every failure generates a new check, and the team becomes safer on paper and slower in practice. Six months in, no one can ship without three approvals, and no one remembers which approval was supposed to prevent which failure. Risk aversion is the real cost, and it accumulates one well-meaning retro at a time. What happens in the retro that follows is the craft of postmortems that change behaviour; this step is the filter you apply first, deciding whether the retro should produce learning or a process change.

Back conviction when the room earns it

The playbook so far has been about breaking big bets into small reversible shots. The exception matters.

Sometimes one person on the team has very high conviction in a single big-arrow bet. They have earned the conviction, and the room is smart enough to back them. In that situation, slicing the bet into many small reversible shots is the wrong call: it dilutes their shot and treats the playbook as dogma rather than design.

The case I learnt this on was a new product the team created. One builder had very high conviction in the direction, the room was smart, and the builder had a track record, so we backed the big-arrow bet whole. The reverse call, mechanically slicing it down because the playbook said so, would have killed the bet on rhythm alone.

This is where disagreeing and committing does its work. If you are in the room and you disagree, you record the disagreement, you commit to the call, and you give the builder real cover to run the shot. What makes this work is the room and the person carrying the bet, rather than the framework. In a less smart room, with less earned conviction, the same call fails; the playbook does not protect you from that.

When this gets it wrong

Two ways the playbook itself goes wrong.

The first is applying small-bets discipline mechanically to a conviction shot. The bet gets sliced into reversible cuts because that is how we do it, and the person carrying it never gets to run the real bet. The exception step above is the corrective: the playbook is design work rather than dogma.

The second is tightening the rhythm rules until they suffocate the work: daily course corrections, with a check added for every retro input. The rule was supposed to protect the team from thrashing on early results, but pushed too hard it produces the opposite failure, where the team thrashes on the rules instead of on the work. If the rule starts protecting the process rather than the bet, drop it.

One last thing on the culture argument. The fail-fast slogan attracts two standard critiques: people learn less from failure than they think, and the office ritual of celebrating failure is mostly theatre. Both are fair on the slogan, but both miss the practice underneath it. Don't defend the slogan; do the design work the slogan never specified. Get the design right and the team stops needing the culture argument. People discuss failure positively when it is genuinely a source of information; they discuss it defensively when the design forced it to be embarrassing.

Starter kit

A brief you fill in before the bet runs, plus the classification you run if it dies.

Quarter setup

Field	Your answer
Quarter goal (one sentence)	___
Cap on concurrent bets	___

Bet brief (one per bet)

Field	Your answer
Bet name	___
Hypothesis (if X then Y)	___
Why worth trying	___
Ways it can fail	___
What we learn if it does not work	___
MVP scope (must-haves only)	___
Kill criterion (result or date)	___
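If the team keeps briefs in a repository rather than a document, the same brief could be sketched as a small record with a completeness check. This is only an illustration under that assumption: the class name `BetBrief`, its field names, and `missing_fields` are hypothetical and simply mirror the rows of the table above.

```python
from dataclasses import dataclass, fields


@dataclass
class BetBrief:
    """One brief per bet; every field starts empty and must be filled in."""
    name: str = ""
    hypothesis: str = ""          # "if X then Y"
    why_worth_trying: str = ""
    ways_it_can_fail: str = ""
    learning_if_no: str = ""      # what we learn if it does not work
    mvp_scope: str = ""           # must-haves only
    kill_criterion: str = ""      # a result or a date


def missing_fields(brief: BetBrief) -> list[str]:
    """Return the names of fields that are still blank, in declaration order."""
    return [f.name for f in fields(brief) if not getattr(brief, f.name).strip()]


if __name__ == "__main__":
    # Example content is invented for illustration.
    brief = BetBrief(
        name="Paywall test",
        hypothesis="If we offer X behind a paywall, users will pay for it",
        why_worth_trying="Could move the quarter goal directly",
        ways_it_can_fail="Nobody clicks; people click but do not pay",
        mvp_scope="A paywall and a charge",
        kill_criterion="Two weeks",
    )
    print("Still blank:", missing_fields(brief))
```

In the example the only blank field is `learning_if_no`, which is the one the hypothesis step calls the real test. A brief with that field empty is not yet designed for cheap failure.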

If the bet does not work, classify before correcting

Intentional risk, answer was no. Log the learning, no process change.
Bad design (scope too large, no ways to fail named, vision built instead of MVP). Fix the design.

Run this when: quarter planning, or after a retro proposes a new process check.

A bet earns the right to fail cheaply by being designed for it.