A well-formed bet takes 30-45 minutes to review. If it takes longer, the bet is not ready.
The hypothesis owner, tech lead, and validation engineer run through these five questions together in the bet review meeting. The goal is not to fill out a form. The goal is to surface the gaps before the sprint starts, when they are cheap to fix.
If the answers are not satisfactory, the bet does not enter the sprint. It returns to the hypothesis owner for revision. This is not a failure. It is the process working.
All eight criteria must be checked before the sprint starts. Partial sign-off is not sign-off.
Can this bet be denied?
Read the hypothesis aloud. Now describe the result that would count as a denial. If you cannot describe a specific denial result, the hypothesis is not falsifiable. A hypothesis that can only be confirmed is not a hypothesis. It is a justification.
If the bet predicts that reducing time-to-first-integration below four minutes will increase 30-day activation from 34% to 42%, a denied result is: the measurement window closes, 30-day activation is at or below 34%, and the minimum exposure threshold was reached.
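A denial condition this concrete can be checked mechanically at window close. Below is a minimal sketch in Python using the thresholds from the example above; the field names and the 5,000-user exposure floor are illustrative assumptions, not part of the framework:

```python
from dataclasses import dataclass

@dataclass
class BetResult:
    """Measured values at window close (names are illustrative)."""
    activation_rate: float  # observed 30-day activation at window close
    exposed_users: int      # users exposed during the measurement window

# Thresholds committed before the window opened, matching the example bet.
BASELINE = 0.34       # current 30-day activation
TARGET = 0.42         # rate that counts as confirmation
MIN_EXPOSURE = 5000   # assumed minimum exposure before a result is evaluable

def verdict(result: BetResult) -> str:
    """Produce a verdict only when the result is evaluable."""
    if result.exposed_users < MIN_EXPOSURE:
        return "inconclusive"   # below the exposure threshold: no verdict
    if result.activation_rate >= TARGET:
        return "confirmed"
    if result.activation_rate <= BASELINE:
        return "denied"         # at or below baseline: the denial case
    return "partial"            # moved, but short of the target

print(verdict(BetResult(activation_rate=0.33, exposed_users=8000)))  # denied
```

Note that the exposure check comes first: below the minimum exposure, no direction of movement produces a verdict.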
- "We believe this feature will improve the onboarding experience."
- "We think users will find this more intuitive."
- "This should help with activation."
None of these can be denied. They can only be confirmed, dismissed, or ignored.
Return the bet. Ask the hypothesis owner to write the success threshold as a specific number with a baseline and a window. If they cannot, the bet is not ready.
Is the success threshold written with a specific number, a baseline, and a window?
Look at the Outcome Bet Template. The threshold must have three parts: a baseline (the current value), a target (the value that constitutes success), and a window (the date range in which the measurement will be taken).
A complete threshold: "30-day activation increases from 34% to 42% within a 60-day measurement window."

Incomplete thresholds:

- "Activation improves."
- "We expect to see an increase in 30-day activation."
- "A meaningful lift in activation."
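The three-part structure can be enforced before the review meeting rather than argued during it. A minimal sketch, with field names as illustrative assumptions:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Threshold:
    """The three required parts of a success threshold (illustrative names)."""
    baseline: Optional[float] = None   # current metric value
    target: Optional[float] = None     # value that constitutes success
    window_days: Optional[int] = None  # length of the measurement window

def is_ready(t: Threshold) -> bool:
    """A threshold is ready only when all three parts are committed."""
    return None not in (t.baseline, t.target, t.window_days)

# The well-formed example from the text carries all three parts:
complete = Threshold(baseline=0.34, target=0.42, window_days=60)
# "Activation improves" commits to none of them:
vague = Threshold()

print(is_ready(complete), is_ready(vague))  # True False
```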
A threshold not specified before the window opens will be selected after the results come in. That is not confirmation. It is a threshold chosen to match the result. The process has no integrity in that condition.
Return the bet. The hypothesis owner needs to pull the current metric value and commit to a target before the bet enters development. If the current metric value is not known, pull it before the review meeting. If it cannot be pulled, the measurement design is not ready.
Is the measurement design in place before the first line of code is written?
Four sub-questions, all of which must have satisfactory answers before the bet enters the sprint.
3a. What is the primary metric, and is it already instrumented?
If the metric is not currently being collected, instrumentation must be part of the sprint scope. Building the feature first and adding measurement later is not acceptable. Measurement after the fact is not measurement. It is reconstruction.
3b. Does the analytics pipeline schema expect the events this feature will emit?
The feature may emit an event called account_activation_completed. The analytics pipeline may expect user_activation_complete. These are not the same. Schema mismatches at this stage cost an entire measurement window to discover if not caught here. Check with the data engineering or analytics team before the sprint starts.
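A mismatch like this is cheap to catch with a set comparison between the build plan and the pipeline schema. A sketch, assuming both lists of event names can be exported; the names are taken from the example above:

```python
# Event names the feature will emit, from the build plan:
emitted_events = {"account_activation_completed"}

# Event names the analytics pipeline expects, from the data team's schema:
expected_events = {"user_activation_complete"}

# Anything emitted but not expected will be dropped or misrouted; anything
# expected but never emitted leaves a hole in the measurement.
unexpected = emitted_events - expected_events
missing = expected_events - emitted_events

if unexpected or missing:
    print("schema mismatch:",
          "emitted-but-unexpected:", sorted(unexpected),
          "expected-but-unemitted:", sorted(missing))
```

Run against the real exports, this takes minutes; discovered after the window opens, the same mismatch costs the window.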
3c. Is there a holdout design, and if so, is it documented?
If the bet requires a holdout group to distinguish treatment effect from confounding variables, the holdout design must be specified before development starts: the holdout percentage, how the holdout will be maintained, and what the comparison will be. A holdout designed after deployment is a holdout in name only.
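One common way to maintain a holdout without storing per-user state is deterministic hashing; whether it fits a given bet depends on the confounds being controlled for. A sketch, with the percentage and salt as illustrative assumptions:

```python
import hashlib

HOLDOUT_PERCENT = 10  # documented before development starts (illustrative)

def in_holdout(user_id: str, salt: str = "bet-activation-example") -> bool:
    """Deterministically assign a user to the holdout group.

    Hashing user_id with a per-bet salt keeps the assignment stable
    across sessions and deploys, so the holdout is maintained for the
    whole measurement window without a per-user lookup table.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < HOLDOUT_PERCENT
```

The same user always lands in the same group; changing the salt reshuffles every assignment, which is why the salt belongs in the documented holdout design.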
3d. Is the minimum exposure threshold defined?
How many users, sessions, or events must occur before the result is evaluable? Below this threshold, any result is inconclusive, regardless of which direction the metric moved. Define this before the window opens.
Block the bet from entering the sprint until the measurement design is complete. Partial measurement design is not a minor gap. It is the gap that makes the confirmation event impossible to run honestly.
Is the pre-ship validation plan proportional to the bet's classification?
P2 bets (exploratory, low-risk)
No formal validation required. A brief note in the bet document explaining why the risk is low is sufficient. The note must be present. 'We'll just ship it and see' is not a P2 classification rationale. It is measurement abandonment with a label.
P1 bets (significant, medium-risk)
A staged-environment test or prototype evaluation is required. The validation plan must be real: a defined method, a defined population, a defined observation period, and a conclusion structure. The validation record must be signed before the validation gate in CI/CD will pass.
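One way to implement such a gate is a small script that fails the pipeline when the validation record is missing or unsigned. A sketch; the file path and field names are assumptions, not a prescribed format:

```python
import json
import sys
from pathlib import Path

def validation_gate(record_path: Path) -> int:
    """Return a nonzero exit code unless a signed validation record exists.

    The record location and field names are assumptions; adapt them to
    wherever your team stores bet documents.
    """
    if not record_path.exists():
        print("validation gate: no validation record found", file=sys.stderr)
        return 1
    record = json.loads(record_path.read_text())
    # A record counts as signed only when both fields are present and non-empty.
    if not record.get("method") or not record.get("signed_by"):
        print("validation gate: record is unsigned or incomplete",
              file=sys.stderr)
        return 1
    return 0

# In CI, run as a pipeline step against the repo's record path, e.g.:
#   sys.exit(validation_gate(Path("bets/validation-record.json")))
```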
P0 bets (strategic, high-risk)
A full pre-ship validation is required. This includes a defined validation method, a staged or shadow-mode evaluation where feasible, a structured conclusion, and sign-off from the validation engineer and the hypothesis owner. A P0 bet that cannot be validated before production exposure should not be classified as P0 unless engineering leadership has explicitly accepted the risk.
Block the bet. The classification system exists for this reason. If the hypothesis owner disagrees with the classification, escalate. Do not lower the classification to avoid the validation requirement. That is the gatekeeping reflex. Recognize it and name it.
Is the confirmation event on the calendar?
Not 'we will schedule it when the window closes.' On the calendar. Now. Before the sprint starts.
The confirmation event must have:
- A date (the window close date or within three business days after it)
- A named confirmation owner (the person who will present the evidence and produce the verdict)
- A list of participants (at minimum: hypothesis owner, confirmation owner, one representative from engineering, and the validation engineer)
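These three requirements can be checked as a completeness test before the review meeting ends. A sketch, with the role labels as illustrative assumptions:

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Set

# Minimum roles required by the checklist (labels, not a prescribed schema):
REQUIRED_ROLES = {
    "hypothesis owner", "confirmation owner",
    "engineering representative", "validation engineer",
}

@dataclass
class ConfirmationEvent:
    when: Optional[date] = None  # window close date, or within three
                                 # business days after it
    confirmation_owner: Optional[str] = None
    participant_roles: Set[str] = field(default_factory=set)

def missing_pieces(event: ConfirmationEvent) -> List[str]:
    """List everything still needed before the event counts as scheduled."""
    gaps = []
    if event.when is None:
        gaps.append("date")
    if not event.confirmation_owner:
        gaps.append("confirmation owner")
    for role in sorted(REQUIRED_ROLES - event.participant_roles):
        gaps.append(f"participant: {role}")
    return gaps
```

An event with any gaps is not on the calendar in the sense this question means.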
Without a scheduled confirmation event, confirmation happens informally, late, or not at all. The measurement window closes. Someone glances at the dashboard. The metric moved. The team moves on. No verdict is produced. No Confirmation Record is written. The bet evaporates. A confirmation event on the calendar before the sprint starts is the single most reliable predictor of whether a bet will produce a real verdict.
Schedule it before the review meeting ends. It takes 30 seconds. If the confirmation owner has not been named, name them in the review meeting. If the window close date is uncertain, use the latest plausible date. It can be moved forward. Not having it on the calendar cannot be corrected retroactively.
Sign-Off
Start writing your first bet
Copy the template as Markdown and paste it into your team's documentation tool. Fill it out before the next sprint begins.
Based on the framework in The Output Trap by JP LeBlanc
Free to use. No attribution required.