A verdict is not a narrative. It is a classification applied to evidence against pre-stated criteria.
Most teams produce post-mortems. Some produce retrospectives. Almost none produce a formal verdict on whether the thing they shipped did the thing they said it would do.
The Confirmation Record is that verdict. It is filled out at the confirmation event, the calendar date that was set before development began. The confirmation owner reads the results, applies the success threshold from the original bet, and records the outcome.
A confirmation record that cannot be evaluated against the original bet criteria is not a confirmation record. It is a post-ship narrative.
Write it at the confirmation event. Use the criteria you set before development started. The threshold was written before the window opened. Apply it.
Bet Reference
Link to the original Outcome Bet document. This record is only evaluable in relation to the bet it closes.
The person who wrote the original bet. They should acknowledge this record.
The person named in the original bet who is accountable for convening the evaluation and producing this record.
P0, P1, or P2 as recorded in the original bet. The classification determines what validation evidence is required before a verdict can be filed.
Deployment Record
The specific change, feature, or intervention as described in the original bet.
The date the feature reached the target population. If rolled out in phases, the date of full rollout.
Full / Canary / Percentage. If phased, describe the rollout sequence and the date each phase reached its target population.
Yes / No / Waived. If waived, record the reason and who authorized the waiver. An uninstrumented feature cannot produce a clean confirmation record.
Link to the pre-ship validation record. Required for P0 and P1 bets. If absent, the confirmation record is incomplete.
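A minimal sketch, in Python, of how the completeness rules in this section might be checked before a record is filed. The `DeploymentRecord` class, its field names, and the `deployment_gaps` function are hypothetical and not part of the template. Treating an instrumentation answer of "No" as a gap is an assumption drawn from the note about uninstrumented features.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DeploymentRecord:
    # All field names are illustrative; the template does not prescribe a schema.
    risk_class: str                          # "P0", "P1", or "P2", copied from the original bet
    instrumentation: str                     # "Yes", "No", or "Waived"
    waiver_reason: Optional[str] = None
    waiver_authorized_by: Optional[str] = None
    validation_record_link: Optional[str] = None


def deployment_gaps(rec: DeploymentRecord) -> List[str]:
    """Return the reasons this deployment record is incomplete (empty list if none)."""
    gaps = []
    # A pre-ship validation record is required for P0 and P1 bets.
    if rec.risk_class in ("P0", "P1") and not rec.validation_record_link:
        gaps.append("validation record link is required for P0 and P1 bets")
    # A waiver must carry both a reason and the name of whoever authorized it.
    if rec.instrumentation == "Waived" and not (rec.waiver_reason and rec.waiver_authorized_by):
        gaps.append("instrumentation waiver needs a reason and an authorizer")
    # Assumption: an uninstrumented feature is flagged rather than rejected outright.
    if rec.instrumentation == "No":
        gaps.append("feature is uninstrumented; a clean confirmation record is not possible")
    return gaps
```

An empty list means the deployment section gives no reason to hold the record back. It says nothing about the verdict itself.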
Measurement Window
The date measurement began. Should match the ship date or full-rollout date recorded above.
The date measurement ended. This date was set before the window opened. It was not moved because the results were inconvenient.
The number of users, sessions, or events in the measurement period. This number must meet or exceed the minimum exposure threshold before a verdict is valid.
From the original bet. If total exposure did not reach this number, the result is inconclusive regardless of direction.
Yes or No. If No: the result is inconclusive. Do not override this by reclassifying inconclusive as denied or confirmed. They are not the same.
Result
The recorded baseline from the original bet. If no baseline was recorded before the window opened, note that here and explain the implication for confidence.
The measured value of the primary metric at the end of the window.
The delta between baseline and close. State it as an absolute number and as a percentage change.
The threshold recorded in the original bet before the window opened. This is the number the result is evaluated against. It is not adjusted retroactively.
The difference between treatment and holdout groups. This is the number that matters for causal claims. Metric movement without a holdout is directional, not causal.
Did the secondary signal defined in the bet move in the expected direction? A secondary signal that moves opposite to the primary result is worth recording explicitly.
Did any counter-metrics move in an unexpected direction? Record specifically. A confirmed primary metric alongside a degraded counter-metric is not a clean confirmation.
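The arithmetic behind the delta and holdout fields above can be sketched as follows. The function names and the example numbers are hypothetical, chosen only to show the calculation. They are not results from any real bet.

```python
from typing import Tuple


def metric_delta(baseline: float, close: float) -> Tuple[float, float]:
    """Return (absolute change, percentage change) from baseline to close."""
    if baseline == 0:
        # Percentage change has no meaning against a zero baseline.
        raise ValueError("percentage change is undefined for a zero baseline")
    absolute = close - baseline
    percent = 100.0 * absolute / baseline
    return absolute, percent


def holdout_delta(treatment: float, holdout: float) -> float:
    """Difference between treatment and holdout at window close."""
    return treatment - holdout


# Hypothetical values: a metric that went from 200 to 230 over the window,
# with the holdout group closing at 210.
absolute, percent = metric_delta(baseline=200.0, close=230.0)
print(absolute, percent)             # 30.0 15.0
print(holdout_delta(230.0, 210.0))   # 20.0
```

In this made-up case the metric rose by 30 against its own baseline, but only 20 of that is separated from the holdout. The smaller number is the one that supports a causal claim.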
Verdict
Confirmed, Denied, or Inconclusive. Apply the definitions below to the evidence above.
Two to four sentences. What does the evidence say? If denied: what does the denial tell us about the causal mechanism? If inconclusive: what would need to be different for the question to be answerable?
Confirmed: The primary metric met the success threshold, the minimum exposure was reached, and the holdout delta is directionally consistent.
Denied: The primary metric did not meet the success threshold, the minimum exposure was reached, and the result is distinguishable from noise.
Inconclusive: The minimum exposure was not reached, the measurement infrastructure produced unreliable data, or the result is not distinguishable from baseline variation. Inconclusive is not confirmed and it is not denied. It means the question was not answered.
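The three definitions above can be read as a decision rule. The sketch below is one way to write it down, assuming the team has already reduced each condition to a yes/no judgment. The `Verdict` enum, the `classify` function, and its parameter names are hypothetical. One case is not covered by the definitions: the threshold is met but the holdout delta is not directionally consistent. The sketch treats that case as inconclusive, which is an assumption.

```python
from enum import Enum


class Verdict(Enum):
    CONFIRMED = "Confirmed"
    DENIED = "Denied"
    INCONCLUSIVE = "Inconclusive"


def classify(
    met_threshold: bool,
    exposure_met: bool,
    data_reliable: bool,
    distinguishable_from_noise: bool,
    holdout_consistent: bool,
) -> Verdict:
    """Apply the verdict definitions to five yes/no judgments."""
    # Any inconclusive condition wins, regardless of which way the metric moved.
    if not exposure_met or not data_reliable or not distinguishable_from_noise:
        return Verdict.INCONCLUSIVE
    if met_threshold and holdout_consistent:
        return Verdict.CONFIRMED
    if not met_threshold:
        return Verdict.DENIED
    # Threshold met but holdout delta inconsistent: not defined in the
    # template, treated here as inconclusive (assumption).
    return Verdict.INCONCLUSIVE
```

Checking the inconclusive conditions first mirrors the rule that an under-exposed result is never reclassified as confirmed or denied.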
Interpretation
Not what to build next. What did this result tell us about whether the causal mechanism we believed in was correct? A confirmed result is not necessarily evidence the mechanism was right. A denied result is not necessarily evidence the feature failed.
High, Medium, or Low.
What reduces confidence? Measurement gaps, short window, confounding events, instrumentation issues introduced mid-window? State the factors explicitly.
Next Bet
Double down / Expand / Retire / Investigate further / Reverse. Choose one. If multiple actions seem warranted, the interpretation section needs more work.
Why this action? Connect it to the verdict and interpretation above.
The starting point for the next Outcome Bet document. It does not need to be fully formed. Write the bet in the same structure: if we build X, we believe it will cause Y, because Z.
Confirmation Event
The date the team convened to review results. This date was on the calendar before development began.
Names and roles of everyone present at the confirmation event. The confirmation owner is responsible for convening this group.
How long the review took. A confirmation event that takes three minutes produces a three-minute verdict on work that took weeks. Record it.
If participants disagreed on the verdict or interpretation, record the dissent here. Disagreement is information. It often indicates a measurement design problem, a hypothesis precision problem, or a genuine difference in how the team reads causal evidence.
Sign-Off
This record is linked to the deployment event in the confirmation system. It is also filed with the Outcome Bet document it closes. For P0 bets and regulated contexts, this record is the audit artifact linking original intent, deployment, validation evidence, and outcome classification. Store it where it survives team turnover and is accessible to compliance review.
A complete record meets all of the following:
The verdict is one of the three classifications, not a narrative approximation
The verdict rationale explains the evidence, not the team's feelings about the evidence
The next bet section exists, even if brief
The confirmation event participants and date are recorded
The record is linked to the original bet and to the deployment event
Signs that the record has become a narrative rather than a verdict:
The result section contains phrases like 'seemed to work,' 'users responded positively,' or 'the metric moved in the right direction'
The verdict is 'confirmed' but the minimum exposure threshold was not met
There is no interpretation of what the result implies about the underlying theory
The next bet section is empty
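The patterns listed just above (narrative phrasing in the result, a 'confirmed' verdict without minimum exposure, a missing interpretation, an empty next bet) are simple enough to check mechanically. The sketch below is one possible check, not a prescribed tool. The `red_flags` function and its parameters are hypothetical, and the phrase list contains only the three examples quoted above.

```python
from typing import List

# The three example phrases quoted in the list above; extend as needed.
NARRATIVE_PHRASES = (
    "seemed to work",
    "users responded positively",
    "the metric moved in the right direction",
)


def red_flags(
    result_text: str,
    verdict: str,
    exposure_met: bool,
    interpretation: str,
    next_bet: str,
) -> List[str]:
    """Return the warning signs found in a drafted confirmation record."""
    flags = []
    lowered = result_text.lower()
    for phrase in NARRATIVE_PHRASES:
        if phrase in lowered:
            flags.append(f"result section uses narrative phrase: '{phrase}'")
    if verdict.strip().lower() == "confirmed" and not exposure_met:
        flags.append("verdict is 'confirmed' but minimum exposure was not met")
    if not interpretation.strip():
        flags.append("no interpretation of what the result implies")
    if not next_bet.strip():
        flags.append("next bet section is empty")
    return flags
```

A phrase match only shows that narrative language is present. Whether the result section also states the measured values still needs a human reader.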
A confirmation record that cannot be evaluated against the original bet criteria is not a confirmation record. It is a post-ship narrative. Those are useful. They are not the same thing.
Start writing your first bet
Copy the template as Markdown and paste it into your team's documentation tool. Fill it out before the next sprint begins.
Based on the framework in The Output Trap by JP LeBlanc
Free to use. No attribution required.