A transition you can start without a reorganization, a new tool, or a culture initiative.
The first 90 days do not require you to change your planning meetings, rewrite job descriptions, or buy anything. They require you to measure what is already happening and run a small number of structured experiments on top of it.
This is not a culture change program. It is not a reorganization plan. It is a staged sequence: establish a baseline, run two complete bet cycles, and add one structural gate to CI/CD. Those three things are enough to evaluate whether the approach is working.
Complete the Output Trap Diagnostic before starting. You need a baseline score to evaluate whether the 90 days worked.
The Audit
Days 1-30: Know your current confirmation rate before you change anything.
The most common mistake in this transition is starting with process change before establishing a baseline. If you do not know where you started, you will not know whether the transition worked.
Week 1: Pull the deployment log
List every production deployment from the past quarter. For each one, record whether there was a written outcome hypothesis before the sprint started, whether there was a defined measurement window, whether there was a scheduled confirmation event, and whether that confirmation event happened.
Do not reconstruct these from memory. Use your sprint tracking system, your CI/CD logs, and your product documents. If the information does not exist, record it as absent.
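A minimal way to structure the audit, assuming a flat CSV with one row per deployment. The file name, column names, and example row below are illustrative, not part of the playbook; the four yes/no fields mirror the audit questions above.

```python
import csv

# Illustrative columns: one row per production deployment from the past quarter.
AUDIT_COLUMNS = [
    "deploy_id",
    "deploy_date",
    "hypothesis_written_before_sprint",   # yes/no
    "measurement_window_defined",         # yes/no
    "confirmation_event_scheduled",       # yes/no
    "confirmation_event_happened",        # yes/no
]

with open("deployment_audit.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(AUDIT_COLUMNS)
    # If the information does not exist, record "no" -- absence counts as absent.
    writer.writerow(["deploy-017", "2024-08-14", "yes", "yes", "no", "no"])
```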
Week 2: Score the confirmation rate
Count the deployments that had all four: a written hypothesis, a defined window, a scheduled confirmation event, and evidence that the event happened. Divide by total deployments. That is your current confirmation rate.
Most organizations doing this for the first time land between 5% and 15%. If yours is above 25%, either your process is more mature than average or the data is incomplete.
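A back-of-the-envelope script for the week 2 scoring, reading the audit CSV sketched above. The file and field names are assumptions; any spreadsheet filter that counts rows with all four answers marked "yes" does the same job.

```python
import csv

REQUIRED = [
    "hypothesis_written_before_sprint",
    "measurement_window_defined",
    "confirmation_event_scheduled",
    "confirmation_event_happened",
]

with open("deployment_audit.csv", newline="") as f:
    rows = list(csv.DictReader(f))

# A deployment counts only if all four fields are "yes".
confirmed = sum(
    1 for row in rows
    if all(row[field].strip().lower() == "yes" for field in REQUIRED)
)
rate = confirmed / len(rows) if rows else 0.0
print(f"Confirmation rate: {rate:.0%} ({confirmed} of {len(rows)} deployments)")
```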
Week 3: Identify the two best candidate bets
From your current or upcoming roadmap, find two items that can carry a falsifiable causal hypothesis, a metric you can measure (or instrument within a sprint), and a measurement window short enough to close before the Phase 2 confirmation events. These are your Phase 2 pilot bets. The criteria are strict by design. The goal is a clean result, not a representative sample.
Week 4: Brief the team
Before Phase 2 starts, the team needs to understand two things. First, why you are doing this: implementation velocity is increasing and confirmation capacity is not keeping pace. Second, what is different: you are not adding process on top of existing process. You are replacing two artifacts (the user story and the implicit success definition) with one artifact (the Outcome Bet) that does both jobs with more precision.
Do not launch Phase 2 without this conversation.
First Bets
Days 31-60: Complete two outcome bets from hypothesis to confirmation event.
Days 31-35: Write the bets
Use the Outcome Bet Template for both candidates. Write the hypothesis, measurement, window, and instrumentation requirements before the sprint starts. The goal is not a perfect artifact. The goal is a complete one. All six sign-off criteria must be met before the bet enters development.
Expect this to take longer than writing a user story. The first bet typically takes two to three hours: a conversation to surface the causal hypothesis, a conversation with data engineering to confirm the metric is measurable, and a confirmation event placed on the calendar. That is not overhead. That is the work that was previously being skipped.
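If you want a machine-readable companion to the written bet, a sketch like the following can check completeness before a bet enters development. The field names and the readiness check are assumptions; it covers only the parts named in this section, not all six sign-off criteria in the Outcome Bet Template.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class OutcomeBet:
    """Illustrative companion record for a written Outcome Bet."""
    hypothesis: str                 # causal claim: shipping X will move Y because Z
    metric: str                     # the metric the bet is judged on
    success_threshold: str          # the specific number that counts as confirmed
    measurement_window_days: int    # how long after ship before the verdict is read
    required_events: list[str] = field(default_factory=list)  # telemetry needed before ship
    confirmation_event_date: date | None = None               # on the calendar, not implied

    def ready_for_development(self) -> bool:
        # Partial check only -- the written template and its sign-off still govern.
        return all([
            self.hypothesis.strip(),
            self.metric.strip(),
            self.success_threshold.strip(),
            self.measurement_window_days > 0,
            self.required_events,
            self.confirmation_event_date is not None,
        ])
```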
Days 31-60: Run the bets in parallel with normal sprints
Do not change how the rest of the team is working. These two bets run inside your normal sprint structure. The only differences are the artifact at the start (Outcome Bet instead of user story), the instrumentation requirement before ship, and the confirmation event on the calendar.
Days 50-60: Run both confirmation events
The confirmation event is a meeting, not a dashboard review. The confirmation owner presents the measurement results against the criteria defined in the bet. The team produces a verdict: confirmed, denied, or inconclusive. That verdict goes into the Confirmation Record.
Plan for 45 to 60 minutes per confirmation event. The first one will surface at least one data quality problem and at least one disagreement about what the evidence means. Both are productive.
A denied result is not a failure. The confirmation loop closed. You learned something. The output-oriented alternative would have shipped the feature, watched the metric not move, and attributed it to something else.
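One way to record the verdict so it survives the meeting, sketched here as an append-only log. The file name, fields, and helper are assumptions; the Confirmation Record itself can live in whatever system your team already uses.

```python
from enum import Enum
from datetime import date
import json

class Verdict(Enum):
    CONFIRMED = "confirmed"
    DENIED = "denied"
    INCONCLUSIVE = "inconclusive"

def record_confirmation(path: str, bet_id: str, verdict: Verdict, evidence: str) -> None:
    """Append one confirmation result; a denied verdict is a closed loop, not a failure."""
    entry = {
        "bet_id": bet_id,
        "verdict": verdict.value,
        "evidence": evidence,
        "recorded_on": date.today().isoformat(),
    }
    with open(path, "a") as f:
        f.write(json.dumps(entry) + "\n")

# Illustrative call: the metric moved, but not past the threshold written in the bet.
record_confirmation("confirmation_record.jsonl", "bet-002", Verdict.DENIED,
                    "metric moved, but below the threshold defined in the bet")
```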
The most common Phase 2 finding is that the metric you planned to measure is not instrumented, or not instrumented correctly. This is not a Phase 2 failure. It is exactly what the process was designed to surface. Record the gap and fix it before Phase 3.
The most common form of resistance is 'we already know this is the right thing to build, the hypothesis is obvious.' The hypothesis being obvious does not mean it is falsifiable. Ask them to write the success threshold. If they cannot write a specific number, the hypothesis is not as obvious as they think.
First Gates
Days 61-90: Make the first confirmation requirement structural by adding the instrumentation gate to CI/CD.
Days 61-70: Build the instrumentation gate
This is a platform engineering task. The instrumentation gate answers one question: are the required telemetry events present in the codebase before deployment proceeds?
Start with the simplest possible implementation: a JSON file in the repo listing required event names, and a CI/CD step that checks for those names in the codebase. It does not need to verify that the events fire correctly. It needs to verify they exist. Correctness is a test. Existence is the gate.
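A minimal sketch of that CI/CD step, assuming a `required_events.json` file at the repo root and source under `src/`; file names, extensions, and paths are assumptions to adapt to your repo. It checks existence only, as described above.

```python
#!/usr/bin/env python3
"""Instrumentation gate: fail the pipeline if a required event name is absent from the codebase."""
import json
import pathlib
import sys

# e.g. ["checkout_started", "checkout_completed"] -- maintained alongside the bet
required = json.loads(pathlib.Path("required_events.json").read_text())

source_files = [
    p for p in pathlib.Path("src").rglob("*")
    if p.is_file() and p.suffix in {".py", ".ts", ".tsx", ".go"}
]
codebase = "\n".join(p.read_text(errors="ignore") for p in source_files)

missing = [event for event in required if event not in codebase]
if missing:
    print(f"Instrumentation gate failed. Missing events: {', '.join(missing)}")
    sys.exit(1)  # block the deployment -- a gate with a bypass is just a reminder
print("Instrumentation gate passed.")
```

Wire it as a required step that runs before the deploy job, so a nonzero exit actually blocks the deployment rather than warning about it.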
Days 71-80: Register the gate for all new bets
From day 71 forward, every new bet entering development must pass the instrumentation gate before it can deploy. Existing in-flight work is exempt. New bets are not.
This is the policy conversation that typically surfaces the most resistance. The argument against the gate is that it will slow down deployment. The correct response: deployment of unobservable code was never fast. It was just unaccountable.
Days 80-90: Run the diagnostic again
Score the second round of the Output Trap Diagnostic. Compare to the Phase 1 baseline. The specific metrics to compare: Question 5 (when the measurement plan was written), Question 9 (automated instrumentation check), Question 10 (feature shipping without instrumentation), and confirmation rate from the deployment audit.
If the confirmation rate has not improved, the usual cause is that the gate was implemented as a warning rather than a blocker, or that teams were given a bypass mechanism. A gate with a bypass is not a gate. It is a reminder. Reminders do not change confirmation rates.
If the gate is meeting significant resistance, the problem is usually upstream: the bet-writing process is too burdensome for the team's current capacity. The fix is not to make the gate optional. It is to reduce the friction in writing bets, typically by training one person per team to draft bets quickly and by creating a lightweight template for P2 bets.
The 90-day playbook establishes three things: a confirmation rate baseline, two complete bet cycles, and one structural gate. That is enough to evaluate whether the approach is working and to make the case for extending it.
The next layer is the validation gate (requiring pre-ship evidence for P0 and P1 bets) and the confirmation integration gate (linking deployments to bets in the confirmation system at deploy time). Both are documented in the Three Gates Implementation Guide.
The organizational changes follow from the structural changes, not the other way around. Do not restructure roles before the process has demonstrated value.
Start writing your first bet
Copy the template as Markdown and paste it into your team's documentation tool. Fill it out before the next sprint begins.
Based on the framework in The Output Trap by JP LeBlanc
Free to use. No attribution required.