Hypothesis
Write the single-sentence hypothesis: 'If we change [X], then [primary metric] will [improve/decline] by [magnitude] because [causal reasoning].' The hypothesis must be falsifiable. If you can't write a scenario that would prove it wrong, rewrite the hypothesis.
Variants
Control on the left, one or two variants on the right. For each variant: describe the change in one sentence. Don't test three things at once in one variant — one change per variant, or you can't attribute the result. Write the expected direction of effect for each variant.
Primary metric and guardrails
Write the one primary metric that decides the winner. Then write two to three guardrail metrics — metrics that, if they move negatively beyond a threshold, kill the test regardless of the primary result. The guardrails prevent optimization that looks good on one metric while quietly destroying another.
Sample size and duration
Write: minimum detectable effect (the smallest improvement worth shipping), statistical power (typically 80%), confidence level (typically 95%), and estimated daily traffic to the experiment surface. From these: calculate the required sample size and divide by daily traffic to get the minimum test duration.
Decision criteria
Three decisions: (1) Ship the variant: primary metric improved by X% at 95% confidence, guardrails safe. (2) Revert: primary metric declined or guardrail breached. (3) No conclusion: run longer or increase sample. Write the specific thresholds for each decision now — don't make them up after looking at the data.