P-value

Quick answer

A p-value in A/B testing is the probability of observing a result at least as extreme as the one measured, assuming there is no real difference between the control and variant (the null hypothesis). A p-value below 0.05, the conventional threshold for calling a result statistically significant, means that if there were truly no difference, a result this extreme would occur less than 5% of the time. It is not the probability that the observed difference is due to chance alone.

Key takeaways

P-value helps judge whether an experiment result is distinguishable from random noise, which is a prerequisite for acting on it.
It should be reviewed together with sample size, duration, effect size, and business impact.
It is most useful when the hypothesis and primary metric are defined before the test starts.

Definition

What P-value means in A/B testing

In an A/B testing workflow, the p-value is part of the statistical layer that determines whether an observed difference in conversion rate is a real signal or noise. It is calculated from the observed difference between control and variant, the sample size, and the variance of the metric being measured. A lower p-value means the data is less consistent with the null hypothesis, the assumption that control and variant perform identically. The p-value is most reliable when the primary metric and significance threshold are locked in before the test starts, so the decision rule is not distorted by peeking at early results.
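The formula below is a sketch of how those ingredients combine in one common choice for conversion rates, the pooled two-proportion z-test. Which test a given tool actually uses is an assumption here; chi-squared tests, t-tests, and sequential or Bayesian methods are also used. In it, x_A and x_B are conversions, n_A and n_B are visitors in control and variant, and Φ is the standard normal cumulative distribution function.

```latex
\hat{p}_A = \frac{x_A}{n_A}, \qquad
\hat{p}_B = \frac{x_B}{n_B}, \qquad
\hat{p} = \frac{x_A + x_B}{n_A + n_B}

z = \frac{\hat{p}_B - \hat{p}_A}{\sqrt{\hat{p}\,(1 - \hat{p})\left(\frac{1}{n_A} + \frac{1}{n_B}\right)}}

p = 2\bigl(1 - \Phi(|z|)\bigr)
```

The numerator is the observed difference, the pooled rate times one minus the pooled rate estimates the variance under the null hypothesis, and the 1/n terms shrink as traffic grows, so the same observed gap produces a larger |z| and a smaller p-value with a bigger sample.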

Why P-value matters

P-value matters because it gives teams a principled way to separate a real improvement from random fluctuation. Without it, a team might see a variant converting at 4.8% against a control at 4.2% and conclude the variant wins — even if that gap could easily arise by chance with a small sample. It should be interpreted alongside effect size, confidence interval, sample size, and test duration to get the full picture of whether the result is trustworthy.
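To make the small-sample point concrete, here is a minimal Python sketch that computes an approximate 95% confidence interval for the difference between 4.2% and 4.8% conversion at two sample sizes. The helper name diff_confidence_interval and the visitor counts are hypothetical, and the simple Wald interval is an assumption chosen for brevity; more robust interval methods exist for small conversion counts.

```python
import math


def diff_confidence_interval(conv_a, n_a, conv_b, n_b, z_crit=1.96):
    """Approximate 95% Wald interval for (variant rate - control rate).

    Hypothetical helper for illustration; z_crit=1.96 gives ~95% coverage
    under the normal approximation.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Unpooled standard error: each arm contributes its own binomial variance.
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    diff = p_b - p_a
    return diff - z_crit * se, diff + z_crit * se


# Hypothetical traffic: the same 4.2% vs 4.8% rates at two sample sizes.
small = diff_confidence_interval(42, 1_000, 48, 1_000)
large = diff_confidence_interval(2_100, 50_000, 2_400, 50_000)
print(f"1,000 per arm:  [{small[0]:+.4f}, {small[1]:+.4f}]")
print(f"50,000 per arm: [{large[0]:+.4f}, {large[1]:+.4f}]")
```

With 1,000 visitors per arm the interval spans zero, so the data cannot rule out no difference at all; with 50,000 per arm the same rates give an interval that excludes zero.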

Example of P-value

For example, a team runs an A/B test on a pricing-page headline for two weeks. The control converts at 3.2% and the variant at 3.8%, a 0.6 percentage point lift. The calculated p-value is 0.03. Because 0.03 falls below the 0.05 threshold, the team rejects the null hypothesis and concludes the lift is statistically significant. This means that if the headline had no real effect, a difference at least this large would appear in only about 3% of such tests.
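The example does not state the traffic behind it, so the following Python sketch assumes a hypothetical 9,000 visitors per arm (288 vs. 342 conversions) and a pooled two-proportion z-test, one common choice for conversion rates. The helper name two_proportion_p_value is also hypothetical. Under those assumptions the p-value comes out at roughly 0.03, in line with the example.

```python
import math


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test (hypothetical helper)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate: the best estimate of the shared rate if the null hypothesis is true.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # For a standard normal Z, P(|Z| >= |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical counts: 9,000 visitors per arm, 3.2% vs 3.8% conversion.
p = two_proportion_p_value(288, 9_000, 342, 9_000)
print(f"p = {p:.3f}")
```

Using math.erfc gives the two-sided normal tail probability directly from the standard library, so no statistics package is needed for this sketch.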

How to use P-value

Set your significance threshold (typically p < 0.05) before the test starts, not after seeing the data. Calculate the required sample size upfront using a sample size calculator, then wait for the full planned duration before evaluating results. When reading the final p-value, pair it with effect size and practical business impact to decide whether the lift is worth shipping, not just whether it is statistically significant.
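A sample size calculator typically applies a formula like the one in this Python sketch, the standard normal-approximation formula for comparing two proportions. The helper name sample_size_per_arm, the 80% power default, and the 3.2% to 3.8% scenario are assumptions for illustration; individual calculators may use slightly different formulas.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(p_control, p_variant, alpha=0.05, power=0.80):
    """Visitors needed per arm for a two-sided two-proportion test (hypothetical helper).

    Normal-approximation formula:
    n = (z_{1-alpha/2} + z_{power})^2 * [p1(1-p1) + p2(1-p2)] / (p2 - p1)^2
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    z_power = NormalDist().inv_cdf(power)          # about 0.84 when power = 0.80
    variance = p_control * (1 - p_control) + p_variant * (1 - p_variant)
    effect = p_variant - p_control
    return math.ceil((z_alpha + z_power) ** 2 * variance / effect ** 2)


# Hypothetical plan: 3.2% baseline, smallest lift worth detecting is to 3.8%.
n = sample_size_per_arm(0.032, 0.038)
print(f"{n:,} visitors per arm")
```

Dividing the total required visitors (both arms) by expected daily traffic gives a rough minimum test duration. Smaller lifts need much larger samples, because the required size scales with one over the squared effect.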

Common mistakes

The two most common mistakes are peeking and conflating statistical significance with practical significance. Peeking means stopping a test the moment the p-value drops below 0.05. Because every extra check gives random noise another chance to cross the threshold, peeking inflates the false positive rate and can lead to shipping a change that has no real effect. The second mistake is treating a very low p-value as a reason to ship regardless of effect size: a p-value of 0.001 on a 0.1% lift may be statistically significant but not meaningful for the business. Always evaluate both.
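The peeking problem can be reproduced with a small Monte Carlo simulation of A/A tests, where both arms share the same true conversion rate, so every significant result is a false positive. This Python sketch compares a team that checks after every batch and would stop at the first p < 0.05 with a team that checks only once at the end. All names and settings (a 10% rate, 10 looks, 200 visitors per arm per look, a pooled two-proportion z-test) are hypothetical choices for illustration.

```python
import math
import random


def p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test (hypothetical helper)."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no conversions (or all conversions): no evidence either way
    z = (conv_b / n_b - conv_a / n_a) / se
    return math.erfc(abs(z) / math.sqrt(2))


def simulate(n_sims=1_000, looks=10, batch=200, rate=0.10, alpha=0.05, seed=1):
    """Simulate A/A tests; return (peeking false positive rate, single-look rate)."""
    rng = random.Random(seed)
    peek_hits = final_hits = 0
    for _ in range(n_sims):
        conv_a = conv_b = n = 0
        crossed = False
        for _ in range(looks):
            # Both arms draw from the same true rate, so the null hypothesis holds.
            conv_a += sum(rng.random() < rate for _ in range(batch))
            conv_b += sum(rng.random() < rate for _ in range(batch))
            n += batch
            if p_value(conv_a, n, conv_b, n) < alpha:
                crossed = True  # a peeking team would stop here and ship
        peek_hits += crossed
        final_hits += p_value(conv_a, n, conv_b, n) < alpha
    return peek_hits / n_sims, final_hits / n_sims


peek_rate, final_rate = simulate()
print(f"False positive rate with peeking:       {peek_rate:.1%}")
print(f"False positive rate, single final look: {final_rate:.1%}")
```

The single final look should land near the nominal 5%, while the peeking rate comes out well above it; exact figures vary with the seed and the number of looks. In pure Python the simulation may take a second or two to run.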
