Enter your visitor and conversion numbers below to find out if your test result is statistically significant
Statistical significance is the checkpoint that separates a real conversion improvement from a result that happened by chance. Without it, you cannot tell whether variant B is genuinely better or whether you simply got lucky with which visitors happened to see it. Checking interim results and declaring a winner before the planned sample is complete, a practice called peeking, is one of the most reliable ways to ship a losing variant and hurt your conversion rate.
This free significance calculator uses a two-sample proportion z-test to compute how confident you can be that your variant's conversion rate really differs from your control's. Enter visitor and conversion counts for each group and you will see instantly whether the difference is statistically significant at 95% confidence, the industry standard for A/B testing decisions.
The calculator computes a z-score from the difference between the two conversion rates, accounting for the natural variance in each group. It then converts that z-score to a probability using the cumulative normal distribution. A confidence level of 95% or above means: if the true effect were zero, you would see a difference at least this large in favor of the variant no more than 5% of the time.
z = (CR_B − CR_A) / √(CR_B(1−CR_B)/N_B + CR_A(1−CR_A)/N_A)
Confidence = Φ(z) × 100%
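As a minimal sketch, the two formulas above can be written in a few lines of Python using only the standard library. The function name `confidence` and the example counts are illustrative assumptions, not the calculator's actual source code. The sketch follows the formula as written, so the confidence it returns is one-sided (variant B better than control A).

```python
from math import sqrt
from statistics import NormalDist


def confidence(visitors_a, conversions_a, visitors_b, conversions_b):
    """Return (z, confidence in %) for variant B against control A."""
    cr_a = conversions_a / visitors_a
    cr_b = conversions_b / visitors_b
    # Unpooled standard error of the difference, as in the formula above.
    se = sqrt(cr_a * (1 - cr_a) / visitors_a + cr_b * (1 - cr_b) / visitors_b)
    if se == 0:
        raise ValueError("no variance in either group; z is undefined")
    z = (cr_b - cr_a) / se
    # Phi(z): cumulative standard normal distribution, scaled to a percentage.
    return z, NormalDist().cdf(z) * 100


# Hypothetical counts: 5,000 visitors per group, 500 vs. 570 conversions.
z, conf = confidence(5000, 500, 5000, 570)
print(f"z = {z:.2f}, confidence = {conf:.1f}%")
```

A negative z-score gives a confidence below 50%, which indicates that the variant is trailing the control.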
More visitors per group means a more precise estimate of the true conversion rate, which translates to higher confidence for the same observed difference. Small samples produce wide confidence intervals — a 15% conversion rate observed in 50 visitors could easily be anywhere from 5% to 25% in reality. Always use the sample size calculator before starting a test so you enter the significance calculator with a reliable dataset.
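The effect of sample size on precision can be checked with a short sketch. It assumes the simple normal-approximation (Wald) interval, which is only a rough guide for small samples, and the function name `conversion_interval` is a made-up helper for illustration.

```python
from math import sqrt
from statistics import NormalDist


def conversion_interval(rate, visitors, level=0.95):
    """Normal-approximation interval for an observed conversion rate."""
    # Two-sided critical value, roughly 1.96 for a 95% interval.
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    margin = z_crit * sqrt(rate * (1 - rate) / visitors)
    # Clamp to the valid range for a proportion.
    return max(0.0, rate - margin), min(1.0, rate + margin)


# Same observed 15% conversion rate at three different sample sizes.
for n in (50, 500, 5000):
    low, high = conversion_interval(0.15, n)
    print(f"{n:>5} visitors: {low:.1%} to {high:.1%}")
```

With 50 visitors the interval spans roughly 5% to 25%, and it narrows as the visitor count grows.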
Larger observed differences between control and variant produce higher z-scores and reach significance faster. A 10 percentage-point gap (say, 10% vs. 20% CR) reaches significance much sooner than a 0.5 percentage-point gap. The significance calculator reflects this directly: with the same sample size, larger gaps produce higher confidence readings.
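To see how the size of the gap drives the reading, here is a small sketch that holds the sample size fixed and varies only the variant's conversion rate. The helper `confidence_from_rates` and the figure of 1,000 visitors per group are assumptions chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist


def confidence_from_rates(cr_a, cr_b, visitors_per_group):
    """One-sided confidence (%) that B beats A, given observed rates."""
    n = visitors_per_group
    se = sqrt(cr_a * (1 - cr_a) / n + cr_b * (1 - cr_b) / n)
    return NormalDist().cdf((cr_b - cr_a) / se) * 100


# Same sample size, two very different gaps.
large_gap = confidence_from_rates(0.10, 0.20, 1000)   # 10 percentage points
small_gap = confidence_from_rates(0.10, 0.105, 1000)  # 0.5 percentage points
print(f"10% vs 20%:   {large_gap:.1f}% confidence")
print(f"10% vs 10.5%: {small_gap:.1f}% confidence")
```

At this sample size the large gap is far past the 95% threshold, while the small gap is nowhere near it.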
95% is the industry standard, meaning a 5% chance of a false positive — declaring a winner when there is none. For high-stakes decisions like checkout flow, pricing page, or major copy changes, consider requiring 99% confidence. For exploratory tests on low-traffic pages, 90% may be acceptable. Never lower your threshold retroactively once you have seen results — that is p-hacking.
Stopping as soon as significance is reached — before your pre-planned sample size is complete — is the most costly mistake. The confidence level fluctuates throughout a test, and early stopping inflates your false-positive rate from 5% to as high as 30%. Set your required sample size using the sample size calculator before launch, then run the full planned duration.
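The inflation from early stopping can be demonstrated with a small simulation. The sketch below runs A/A tests, where both groups share the same true conversion rate, and compares a tester who stops at the first significant reading with one who only checks at the planned end. All parameters (400 tests, 2,000 visitors per group, a check every 100 visitors, a 10% base rate) are illustrative assumptions, and the exact rates depend on them.

```python
import random
from math import sqrt
from statistics import NormalDist


def is_significant(conv_a, conv_b, n, threshold=0.95):
    """One-sided unpooled z-test with n visitors in each group."""
    cr_a, cr_b = conv_a / n, conv_b / n
    se = sqrt(cr_a * (1 - cr_a) / n + cr_b * (1 - cr_b) / n)
    if se == 0:
        return False  # no variance observed yet, so no verdict
    return NormalDist().cdf((cr_b - cr_a) / se) >= threshold


def simulate(tests=400, visitors=2000, check_every=100, rate=0.10, seed=1):
    """Return (peeking, fixed-horizon) false-positive rates for A/A tests."""
    rng = random.Random(seed)
    peeking_hits = final_hits = 0
    for _ in range(tests):
        conv_a = conv_b = 0
        stopped_early = False
        for n in range(1, visitors + 1):
            # Both groups convert at the same true rate: any "win" is false.
            conv_a += rng.random() < rate
            conv_b += rng.random() < rate
            if n % check_every == 0 and is_significant(conv_a, conv_b, n):
                stopped_early = True  # a peeker would stop and ship B here
        peeking_hits += stopped_early
        final_hits += is_significant(conv_a, conv_b, visitors)
    return peeking_hits / tests, final_hits / tests


peek_rate, final_rate = simulate()
print(f"False positives when stopping at first significance: {peek_rate:.1%}")
print(f"False positives when checking only at the end:       {final_rate:.1%}")
```

The fixed-horizon rate should land near the nominal 5%, while the peeking rate comes out several times higher, because every extra look is another chance for noise to cross the threshold.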
Do not run significance checks on sub-segments unless those segments were pre-specified in your test plan. Post-hoc slicing — dividing results by device, traffic source, or day of week after you see the numbers — is p-hacking and will produce false positives in proportion to the number of slices you try.
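A rough sense of how quickly slicing adds up comes from the standard formula for the chance of at least one false positive across several checks. This sketch assumes the slices are independent and that none has a real effect, which is a simplification, since real segments overlap and correlate. The function name is made up for illustration.

```python
def chance_of_false_positive(slices, alpha=0.05):
    """Probability that at least one of `slices` independent, no-effect
    segments crosses the significance threshold by chance."""
    return 1 - (1 - alpha) ** slices


for k in (1, 3, 5, 10, 20):
    print(f"{k:>2} slices: {chance_of_false_positive(k):.0%}")
```

Under these assumptions, ten post-hoc slices at a 5% threshold give roughly a 40% chance that at least one looks like a winner purely by chance.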
If your test result is not significant after reaching the planned sample, the most likely explanation is that the change had little or no real effect. Extending the test hoping for significance is a form of p-hacking. Consider redesigning with a bolder change instead.
Without statistical significance, you cannot tell whether your result is a real effect or random variation. Checking interim results and declaring a winner before the planned sample is complete is called "peeking" and leads to false positives that can hurt your business when you ship the losing variant.
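95% confidence means that, if there were truly no difference between the variants, a gap at least this large would show up by chance only about 5% of the time. In other words, if you ran 100 tests in which the variant had no real effect, you would expect about 5 of them to look like winners by chance alone. It does not mean there is a 95% probability that the variant is truly better.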
Be careful. A small sample can still produce a "significant" result by chance. Always use the sample size calculator before starting your test to ensure you collect enough data. A result that reaches significance with only 200 visitors is suspect; one with 5,000 visitors is far more reliable.
If you have already hit your target sample size and the result is not significant, the most likely explanation is that the change had little or no real effect. Extending the test hoping for significance is a form of p-hacking and leads to unreliable conclusions. Consider redesigning the test with a bolder change instead.