P-hacking, also known as data dredging, is a method in which data is manipulated or selection criteria are modified until a desired statistical result, typically a statistically significant result, is achieved. It involves testing numerous hypotheses on a particular dataset until the data appears to support one.
P-hacking, also known as data dredging, is a method in which data is manipulated or selection criteria are modified until a desired statistical result, typically a statistically significant result, is achieved. It involves testing numerous hypotheses on a particular dataset until the data appears to support one. This practice can lead to misleading findings or exaggerated statistical significance.
In an A/B testing workflow, P-hacking is part of the statistical layer that helps explain whether a result is trustworthy. It is most useful when paired with a clear hypothesis, a primary metric, enough traffic, and a pre-defined decision rule.
P-hacking matters because it helps teams separate real experiment signals from random noise. It should be interpreted alongside sample size, test duration, traffic quality, and the business value of the metric being measured.
For example, a team testing a new pricing-page headline may see a higher sign-up rate in the variant. P-hacking helps the team judge whether that lift is strong enough to trust or whether they should keep collecting data before making a decision.
Use P-hacking after you have chosen a primary metric and collected enough traffic for a reliable read. Avoid checking it in isolation; compare it with effect size, confidence, practical impact, and whether the test ran long enough to cover normal traffic patterns.
A common mistake is treating P-hacking as a yes-or-no shortcut while ignoring sample size, test duration, and practical business impact. A statistically interesting result can still be too small, too noisy, or too risky to ship.
P-hacking, also known as data dredging, is a method in which data is manipulated or selection criteria are modified until a desired statistical result, typically a statistically significant result, is achieved. It involves testing numerous hypotheses on a particular dataset until the data appears to support one.
P-hacking matters because it helps teams separate real experiment signals from random noise. It should be interpreted alongside sample size, test duration, traffic quality, and the business value of the metric being measured.
Use P-hacking after you have chosen a primary metric and collected enough traffic for a reliable read. Avoid checking it in isolation; compare it with effect size, confidence, practical impact, and whether the test ran long enough to cover normal traffic patterns.
This comprehensive checklist covers all critical pages, from homepage to checkout, giving you actionable steps to boost sales and revenue.