A/B Test Calculator | Free Significance Tool

Enter your control and variant data to instantly check if your A/B test results are statistically significant at a 95% confidence level.

What is A/B Test Statistical Significance?

Statistical significance in A/B testing tells you whether the difference between two variations is real or just random noise. When you change a headline, button color, or pricing page layout, you need to know if the resulting change in conversions actually came from your modification or if it would have happened anyway due to normal traffic fluctuations. Use our conversion rate calculator to compute the exact rates for each variation before checking significance here.

Without checking for significance, you risk making decisions based on incomplete data. A variant might look like it is winning after 100 visitors, but that early lead often disappears once more data comes in. Statistical significance gives you a framework for knowing when you have collected enough evidence to trust the result.

How Statistical Significance is Calculated

This calculator uses a two-proportion Z-test, the standard method for comparing conversion rates between two independent groups. The formula compares the difference in conversion rates to the amount of variation you would expect from random chance alone.

Z = (p2 - p1) / sqrt(p * (1 - p) * (1/n1 + 1/n2))

Where p1 and p2 are the conversion rates of the control and variant, n1 and n2 are their sample sizes (visitors), c1 and c2 are their conversions, and p is the pooled conversion rate: (c1 + c2) / (n1 + n2).

If the absolute Z-score exceeds 1.96, the result is statistically significant at the 95% confidence level (using a two-tailed test). The larger the absolute Z-score, the stronger the evidence that the difference is real and not due to chance.
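To make the formula concrete, here is a minimal Python sketch of a two-proportion Z-test. It is not this calculator's actual source code: the function name `two_proportion_z_test` and the example numbers (200 conversions from 4,000 control visitors, 250 from 4,000 variant visitors) are hypothetical. It assumes a two-tailed test and derives the p-value from the standard normal distribution using `math.erfc`.

```python
from math import erfc, sqrt


def two_proportion_z_test(c1: int, n1: int, c2: int, n2: int) -> tuple[float, float]:
    """Hypothetical helper: return (z, two-sided p-value) for control c1/n1 vs variant c2/n2."""
    if n1 <= 0 or n2 <= 0:
        raise ValueError("sample sizes must be positive")
    p1, p2 = c1 / n1, c2 / n2
    # Pooled rate: total conversions divided by total visitors.
    p = (c1 + c2) / (n1 + n2)
    se = sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("pooled rate is 0 or 1, so the test is undefined")
    z = (p2 - p1) / se
    # Two-sided p-value under the standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value


# Hypothetical example data, not real test results.
z, p_value = two_proportion_z_test(c1=200, n1=4000, c2=250, n2=4000)
print(f"z = {z:.2f}, p = {p_value:.4f}, significant at 95%: {abs(z) > 1.96}")
```

With those inputs the sketch reports z ≈ 2.43 and p ≈ 0.015, so the variant clears the 1.96 threshold.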

What is a 95% Confidence Level?

The 95% confidence level is the most widely used threshold in A/B testing and scientific research. It means that if there were truly no difference between your control and variant, you would see a result this extreme only 5% of the time or less. In other words, when no real difference exists, there is a 5% chance of a false positive (detecting a difference that does not actually exist).
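One way to see what a 5% false positive rate means is to simulate A/A tests, where both groups share the same true conversion rate. The sketch below is illustrative only: the 10% conversion rate, 1,000 visitors per group, 1,000 simulated tests, the random seed, and the helper names are all assumptions. Because there is no real difference, every "significant" result is a false positive, and their share should land close to 5% (not exactly, since the Z-test is a normal approximation and the simulation has random variation).

```python
import random
from math import sqrt


def z_score(c1: int, n1: int, c2: int, n2: int) -> float:
    """Two-proportion Z-score using the pooled conversion rate."""
    p = (c1 + c2) / (n1 + n2)
    se = sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (c2 / n2 - c1 / n1) / se


def simulate_aa_tests(n_tests: int = 1000, visitors: int = 1000,
                      rate: float = 0.10, seed: int = 42) -> float:
    """Run A/A tests where both groups share the same true rate.

    Returns the share of tests flagged significant at |Z| > 1.96. Since there is
    no real difference, every flagged test is a false positive.
    """
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(n_tests):
        # Each visitor converts independently with probability `rate`.
        c1 = sum(rng.random() < rate for _ in range(visitors))
        c2 = sum(rng.random() < rate for _ in range(visitors))
        if abs(z_score(c1, visitors, c2, visitors)) > 1.96:
            false_positives += 1
    return false_positives / n_tests


print(f"False positive rate: {simulate_aa_tests():.3f}")
```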

Some teams use stricter thresholds like 99% confidence for high-stakes decisions (such as pricing changes) and more relaxed thresholds like 90% for lower-risk experiments. The right threshold depends on how costly a wrong decision would be. For most product and marketing tests, 95% provides a practical balance between speed and accuracy. As noted in Harvard Business Review's guide to A/B testing, the key is deciding on your confidence threshold before the test starts, not after seeing the results.
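Whichever threshold you choose, it maps to a Z-score cutoff. This short sketch, assuming a two-tailed test like the 1.96 cutoff used by this calculator, computes the critical value for each confidence level with Python's `statistics.NormalDist`; `critical_z` is a hypothetical helper name.

```python
from statistics import NormalDist


def critical_z(confidence: float) -> float:
    """Two-sided critical |Z| for a confidence level such as 0.95."""
    alpha = 1 - confidence
    # Split alpha across both tails and take the upper-tail quantile.
    return NormalDist().inv_cdf(1 - alpha / 2)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence -> |Z| > {critical_z(level):.3f}")
```

It prints cutoffs of 1.645 for 90%, 1.960 for 95%, and 2.576 for 99% confidence.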

Common A/B Testing Mistakes

Even with a significance calculator, A/B tests can produce misleading results if you fall into common traps. Here are the most frequent mistakes to avoid:

Stopping too early: Checking results daily and stopping the moment you see significance leads to inflated false positive rates. Decide your sample size in advance and let the test run to completion.
Testing too many variations at once: Each additional variation increases the chance of a false positive. If you test 10 variants against a control at 95% confidence, there is roughly a 40% chance that at least one appears significant by pure chance. Stick to one or two variants per test.
Ignoring segment differences: A test might be significant overall but show opposite effects for mobile vs. desktop users. Always check if your results are consistent across key segments.
Not accounting for seasonality: A test run during a holiday sale can produce results that do not hold once traffic returns to normal. Make sure your test period represents typical user behavior.
Changing the test mid-run: Modifying copy, design, or targeting during a test invalidates the data you have already collected. If you need changes, start a new test from scratch.
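The multiple-variation problem can be quantified. Assuming each comparison runs at a 5% false positive rate and treating the comparisons as independent (an approximation, since every variant shares the same control), the chance of at least one false positive across k variants is 1 - 0.95^k. A common, conservative fix is the Bonferroni correction, which divides the significance threshold by the number of comparisons. The helper names below are hypothetical.

```python
def family_wise_error_rate(num_variants: int, alpha: float = 0.05) -> float:
    """Chance that at least one variant looks significant when none truly differs.

    Treats the comparisons as independent, which is only an approximation
    when every variant is compared against the same control.
    """
    return 1 - (1 - alpha) ** num_variants


def bonferroni_alpha(num_variants: int, alpha: float = 0.05) -> float:
    """Per-comparison p-value threshold that keeps the family-wise rate at or below alpha."""
    return alpha / num_variants


for k in (1, 2, 5, 10):
    print(f"{k:>2} variant(s): family-wise false positive rate = "
          f"{family_wise_error_rate(k):.1%}, Bonferroni threshold = p < {bonferroni_alpha(k):.4f}")
```

With 10 variants the family-wise rate is about 40%, and Bonferroni would require each comparison to reach p < 0.005.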

How Long to Run an A/B Test

The duration of your A/B test depends on three factors: your baseline conversion rate, the minimum effect you want to detect, and your daily traffic volume. A site with 1,000 daily visitors and a 5% baseline conversion rate needs roughly 2 to 4 weeks to detect a 20% relative uplift with 95% confidence and 80% power.

As a rule of thumb, always run tests for at least one full business cycle (typically 7 days) to account for day-of-week effects. For e-commerce sites, two weeks is often the minimum. If your test has not reached significance after 4 to 6 weeks, the effect is likely too small to matter and you should move on to testing bigger changes.

Before launching a test, calculate the required sample size. If your traffic cannot deliver that sample in a reasonable timeframe, consider testing a bolder change that would produce a larger, more detectable effect. Collecting user feedback alongside your test data helps you understand the "why" behind the numbers. Use an ICE calculator to prioritize which tests to run next based on impact, confidence, and ease.
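You can also run the sample size calculation in reverse: start from the traffic you can realistically collect and find the smallest relative uplift the test could reliably detect. If that minimum is larger than any effect you expect, the test needs a bolder change. This sketch assumes a two-sided test at 95% confidence, 80% power, and an even traffic split; `required_n` and `minimum_detectable_uplift` are hypothetical names, and the search is a simple bisection.

```python
from math import sqrt
from statistics import NormalDist


def required_n(p1: float, p2: float, confidence: float = 0.95, power: float = 0.80) -> float:
    """Visitors per variation for a two-sided two-proportion test (normal approximation)."""
    z_a = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    z_b = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    top = z_a * sqrt(2 * p_bar * (1 - p_bar)) + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    return (top / (p2 - p1)) ** 2


def minimum_detectable_uplift(baseline: float, visitors_per_variation: float, **kw) -> float:
    """Smallest relative uplift the given traffic can detect, found by bisection."""
    lo = 1e-6                                  # tiny uplift: needs enormous traffic
    hi = 0.999 * (1 - baseline) / baseline     # keeps the variant rate below 100%
    if required_n(baseline, baseline * (1 + hi), **kw) > visitors_per_variation:
        raise ValueError("not enough traffic to detect any feasible uplift")
    for _ in range(100):
        mid = (lo + hi) / 2
        if required_n(baseline, baseline * (1 + mid), **kw) > visitors_per_variation:
            lo = mid   # effect too small for this traffic; search larger uplifts
        else:
            hi = mid   # detectable; try a smaller uplift
    return hi


# Hypothetical scenario: 14 days x 1,000 daily visitors, split across 2 variations.
mde = minimum_detectable_uplift(0.05, visitors_per_variation=14 * 1000 / 2)
print(f"Smallest detectable relative uplift: {mde:.1%}")
```

In that hypothetical scenario (7,000 visitors per variation from a 5% baseline), the smallest reliably detectable uplift comes out at roughly 22%, so a change expected to move conversions by only a few percent would not be worth testing at that traffic level.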

Frequently Asked Questions

How do you know if an A/B test is statistically significant?

An A/B test is statistically significant when the observed difference between your control and variant is unlikely to have occurred by random chance. The standard threshold is a p-value below 0.05 (equivalently, an absolute Z-score above 1.96 in a two-tailed test), which corresponds to 95% confidence. This calculator uses a two-proportion Z-test to determine significance automatically.

What is 95% confidence in A/B testing?

A 95% confidence level means that if there were no real difference between your control and variant, a result as extreme as the one you observed would occur no more than 5% of the time. It does not mean the variant is 95% better, and strictly speaking it does not mean there is a 95% probability that a real difference exists; it means the test keeps your false positive rate at or below 5%. This is the industry standard threshold for declaring a winner in an A/B test.

How many visitors do I need for an A/B test?

The required sample size depends on your baseline conversion rate and the minimum detectable effect you want to measure. Treat 1,000 visitors per variation as a bare minimum for conversion rates around 5-10%: at that size only large effects are reliably detectable, and smaller effects or lower baseline rates require much larger samples. Stopping a test before it reaches its planned sample size leads to unreliable results.

What is a good sample size for A/B testing?

A good sample size gives your test enough statistical power (typically 80%) to detect meaningful differences. At 95% confidence and 80% power, a site with a 5% conversion rate needs roughly 8,200 visitors per variation to detect a 20% relative uplift and roughly 31,000 to detect a 10% uplift; at a 3% conversion rate, those figures rise to roughly 14,000 and 53,000. Higher-traffic sites can detect smaller effects; lower-traffic sites should focus on testing larger changes.
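Power is the probability that the test detects an effect of a given size if that effect is real. The sketch below approximates it for a two-sided two-proportion Z-test at 95% confidence with equal group sizes; `power_two_proportions` is a hypothetical helper, and it ignores the negligible chance of a significant result in the wrong direction.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p1: float, p2: float, n_per_variation: float,
                          confidence: float = 0.95) -> float:
    """Approximate power of a two-sided two-proportion Z-test with equal group sizes.

    Ignores the negligible chance of a significant result in the wrong direction.
    """
    norm = NormalDist()
    z_a = norm.inv_cdf(1 - (1 - confidence) / 2)
    p_bar = (p1 + p2) / 2
    # Standard error of the difference if there is no effect vs. if the effect is real.
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n_per_variation)
    se_alt = sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_variation)
    return norm.cdf((abs(p2 - p1) - z_a * se_null) / se_alt)


for n in (1000, 8160, 20000):
    print(f"{n:>6} visitors per variation: power = {power_two_proportions(0.05, 0.06, n):.0%}")
```

For a 5% to 6% change, 1,000 visitors per variation gives only about 16% power, while roughly 8,160 per variation reaches the conventional 80%.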


Get started with Feeqd for free

Go beyond A/B results with qualitative feedback

Numbers show what happened. Feedback shows why. Collect both with feeqd's embeddable feedback widget.

No credit card required. Free plan available. Cancel anytime.
