The Easy Guide to A/B Testing Duration: Master Statistical Power and Sample Size Calculations in 5 Minutes
Running your first A/B test? The biggest question you'll face isn't what to test. It's how long to run it. Get this wrong, and you'll either waste weeks on inconclusive results or make decisions based on bad data.
Here's everything you need to know about test duration in one quick read.
Why Test Duration Actually Matters
Think of A/B testing like flipping coins to see if one coin is "luckier" than another. Flip each coin 10 times, and random chance might make one look better. Flip each coin 1,000 times, and you'll see their true probabilities emerge.
A/B tests work the same way. Run a test for one day, and random variation might make your "winning" version look 20% better. Run it for the right amount of time, and you'll know if that improvement is real or just noise.
The cost of getting this wrong: Companies waste millions implementing "winning" tests that were actually just lucky flukes.
The Four Numbers That Control Every Test
Every test duration calculation comes down to four connected numbers:
Baseline Conversion Rate: How your current page performs
Minimum Effect Size: The smallest improvement you want to detect
Confidence Level: How sure you want to be (usually 95%)
Sample Size: How many visitors you need
Change any one of these, and your test duration changes dramatically.
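You can see the trade-off directly with the standard two-proportion sample-size approximation. This is a minimal sketch (exact formulas vary slightly between calculators, and the function name and numbers here are illustrative):

```python
import math
from statistics import NormalDist

def visitors_per_variation(baseline, relative_mde, confidence=0.95, power=0.80):
    """Approximate visitors needed per variation for a two-proportion test."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)            # target rate implied by the MDE
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # ~1.96 at 95%
    z_beta = NormalDist().inv_cdf(power)                      # ~0.84 at 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

# Change one input and the sample size moves dramatically:
print(visitors_per_variation(0.03, 0.10))  # 3% baseline, 10% relative lift
print(visitors_per_variation(0.03, 0.05))  # halve the MDE -> roughly 4x the traffic
```

Halving the minimum detectable effect roughly quadruples the required sample, which is why the MDE is the single most expensive input to get wrong.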
Using Optimizely's Duration Calculator (Step by Step)
Most testing platforms include duration calculators. Here's how to use Optimizely's tool without making rookie mistakes:
Part 1: Visitors Needed
Baseline Conversion Rate
Enter your current conversion rate for the specific page you're testing
Don't use: Your overall site conversion rate
Do use: The conversion rate of the specific page and metric you're trying to improve, measured over the last month
Example: If testing a checkout page that converts at 3%, enter 3% (not your site's overall 1.5%)
Minimum Detectable Effect
This is the smallest improvement you care about
10-15%: Good starting point for most tests
5%: Requires way more traffic but catches smaller wins
20%+: Only for major redesigns
Translation: 10% effect on a 2% baseline means detecting 2.0% vs 2.2% conversion
Significance Level
Stick with 95% confidence (5% significance) for now
This means that when there's no real difference, you'll still see a false positive about 1 time out of 20
Higher confidence = longer tests
Part 2: Time Needed
Visitors per Week
Critical: This determines your actual timeline
Look at your analytics for weekly traffic to the test page
Only count qualified visitors (the actual audience for your test)
Traffic Split
50/50 is standard (half see the original, half see the variation).
Conservative teams sometimes use 90/10 (most traffic stays on original)
50/50 gets results faster
The Result: Total weeks needed
If the calculator shows 52+ weeks, your test parameters need adjustment.
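The time step is simple arithmetic on top of the sample size. Here's a rough sketch of how the split interacts with timeline (this is illustrative, not Optimizely's internal logic): the test runs until the slowest-filling arm reaches its required sample, which is why uneven splits take so much longer.

```python
import math

def weeks_needed(visitors_per_variation, weekly_visitors, split=0.5):
    """Weeks until the smaller arm of the test reaches its required sample."""
    smallest_arm = min(split, 1 - split)       # the slower-filling variation
    weekly_into_arm = weekly_visitors * smallest_arm
    return math.ceil(visitors_per_variation / weekly_into_arm)

# 30,000 visitors per variation, 15,000 qualified visitors/week:
print(weeks_needed(30_000, 15_000, split=0.5))   # 50/50 split -> 4 weeks
print(weeks_needed(30_000, 15_000, split=0.9))   # 90/10 split -> 20 weeks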
Reading Calculator Results Like a Pro
When you see "30,000 visitors needed per variation":
Each version of your test needs 30,000 visitors
Total test needs 60,000 visitors (for 50/50 split)
This gives you an 80% chance (the standard statistical power setting) of detecting your chosen effect size
Red flag warning: If your timeline shows 6+ months, don't lower your statistical standards. Instead:
Test pages with higher conversion rates
Accept detecting larger improvements only
Expand your test to more traffic sources
Common Calculator Mistakes (And How to Avoid Them)
Mistake #1: Using Total Site Traffic
Wrong: "My site gets 100,000 visitors per week" Right: "My checkout page gets 5,000 qualified visitors per week, so I'll enter 5,000"
Only count visitors who will actually see your test.
Mistake #2: Picking Random Effect Sizes
Wrong: "Let's detect 5% improvements because smaller is better" Right: "A 15% improvement would increase monthly revenue by $10,000, so that's our target"
Base effect size on business impact, not statistical preferences.
Mistake #3: Stopping Tests Early for "Good" Results
Wrong: "We're at 95% confidence after one week—let's call it!" Right: "We planned for 30,000 visitors per variation, so we wait until we hit that number"
Early stopping inflates your false positive rate dramatically.
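You can see that inflation yourself with a quick A/A simulation: both arms are identical, so every "significant" result is a false positive by definition. A rough sketch, not tied to any particular platform (the parameters are arbitrary):

```python
import math
import random

random.seed(42)

def peeking_simulation(trials=300, looks=10, batch=500, p=0.10):
    """Count false positives when checking weekly vs. only at the planned end."""
    z_crit = 1.96                     # 95% confidence threshold
    peeked, final_only = 0, 0
    for _ in range(trials):
        a_conv = b_conv = n = 0
        rejected_somewhere = False
        for _ in range(looks):        # one "weekly look" per iteration
            a_conv += sum(random.random() < p for _ in range(batch))
            b_conv += sum(random.random() < p for _ in range(batch))
            n += batch
            pooled = (a_conv + b_conv) / (2 * n)
            se = math.sqrt(pooled * (1 - pooled) * 2 / n)
            z = abs(a_conv / n - b_conv / n) / se if se else 0.0
            if z > z_crit:
                rejected_somewhere = True
        if rejected_somewhere:        # stopped at ANY look showing significance
            peeked += 1
        if z > z_crit:                # z from the final look = planned endpoint
            final_only += 1
    return peeked / trials, final_only / trials

peek_rate, final_rate = peeking_simulation()
print(f"False positive rate with weekly peeking: {peek_rate:.0%}")
print(f"False positive rate at planned end only: {final_rate:.0%}")
```

Checking only at the planned endpoint keeps false positives near the promised 5%; stopping at the first significant weekly look multiplies them severalfold, even though both arms are identical.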
Mistake #4: Ignoring Traffic Quality
Wrong: Assuming all traffic is the same Right: Accounting for different conversion rates by traffic source, device, time of day
Your calculator is only as good as your traffic estimates.
Alternative Tools Worth Knowing
Evan Miller's Calculator: More technical but extremely accurate
Best for: Understanding the math behind duration calculations
Downside: Steeper learning curve
VWO Sample Size Calculator: Clean and beginner-friendly
Best for: Quick estimates and learning
Downside: Limited to basic scenarios
CXL Calculator: Simple but effective
Best for: Quick understanding of traffic sizes needed based on MDE
Downside: Less customization
Pick one tool and stick with it. Consistency matters more than finding the "perfect" calculator.
Real Traffic vs. Predicted Traffic
Here's where most beginners get burned: traffic predictions.
Your calculator says 4 weeks. In reality, it takes 8. Why?
Common traffic estimation errors:
Seasonal drops: Holiday shopping, summer vacations, business cycles
Traffic filtering: Ad blockers, bot traffic, returning vs new visitors
Targeting restrictions: Geographic, demographic, or behavioral filters
Weekend effects: B2B sites often see 50% traffic drops on weekends
Buffer rule: Add 25% to your timeline estimate. If the calculator says 4 weeks, plan for 5.
Quick Decision Framework for Test Duration
Before You Start:
Define "meaningful": What improvement justifies the implementation effort?
Check your traffic: Do you have enough qualified visitors per week?
Set expectations: Communicate realistic timelines upfront
During Your Test:
Week 1: Verify traffic matches predictions
Week 2: Adjust timeline if traffic is significantly off
Weekly: Monitor for external factors (campaigns, outages, holidays)
If Your Test Will Take Forever:
Option 1: Test higher-traffic pages
Option 2: Accept detecting only large improvements
Option 3: Focus on metrics closer to user behavior (clicks vs purchases)
Option 4: Expand test to more audiences
Don't: Lower confidence levels or stop early to get faster results.
The 5-Minute Action Plan
Right now, go do this:
Pick a page you want to test on your site
Find its conversion rate in your analytics (last 30 days)
Estimate weekly traffic to that specific page
Open CXL's calculator (or your preferred tool)
Run the numbers with 10% effect size, 95% confidence
If the timeline is reasonable (under 8 weeks), you're ready to test. If it's too long, try the same calculation on a higher-traffic page.
Do this exercise with 3-5 potential test pages. You'll quickly learn which pages are worth testing and which need more traffic before testing makes sense.
Beyond the Basics: Three Pro Tips
Pro Tip #1: Start with High-Impact Pages
Don't test your lowest-converting pages first. Start with pages that have:
Decent traffic (1,000+ weekly visitors)
Reasonable conversion rates (2%+ for ecommerce)
Clear improvement hypotheses
Pro Tip #2: Document Everything
Track your predictions vs reality:
Estimated timeline vs actual timeline
Predicted traffic vs actual traffic
Traffic quality issues you didn't anticipate
After 3-5 tests, you'll get much better at estimating.
Pro Tip #3: Plan Test Sequences
Don't think one test at a time. Plan 3-month testing roadmaps:
Month 1: High-traffic, high-impact pages
Month 2: Medium-traffic optimization
Month 3: Detailed refinements
This keeps you testing consistently instead of scrambling between individual tests.
Your Next Steps
This week: Run duration calculations for three potential tests. Don't actually launch them—just practice with the calculator until the inputs make intuitive sense.
Next week: Launch your first properly planned test using these guidelines.
This month: Track your prediction accuracy and refine your traffic estimation process.
Master-level insight: The goal isn't perfect duration predictions. It's building systematic testing habits that generate reliable insights faster than competitors.
Start with the basics, stay consistent with your methodology, and your testing program will compound results over time. Most companies never get this foundation right, which is exactly why mastering it gives you such an advantage.
Duration calculations aren't complicated math. They're strategic decision-making tools. Use them well, and you'll never waste time on inconclusive tests again.