The Easy Guide to A/B Testing Duration: Master Statistical Power and Sample Size Calculations in 5 Minutes
Running your first A/B test? The biggest question you'll face isn't what to test. It's how long to run it. Get this wrong, and you'll either waste weeks on inconclusive results or make decisions based on bad data.
Here's everything you need to know about test duration in one quick read.
Why Test Duration Actually Matters
Think of A/B testing like flipping coins to see if one coin is "luckier" than another. Flip each coin 10 times, and random chance might make one look better. Flip each coin 1,000 times, and you'll see their true probabilities emerge.
A/B tests work the same way. Run a test for one day, and random variation might make your "winning" version look 20% better. Run it for the right amount of time, and you'll know if that improvement is real or just noise.
The cost of getting this wrong: Companies waste millions implementing "winning" tests that were actually just lucky flukes.
The Four Numbers That Control Every Test
Every test duration calculation comes down to four connected numbers:
Baseline Conversion Rate: How your current page performs
Minimum Effect Size: The smallest improvement you want to detect
Confidence Level: How sure you want to be (usually 95%)
Sample Size: How many visitors you need
Change any one of these, and your test duration changes dramatically.
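You can see the trade-off directly with the standard two-proportion sample-size approximation. This is a minimal sketch (exact formulas vary slightly between calculators, and the function name and numbers here are illustrative):

```python
import math
from statistics import NormalDist

def visitors_per_variation(baseline, relative_mde, confidence=0.95, power=0.80):
    """Approximate visitors needed per variation for a two-proportion test."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)            # target rate implied by the MDE
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # ~1.96 at 95%
    z_beta = NormalDist().inv_cdf(power)                      # ~0.84 at 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

# Change one input and the sample size moves dramatically:
print(visitors_per_variation(0.03, 0.10))  # 3% baseline, 10% relative lift
print(visitors_per_variation(0.03, 0.05))  # halve the MDE -> roughly 4x the traffic
```

Halving the minimum detectable effect roughly quadruples the required sample, which is why the MDE is the single most expensive input to get wrong.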
Using Optimizely's Duration Calculator (Step by Step)
Most testing platforms include duration calculators. Here's how to use Optimizely's tool without making rookie mistakes:
Part 1: Visitors Needed
Baseline Conversion Rate
Enter your current conversion rate for the specific page you're testing
Don't use: Your overall site conversion rate
Do use: The conversion rate of the specific page and metric you're trying to improve, measured over the last month
Example: If testing a checkout page that converts at 3%, enter 3% (not your site's overall 1.5%)
Minimum Detectable Effect
This is the smallest improvement you care about
10-15%: Good starting point for most tests
5%: Requires way more traffic but catches smaller wins
20%+: Only for major redesigns
Translation: 10% effect on a 2% baseline means detecting 2.0% vs 2.2% conversion
Significance Level
Stick with 95% confidence (5% significance) for now
This means that when there's no real difference, you'll still see a false positive about 1 time out of 20
Higher confidence = longer tests
Part 2: Time Needed
Visitors per Week
Critical: This determines your actual timeline
Look at your analytics for weekly traffic to the test page
Only count qualified visitors (the actual audience for your test)
Traffic Split
50/50 is standard (half see the original, half see the variation).
Conservative teams sometimes use 90/10 (most traffic stays on original)
50/50 gets results faster
The Result: Total weeks needed
If the calculator shows 52+ weeks, your test parameters need adjustment.
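The time step is simple arithmetic on top of the sample size. Here's a rough sketch of how the split interacts with timeline (this is illustrative, not Optimizely's internal logic): the test runs until the slowest-filling arm reaches its required sample, which is why uneven splits take so much longer.

```python
import math

def weeks_needed(visitors_per_variation, weekly_visitors, split=0.5):
    """Weeks until the smaller arm of the test reaches its required sample."""
    smallest_arm = min(split, 1 - split)       # the slower-filling variation
    weekly_into_arm = weekly_visitors * smallest_arm
    return math.ceil(visitors_per_variation / weekly_into_arm)

# 30,000 visitors per variation, 15,000 qualified visitors/week:
print(weeks_needed(30_000, 15_000, split=0.5))   # 50/50 split -> 4 weeks
print(weeks_needed(30_000, 15_000, split=0.9))   # 90/10 split -> 20 weeks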
Reading Calculator Results Like a Pro
When you see "30,000 visitors needed per variation":
Each version of your test needs 30,000 visitors
Total test needs 60,000 visitors (for 50/50 split)
This gives you an 80% chance (the standard statistical power setting) of detecting your chosen effect size
Red flag warning: If your timeline shows 6+ months, don't lower your statistical standards. Instead:
Test pages with higher conversion rates
Accept detecting larger improvements only
Expand your test to more traffic sources
Common Calculator Mistakes (And How to Avoid Them)
Mistake #1: Using Total Site Traffic
Wrong: "My site gets 100,000 visitors per week" Right: "My checkout page gets 5,000 qualified visitors per week, so I'll enter 5,000"
Only count visitors who will actually see your test.
Mistake #2: Picking Random Effect Sizes
Wrong: "Let's detect 5% improvements because smaller is better" Right: "A 15% improvement would increase monthly revenue by $10,000, so that's our target"
Base effect size on business impact, not statistical preferences.
Mistake #3: Stopping Tests Early for "Good" Results
Wrong: "We're at 95% confidence after one week—let's call it!" Right: "We planned for 30,000 visitors per variation, so we wait until we hit that number"
Early stopping inflates your false positive rate dramatically.
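You can see that inflation yourself with a quick A/A simulation: both arms are identical, so every "significant" result is a false positive by definition. A rough sketch, not tied to any particular platform (the parameters are arbitrary):

```python
import math
import random

random.seed(42)

def peeking_simulation(trials=300, looks=10, batch=500, p=0.10):
    """Count false positives when checking weekly vs. only at the planned end."""
    z_crit = 1.96                     # 95% confidence threshold
    peeked, final_only = 0, 0
    for _ in range(trials):
        a_conv = b_conv = n = 0
        rejected_somewhere = False
        for _ in range(looks):        # one "weekly look" per iteration
            a_conv += sum(random.random() < p for _ in range(batch))
            b_conv += sum(random.random() < p for _ in range(batch))
            n += batch
            pooled = (a_conv + b_conv) / (2 * n)
            se = math.sqrt(pooled * (1 - pooled) * 2 / n)
            z = abs(a_conv / n - b_conv / n) / se if se else 0.0
            if z > z_crit:
                rejected_somewhere = True
        if rejected_somewhere:        # stopped at ANY look showing significance
            peeked += 1
        if z > z_crit:                # z from the final look = planned endpoint
            final_only += 1
    return peeked / trials, final_only / trials

peek_rate, final_rate = peeking_simulation()
print(f"False positive rate with weekly peeking: {peek_rate:.0%}")
print(f"False positive rate at planned end only: {final_rate:.0%}")
```

Checking only at the planned endpoint keeps false positives near the promised 5%; stopping at the first significant weekly look multiplies them severalfold, even though both arms are identical.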
Mistake #4: Ignoring Traffic Quality
Wrong: Assuming all traffic is the same Right: Accounting for different conversion rates by traffic source, device, time of day
Your calculator is only as good as your traffic estimates.
Alternative Tools Worth Knowing
Evan Miller's Calculator: More technical but extremely accurate
Best for: Understanding the math behind duration calculations
Downside: Steeper learning curve
VWO Sample Size Calculator: Clean and beginner-friendly
Best for: Quick estimates and learning
Downside: Limited to basic scenarios
CXL Calculator: Simple but effective
Best for: Quick understanding of traffic sizes needed based on MDE
Downside: Less customization
Pick one tool and stick with it. Consistency matters more than finding the "perfect" calculator.
Real Traffic vs. Predicted Traffic
Here's where most beginners get burned: traffic predictions.
Your calculator says 4 weeks. In reality, it takes 8. Why?
Common traffic estimation errors:
Seasonal drops: Holiday shopping, summer vacations, business cycles
Traffic filtering: Ad blockers, bot traffic, returning vs new visitors
Targeting restrictions: Geographic, demographic, or behavioral filters
Weekend effects: B2B sites often see 50% traffic drops on weekends
Buffer rule: Add 25% to your timeline estimate. If the calculator says 4 weeks, plan for 5.
Quick Decision Framework for Test Duration
Before You Start:
Define "meaningful": What improvement justifies the implementation effort?
Check your traffic: Do you have enough qualified visitors per week?
Set expectations: Communicate realistic timelines upfront
During Your Test:
Week 1: Verify traffic matches predictions
Week 2: Adjust timeline if traffic is significantly off
Weekly: Monitor for external factors (campaigns, outages, holidays)
If Your Test Will Take Forever:
Option 1: Test higher-traffic pages
Option 2: Accept detecting only large improvements
Option 3: Focus on metrics closer to user behavior (clicks vs purchases)
Option 4: Expand test to more audiences
Don't: Lower confidence levels or stop early to get faster results.
The 5-Minute Action Plan
Right now, go do this:
Pick a page you want to test on your site
Find its conversion rate in your analytics (last 30 days)
Estimate weekly traffic to that specific page
Open CXL's calculator (or your preferred tool)
Run the numbers with 10% effect size, 95% confidence
If the timeline is reasonable (under 8 weeks), you're ready to test. If it's too long, try the same calculation on a higher-traffic page.
Do this exercise with 3-5 potential test pages. You'll quickly learn which pages are worth testing and which need more traffic before testing makes sense.
Beyond the Basics: Three Pro Tips
Pro Tip #1: Start with High-Impact Pages
Don't test your lowest-converting pages first. Start with pages that have:
Decent traffic (1,000+ weekly visitors)
Reasonable conversion rates (2%+ for ecommerce)
Clear improvement hypotheses
Pro Tip #2: Document Everything
Track your predictions vs reality:
Estimated timeline vs actual timeline
Predicted traffic vs actual traffic
Traffic quality issues you didn't anticipate
After 3-5 tests, you'll get much better at estimating.
Pro Tip #3: Plan Test Sequences
Don't think one test at a time. Plan 3-month testing roadmaps:
Month 1: High-traffic, high-impact pages
Month 2: Medium-traffic optimization
Month 3: Detailed refinements
This keeps you testing consistently instead of scrambling between individual tests.
Your Next Steps
This week: Run duration calculations for three potential tests. Don't actually launch them—just practice with the calculator until the inputs make intuitive sense.
Next week: Launch your first properly planned test using these guidelines.
This month: Track your prediction accuracy and refine your traffic estimation process.
Master-level insight: The goal isn't perfect duration predictions. It's building systematic testing habits that generate reliable insights faster than competitors.
Start with the basics, stay consistent with your methodology, and your testing program will compound results over time. Most companies never get this foundation right, which is exactly why mastering it gives you such an advantage.
Duration calculations aren't complicated math. They're strategic decision-making tools. Use them well, and you'll never waste time on inconclusive tests again.