
A/B Testing 101: The Complete Beginner's Guide

CRO Audits Team

A/B testing is the gold standard for making data-driven decisions. Instead of guessing which design or copy works better, you let real users tell you—with statistical confidence.

This guide covers everything you need to start running valid, meaningful A/B tests.

What Is A/B Testing?

A/B testing (also called split testing) is an experiment where you show two versions of something to different users and measure which performs better.

Version A (Control): Your current design
Version B (Variation): Your proposed change

Traffic is split randomly—typically 50/50. After enough data accumulates, you analyze which version achieved better results with statistical significance.
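
In practice, testing tools make the split deterministic per user, so someone who returns tomorrow sees the same version they saw today. A minimal sketch of how that bucketing might work (the experiment name and 50/50 split are illustrative):

```python
# Deterministic 50/50 assignment: hash the user ID together with the experiment
# name so the same user always lands in the same bucket. Names are illustrative.
import hashlib

def assign_variant(user_id: str, experiment: str = "homepage-headline") -> str:
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100          # effectively uniform over 0-99
    return "B" if bucket < 50 else "A"      # 50/50 split

print(assign_variant("user-42"))            # same user, same answer every time
```

Commercial tools handle this (plus exclusions, ramping, and cross-device issues) for you; the point is that the split is random across users but stable for each user.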

Why A/B Testing Matters

It Removes Guesswork

Without testing, decisions come from:

  • Opinions (“I think blue buttons work better”)
  • Copying competitors (“Amazon does it this way”)
  • Best practices (“Experts say shorter forms convert”)

These might be wrong for your specific audience. Testing reveals what actually works for your users.

It Quantifies Impact

A test doesn’t just tell you “B is better”—it tells you how much better and with what confidence. “Version B improved conversion rate by 15% with 95% confidence” is actionable business intelligence.

It Reduces Risk

Major redesigns are risky. Testing lets you validate changes incrementally before full commitment. If a change hurts performance, you’ve only affected half your traffic temporarily.

It Builds Organizational Knowledge

Each test teaches you something about your users. Over time, you develop deep understanding of what drives their decisions.

What Can You Test?

Almost anything users see or interact with:

Headlines and Copy

  • Value propositions
  • Product descriptions
  • CTA button text
  • Error messages
  • Form labels

Visual Design

  • Button colors and sizes
  • Image choices
  • Layout arrangements
  • Whitespace and spacing
  • Typography

Page Structure

  • Element ordering
  • Number of form fields
  • Checkout flow steps
  • Navigation options
  • Content length

Pricing and Offers

  • Price points
  • Discount presentation
  • Shipping thresholds
  • Bundle structures

Functionality

  • Search algorithms
  • Recommendation logic
  • Form validation timing
  • Popup behavior

The A/B Testing Process

Step 1: Form a Hypothesis

Don’t test randomly. Start with a hypothesis based on research:

Format: “We believe [change] will cause [effect] because [reasoning].”

Example: “We believe adding customer review ratings to product cards will increase click-through rate by 10-15% because session recordings show users looking for social proof before clicking.”

A good hypothesis is:

  • Specific (clear what you’re changing)
  • Measurable (defined success metric)
  • Based on evidence (research, not guessing)

Step 2: Calculate Sample Size

Before testing, determine how many users you need for statistically valid results.

You need:

  • Baseline conversion rate (your current rate)
  • Minimum detectable effect (smallest improvement worth detecting)
  • Statistical significance level (typically 95%)
  • Statistical power (typically 80%)

Example calculation:

  • Baseline conversion: 3%
  • Minimum detectable effect: 10% relative (0.3 percentage points absolute)
  • Significance: 95%
  • Power: 80%

Using a sample size calculator: roughly 53,000 visitors per variation needed.

Free calculators:

  • Evan Miller’s A/B Test Calculator
  • Optimizely Sample Size Calculator
  • VWO Sample Size Calculator
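
If you prefer to see the math, here is a minimal Python sketch of the standard two-proportion sample-size formula that calculators like these implement; the function name and defaults are just for illustration:

```python
# Sample-size sketch for a two-proportion z-test. The example call mirrors the
# scenario above (3% baseline, 10% relative lift, 95% confidence, 80% power).
from scipy.stats import norm

def sample_size_per_variation(baseline, relative_mde, alpha=0.05, power=0.80):
    p1 = baseline
    p2 = baseline * (1 + relative_mde)       # rate the variation would need to hit
    delta = p2 - p1                          # absolute difference to detect
    z_alpha = norm.ppf(1 - alpha / 2)        # ~1.96 for 95% confidence
    z_beta = norm.ppf(power)                 # ~0.84 for 80% power
    pooled = (p1 + p2) / 2
    n = (z_alpha * (2 * pooled * (1 - pooled)) ** 0.5
         + z_beta * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2 / delta ** 2
    return int(n) + 1

print(sample_size_per_variation(0.03, 0.10))  # roughly 53,000 per variation
```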

Step 3: Set Up the Test

Using your A/B testing tool:

  1. Create variations: Build your control and variation(s)
  2. Define audience: Who sees the test (all visitors, segment, etc.)
  3. Set traffic allocation: Usually 50/50 for two variations
  4. Configure goals: What event defines success
  5. QA thoroughly: Test both versions work correctly
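
If you set up tests in code rather than a visual editor, the same five pieces of information become configuration. A hypothetical example of what that record might look like (the field names are invented, not any particular tool's schema):

```python
# Hypothetical experiment definition covering the five setup steps above.
# Field names and values are invented for illustration.
experiment = {
    "name": "product-card-review-ratings",
    "variations": {"A": "control", "B": "card_with_star_ratings"},        # step 1
    "audience": {"include": "all_visitors", "exclude": ["internal_ips"]}, # step 2
    "traffic_allocation": {"A": 0.5, "B": 0.5},                           # step 3
    "goals": {"primary": "product_card_click",
              "secondary": ["add_to_cart", "revenue_per_visitor"]},       # step 4
    "status": "qa",  # step 5: switch to "running" only after both versions pass QA
}
```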

Step 4: Run the Test

Duration guidelines:

  • Run for at least one full business cycle (usually 1-2 weeks minimum)
  • Include weekends if your traffic varies by day
  • Don’t stop early just because results look good

During the test:

  • Monitor for technical issues
  • Resist the urge to peek at results constantly
  • Don’t make other changes to tested pages

Step 5: Analyze Results

When your predetermined sample size or duration is reached:

  1. Check statistical significance: Is the difference real or random chance?
  2. Review confidence intervals: How precise is the estimate?
  3. Check secondary metrics: Did the change affect other important metrics?
  4. Segment the data: Did the change work equally across devices, sources, etc.?
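
For the significance check itself, here is a minimal sketch using statsmodels; the conversion counts below are placeholders, not real results:

```python
# Two-proportion z-test on the primary metric. Counts are placeholders.
from statsmodels.stats.proportion import proportions_ztest

conversions = [1650, 1810]          # control, variation
visitors = [53000, 53000]           # visitors per variation

z_stat, p_value = proportions_ztest(conversions, visitors)
print(f"p-value: {p_value:.4f}")    # below 0.05 -> significant at 95% confidence
```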

Step 6: Implement or Iterate

If the winner is clear: Implement the winning variation permanently.
If there is no significant difference: The change doesn't matter; move on.
If the loser is clear: Keep the control, but document the learning.

Statistical Significance Explained

Statistical significance answers: “Is this difference real or could it be random chance?”

The 95% Standard

When we say a result is “statistically significant at 95% confidence,” we mean:

  • If there were actually no difference between versions
  • There’s only a 5% chance we’d see a difference this large by random chance

It does NOT mean “95% chance B is better.” It means “95% confident the observed difference isn’t just noise.”

P-Values

P-value is the probability of seeing your result if there were no real difference.

  • P-value < 0.05 → Statistically significant (at 95% confidence)
  • P-value > 0.05 → Not significant, could be random chance

Confidence Intervals

Confidence intervals show the range where the true effect likely falls.

Example: “Variation B improved conversion rate by 12% (95% CI: 5% to 19%)”

This means we’re 95% confident the true improvement is between 5% and 19%. The wider the interval, the less precise the estimate.
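
If your tool doesn't report an interval for the lift, you can approximate one with a normal approximation. A rough sketch (placeholder counts; dividing by the control rate for readability glosses over some uncertainty a real stats engine would account for):

```python
# Rough 95% confidence interval for the lift, using a normal approximation.
import math

conv_a, n_a = 1650, 53000           # control: conversions, visitors
conv_b, n_b = 1810, 53000           # variation: conversions, visitors

p_a, p_b = conv_a / n_a, conv_b / n_b
diff = p_b - p_a                                              # absolute difference
se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
low, high = diff - 1.96 * se, diff + 1.96 * se

print(f"Relative lift: {diff / p_a:+.1%} "
      f"(95% CI: {low / p_a:+.1%} to {high / p_a:+.1%})")
```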

Why Significance Matters

Without statistical significance, you might:

  • Implement a change that doesn’t actually work
  • Discard a change that would have helped
  • Make decisions based on random noise

Common A/B Testing Mistakes

Mistake 1: Stopping Too Early

You see B winning after 3 days. Exciting! Ship it!

Problem: Early results are unreliable. Statistical significance needs adequate sample size. Stopping early dramatically increases false positives.

Solution: Calculate required sample size before testing. Run to completion.

Mistake 2: Peeking and Acting

Checking results daily is fine. Acting on them isn’t.

Problem: If you check 10 times and stop when you see significance, your actual false positive rate is much higher than 5%.

Solution: Set a stopping rule in advance. Use sequential testing methods if you must make early decisions.
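
To see the peeking problem for yourself, a small simulation makes it concrete: both "variations" below have the same true rate, so every significant result is a false positive. The parameters are arbitrary.

```python
# A/A simulation: both arms share one true rate, so any "significant" result
# is a false positive. Compares peeking 10 times vs. testing once at the end.
import numpy as np

rng = np.random.default_rng(0)
p, n, looks, sims = 0.03, 20_000, 10, 2_000
stop_on_peek = final_only = 0

for _ in range(sims):
    a = rng.random(n) < p                      # control conversions (booleans)
    b = rng.random(n) < p                      # identical "variation"
    z_at_look = []
    for k in range(1, looks + 1):
        m = n * k // looks                     # sample size at this interim look
        pooled = (a[:m].sum() + b[:m].sum()) / (2 * m)
        se = np.sqrt(2 * pooled * (1 - pooled) / m)
        z = abs(b[:m].mean() - a[:m].mean()) / se if se > 0 else 0.0
        z_at_look.append(z)
    if any(z > 1.96 for z in z_at_look):       # act on the first "significant" peek
        stop_on_peek += 1
    if z_at_look[-1] > 1.96:                   # single test at the planned end
        final_only += 1

print(f"False positive rate when acting on peeks: {stop_on_peek / sims:.1%}")
print(f"False positive rate with one final test:  {final_only / sims:.1%}")
```

Expect the first rate to come out several times higher than the nominal 5%, and the second to sit close to it.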

Mistake 3: Testing Too Many Variations

Testing A vs. B vs. C vs. D vs. E splits traffic five ways.

Problem: Each variation needs adequate sample size. You’ll need 5x the traffic and time.

Solution: Test fewer variations. Start with one challenger against the control.

Mistake 4: Testing Multiple Changes

Version B has a new headline, different image, and new button color.

Problem: If B wins, you don’t know which change caused it. You can’t apply the learning to other pages.

Solution: Test one change at a time. Or use multivariate testing with proper statistical power.

Mistake 5: Ignoring Segment Differences

Overall, B wins by 8%. But on mobile, B loses by 15%.

Problem: You might implement something that hurts a major segment.

Solution: Always check results by device, traffic source, and other key segments.

Mistake 6: Not Tracking Revenue Impact

B increases clicks by 20%! But average order value dropped 25%.

Problem: Optimizing the wrong metric can hurt actual business outcomes.

Solution: Track revenue per visitor or similar business-outcome metrics alongside conversion rate.

Choosing an A/B Testing Tool

Entry Level (< $100/month)

Google Optimize: Sunset in 2023, but alternatives exist
VWO Testing: Starts around $99/month
Convert: Starts around $99/month

These offer:

  • Visual editor for creating variations
  • Basic targeting and segmentation
  • Statistical analysis

Mid-Market ($200-1,000/month)

Optimizely Web: More robust experimentation
AB Tasty: Good balance of features and usability
Dynamic Yield: Strong personalization features

Additional capabilities:

  • Advanced targeting
  • More sophisticated statistics
  • Better developer tools
  • Personalization

Enterprise ($1,000+/month)

Optimizely Full Stack: Server-side testing
LaunchDarkly: Feature-flag focused
Conductrics: AI-driven optimization

For organizations with:

  • High traffic volumes
  • Complex technical requirements
  • Need for server-side testing
  • Sophisticated experimentation programs

DIY Options

With developer resources, you can build testing with:

  • Feature flags
  • Random assignment logic
  • Analytics tracking

Pros: No vendor costs, full control
Cons: Statistical analysis is complex, and it's easy to make mistakes

How Much Traffic Do You Need?

The Traffic Reality Check

Many sites don’t have enough traffic for frequent A/B testing.

Rule of thumb: You need roughly 100 conversions per variation per week for reasonable test velocity.

Approximate test duration to detect a 10% lift, by weekly conversions:

  • 50 conversions/week: 6-8 weeks
  • 100 conversions/week: 3-4 weeks
  • 250 conversions/week: 1-2 weeks
  • 500 conversions/week: ~1 week

If tests take months, you’ll struggle to build momentum.

Low-Traffic Alternatives

Test bigger changes: Detecting a 50% improvement requires a far smaller sample than detecting a 10% improvement (roughly 20x smaller at the same baseline).

Focus on high-volume pages: Test where traffic concentrates.

Use qualitative methods: User testing, surveys, and heatmaps provide insights without statistical power requirements.

Before/after analysis: Less rigorous than A/B but still valuable for major changes.

Multi-armed bandit: Some tools automatically allocate more traffic to winning variations, reaching conclusions faster (with trade-offs).
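
To make the bandit idea concrete, here is a minimal Thompson-sampling sketch for two variations; the "true" rates are invented for the simulation, and production bandits add guardrails this omits:

```python
# Thompson sampling for two variations: each visitor goes to whichever arm
# draws the higher sample from its Beta posterior, so traffic drifts toward
# the better performer as evidence accumulates. True rates are invented.
import numpy as np

rng = np.random.default_rng(1)
true_rates = {"A": 0.030, "B": 0.036}          # unknown in real life
wins = {"A": 0, "B": 0}                        # conversions seen so far
losses = {"A": 0, "B": 0}                      # non-conversions seen so far

for _ in range(50_000):                        # simulated visitors
    draws = {arm: rng.beta(wins[arm] + 1, losses[arm] + 1) for arm in true_rates}
    arm = max(draws, key=draws.get)            # arm with the higher draw wins
    if rng.random() < true_rates[arm]:
        wins[arm] += 1
    else:
        losses[arm] += 1

for arm in true_rates:
    n = wins[arm] + losses[arm]
    print(f"{arm}: {n:6d} visitors, observed rate {wins[arm] / max(n, 1):.3%}")
```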

Building a Testing Program

Start Small

First test: pick something simple.

  • Change CTA button color
  • Test headline variation
  • Adjust form field count

Learn the process before tackling complex tests.

Build a Backlog

Maintain a prioritized list of test ideas:

  • Source ideas from research, analytics, team input
  • Score by potential impact, confidence, and ease (see the scoring sketch after this list)
  • Always have next test ready
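
Scoring doesn't need a fancy tool. Here is a toy sketch of one common approach, an impact x confidence / effort score; the ideas and ratings are invented:

```python
# Toy backlog prioritization: rank ideas by impact * confidence / effort.
# Ideas and 1-10 scores are invented for illustration.
backlog = [
    {"idea": "Add review stars to product cards", "impact": 8, "confidence": 7, "effort": 3},
    {"idea": "Rewrite checkout CTA copy", "impact": 5, "confidence": 6, "effort": 1},
    {"idea": "Redesign main navigation", "impact": 9, "confidence": 4, "effort": 8},
]

for item in backlog:
    item["score"] = item["impact"] * item["confidence"] / item["effort"]

for item in sorted(backlog, key=lambda i: i["score"], reverse=True):
    print(f'{item["score"]:5.1f}  {item["idea"]}')
```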

Establish Cadence

Aim for continuous testing:

  • Analyze completed test
  • Document learnings
  • Launch next test
  • Review backlog and priorities

Organizations running 2-4 tests monthly see compounding improvements.

Document Everything

For each test, record:

  • Hypothesis and rationale
  • What was tested (screenshots)
  • Duration and sample size
  • Results (including statistical details)
  • Learnings and next steps

This knowledge compounds over time.

Your First A/B Test Checklist

  • Hypothesis formed based on research
  • Success metric defined
  • Sample size calculated
  • Duration estimated
  • Test set up in tool
  • Both variations QA’d thoroughly
  • Test launched
  • Running without interference
  • Reached predetermined endpoint
  • Results analyzed properly
  • Segments checked
  • Winner implemented (or learning documented)

Ready to Improve Your Conversions?

Get a comprehensive CRO audit with actionable insights you can implement right away.

Request Your Audit — $2,500
