The process of improving conversion is called A/B testing: the science of experimenting with changes to see whether they improve performance.
For example, you could rewrite the top half of your landing page, or you could switch from a paid trial to a free trial. Either change might increase your signup rate.
Your job is to figure out what's worth testing.
We'll cover:
In a test, each thing you're testing is called a variant. For example, your existing site may be Variant A. The change you're comparing it to may be called Variant B.
Hence, "A/B" testing.
Testing makes or breaks growth. I've worked with many companies that couldn't get Facebook ads to run profitably, then later achieved success through three months' worth of landing page A/B testing: they continuously made their visuals more enticing and their messaging clearer.
Here's the testing cycle:
Repeat these steps until you run out of variant ideas. Never have downtime; every day of the year, a test should be running—or you're letting traffic go to waste.
A/B testing isn't about striving for perfection with each variant. It's about iteration.
Here's where I source ideas from:
An A/B variant is only better if it increases your bottom line.
If you discover that a variant motivates visitors to click a button 10x more, but button clicking doesn’t actually lead to greater signups or purchases, then your variant isn’t better than the original. All it's done is distract users into clicking a button.
For each A/B test, keep your eye on the prize: What is the meaningful funnel metric you're trying to increase? Often, it's email captures, signups, purchases, or retention.
Of these, you'll more often A/B test earlier parts of the funnel—for two reasons:
There are two types of variants: micros and macros.
Micro variants are small, quick changes. They're unlikely to have a large impact. For example, changing a button's color (a micro variant) typically won't lift conversion by more than 2% at best.
Macro variants, on the other hand, are significant rethinkings of your asset. Entirely rewriting a landing page can increase conversion by 50-300%. This happens often. That said, you'll usually only get a couple of boosts like this before facing diminishing returns.
Your goal is to focus on big, macro impacts—because every A/B test has an opportunity cost: you're usually only running one test per audience at a time.
Macro variants require considerable effort: It’s hard to repeatedly summon the focus and company-wide collaboration needed to wholly rethink your assets.
But macros are the only way to see the forest for the trees.
Since the biggest obstacle to testing macros is committing the resources, I urge you to create an A/B testing calendar and adhere to it: Create a recurring event for, say, every 2 months. On that day, spend a couple of hours brainstorming a macro variant for a step in your growth funnel.
You can do so using one of five approaches:
Now here are micro ideas.
Despite micros being less important, I'm including them because if you piece together enough micros, you sometimes have yourself a macro.
When you run out of macros, this is the micro with the greatest impact: change your above-the-fold content.
Every page has an above-the-fold (ATF) section. This is what visitors see before scrolling to the rest of a page. The content placed in your ATF in part determines whether visitors continue scrolling.
Specifically, rewrite your header and subheader copy. Your header text is the first hook visitors encounter for your product. So, if you've been unknowingly showing visitors unenticing messaging here, fixing it can have an impact.
An A/B test has an opportunity cost; you only have so many visitors to test against. So prioritize thoughtfully.
Here are the factors I consider:
A reminder that the last page of this handbook has a downloadable cheatsheet that handily recaps most of what you're about to learn.
Two things to understand about proper test design:
Google Optimize handles all this A/B testing logic for you.
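If you're curious what that logic looks like under the hood, here's a minimal sketch of one common approach: hash a visitor ID so each visitor is randomly, but consistently, assigned to the same variant. This is an illustration, not how Google Optimize is actually implemented; the assign_variant function, the visitor_id parameter, and the 50/50 split are assumptions for the example.

```python
import hashlib

def assign_variant(visitor_id: str, experiment: str = "landing-page-rewrite") -> str:
    """Deterministically bucket a visitor into a variant.

    Hashing (experiment + visitor_id) spreads visitors roughly uniformly
    across buckets, and the same visitor always lands in the same bucket,
    so returning visitors keep seeing the variant they saw first.
    """
    digest = hashlib.sha256(f"{experiment}:{visitor_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100      # 0-99
    return "A" if bucket < 50 else "B"  # 50/50 split

# The same visitor ID always maps to the same variant:
print(assign_variant("visitor-123"))
print(assign_variant("visitor-123"))
```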
When setting up tests, consider who should be included in them. It doesn't have to be everyone.
For example, consider only showing an experiment to visitors arriving at your site for the first time. This ensures that everyone in the test has the same base level of familiarity with your product.
To target only new users in Google Optimize, follow Example 1 in these instructions:
For test results to be statistically valid, you need to reach a sufficiently large sample size. The math is simple:
The implication is that if you don’t have a lot of traffic, the opportunity cost is too great to run micro variants, which tend to show conversion increases in just the 1-5% range. Meanwhile, macros have the potential to produce 10-20%+ improvements, which is well above the 6.3% threshold.
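For reference, here's a rough sketch of the power calculation behind these thresholds, using a standard two-proportion formula. The sessions_per_variant helper, the 20% baseline conversion rate, and the 95% confidence / 80% power settings are assumptions I picked for illustration, not numbers from Google Optimize.

```python
from math import ceil, sqrt
from statistics import NormalDist

def sessions_per_variant(baseline_rate: float, relative_lift: float,
                         alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate sessions needed per variant to reliably detect a given
    relative lift in conversion rate (two-sided, two-proportion z-test)."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for 95% confidence
    z_beta = NormalDist().inv_cdf(power)           # ~0.84 for 80% power
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)

# Assuming a 20% baseline conversion rate:
print(sessions_per_variant(0.20, 0.30))  # macro-sized lift: hundreds of sessions per variant
print(sessions_per_variant(0.20, 0.05))  # micro-sized lift: tens of thousands per variant
```

The takeaway is the same as above: the smaller the lift you're trying to detect, the more traffic you need before the result means anything.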
Below is an example of an experiment I ran using Google Optimize:
Above, our page had 1,724 views throughout the testing period. Our test variant showed roughly a 30% improvement over our baseline (29 vs. 22).
This 30% number is likely inaccurate, by the way. It's just a reference for the variant's maximum potential. We don't yet have enough sessions to validate this conversion improvement with certainty. But 30% is likely good enough to validate that we improved conversion by at least 6.3% (the number from earlier).
Pay attention to the Google Optimize column labeled Probability to be Best. If a variant's probability is 70%+ and it has sufficient sessions (e.g. the 1,000 and 10,000 session thresholds I indicated above), the results are likely statistically sound, and the winning variant should be considered for implementation.
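Probability to be Best is a Bayesian measure: roughly, the chance that a variant's true conversion rate beats the others given the data so far. Optimize's actual model is more sophisticated, but here's a minimal sketch of the idea using Beta posteriors. The probability_to_be_best helper is hypothetical, and the conversion counts and the even 862/862 session split are assumptions loosely based on the example above.

```python
import random

def probability_to_be_best(conv_a: int, n_a: int, conv_b: int, n_b: int,
                           draws: int = 100_000, seed: int = 0) -> float:
    """Estimate P(variant B's true conversion rate > variant A's) by sampling
    from Beta(1 + conversions, 1 + non-conversions) posteriors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        rate_a = rng.betavariate(1 + conv_a, 1 + (n_a - conv_a))
        rate_b = rng.betavariate(1 + conv_b, 1 + (n_b - conv_b))
        if rate_b > rate_a:
            wins += 1
    return wins / draws

# Illustrative numbers only: 22 vs. 29 conversions, assuming the 1,724
# sessions split evenly between the baseline and the test variant.
print(probability_to_be_best(conv_a=22, n_a=862, conv_b=29, n_b=862))
```

The more sessions you collect, the tighter those posteriors get and the more stable this probability becomes, which is why the session thresholds matter alongside the 70% bar.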
Now you can decide if the labor and implementation externalities are worth the 6.3%+ improvement in conversion.
What if our results weren't conclusive? What if we didn't surpass a 70% certainty?
Had the experiment revealed merely a 3% increase, for example, we would have to dismiss the sample size of 1,724 as too small for the 3% to be statistically valid.
We'd either end the experiment because our confidence in it is low, or accept the testing opportunity cost and keep it running until we reach 10,000 sessions. If, after 10,000 sessions, the 3% increase remained, we'd conclude it's likely valid.
But, as mentioned in the previous section, if you have little traffic to begin with, don't risk waiting on a small, 3% improvement. Instead, consider a new test.
However, if that small change is tied to a meaningful revenue objective (e.g. purchases) as opposed to, say, people providing their email addresses, then perhaps it's worth continuing.
In other words, the closer an experiment's conversion objective is to revenue, the more worthwhile it may be to confirm small conversion boosts.
Don't implement A/B variants that win negligibly. The unknown downsides of implementation often outweigh the expected value of the gain.
For example, a change may introduce unforeseen funnel consequences that won't become obvious for a few months. By then, it'll be difficult to identify the change as the root cause.
However, sometimes negligible wins are worth re-running on a new audience.
Consider this: when running A/B tests to improve conversion, you'll get diminishing returns on conversion gains for already high-intent traffic (e.g. organic search, referrals, and word of mouth). Those visitors came looking for you on their own merit. They're already interested. The onus is on you to reassure them that you sell what they're expecting, and to not scare them off.
In contrast, for, say, ad traffic, A/B testing has the potential to provide much larger returns. These are medium-intent eyeballs at best, often people who clicked your ad on a whim. They're looking for excuses to dismiss your value props and leave immediately.
This is where A/B tests shine: they're more effective at significantly improving conversion rates for low-to-medium intent traffic—because there's a greater interest gap to cover.
Here's the implication: If you only A/B test against high-intent traffic, you may not notice a significant improvement and may mistakenly dismiss your test as a global failure. When this happens, but you're confident the variant does have potential, retry the test on paid traffic. That's where the improvement may be large enough to notice its significance.
Three takeaways: