One way to assess the credibility of an A/B test platform is to run an A/A test: you create two or more identical variations and run them as a regular A/B test to see how the platform handles them. A successful test shows that all variations yield very similar results, as the short simulation after this list illustrates. With an A/A test you can verify:
- Allocation is random
- All data is collected
- The Probability to Be Best score is reliable
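As a rough illustration of what "very similar results" looks like for allocation, the short simulation below (purely illustrative, not platform code) randomly assigns users to two identical variations; with a few thousand users the split almost always lands within a couple of percentage points of 50/50.

// Purely illustrative: simulate random 50/50 allocation of users to two identical variations.
const USERS = 10000;
let servedA = 0;

for (let i = 0; i < USERS; i++) {
  if (Math.random() < 0.5) servedA++;
}

const shareA = (servedA / USERS) * 100;
console.log(`Variation A: ${shareA.toFixed(1)}%  Variation B: ${(100 - shareA).toFixed(1)}%`);

This is the benchmark that the allocation check in the analysis section is measured against.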
Conducting an A/A Test
- Create a new Custom Code campaign by going to Site Personalization › New Campaign › Custom Code.
- Give your campaign a name (e.g., "AA Test").
- If you have an analytics integration (Google Analytics or a custom analytics integration), make sure it is enabled, and click Next.
- In the targeting tab, click Next without changing any settings.
- In the variation tab, click New Variation and select the Custom Code template.
- In the JS tab, add the following code.
console.log('A/A test variation A');
- Click Save Variation.
- Click New Variation and create a second variation with the following code in the JS tab:
console.log('A/A test variation B');
- After saving the second variation, use the Allocation column to allocate 50% of the traffic to each variation.
- Use the default settings for the primary metric (e.g., purchases). Do not change the default advanced settings: the variation is sticky for the user (multi-session), and the attribution window starts when the variation is served and ends when the session ends.
- Click Next and set the experience status to Active.
- Click Save Experience and Publish. Don't worry, this will not impact your visitors' experience: users who are assigned to a variation only trigger a console.log message in the browser.
- In the campaign list, click the more options menu next to the campaign you just created and select Duplicate. Do this 9 times so that you have a total of 10 A/A tests, which minimizes the effect of randomness.
- Allow the test to run for a full week before assessing the results.
- After the test and the analysis are complete, archive the campaigns.
Analyzing A/A Test Results
In A/A tests, since the variations are identical, you are not required to wait the usual 2 weeks. It is recommended to let the test run for a full week, accumulate data, and then analyze the results against the following parameters, in this order, as a failure in step 1 can lead to a failure in step 3 (a small scripted version of these checks is sketched after the list):
- Variation allocation: view the number of users that were allocated to each variation to verify that allocation is indeed random.
Good: each variation was served to 48%–52% of users.
Bad: a variation was served to less than 48% or more than 52% of users.
- Data collection: compare the number of purchases and revenue in the campaign report to your primary analytics platform.
Good: less than 5% discrepancy in the number of purchases and revenue.
Bad: more than 5% discrepancy in the number of purchases and revenue.
Read more about troubleshooting data discrepancy issues ›
- Probability to Be Best score: look at 2 metrics. For eCommerce: "Add to Cart" and "Purchase". In each campaign, count how many of these metrics have a variation with a score greater than 95%. Such a result is a false winner, which can statistically occur around 5% of the time due to random chance.
Good: 0-2 metrics reached significant results (>95% score).
Bad: 3 or more metrics reached significant results (>95% score).
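If you prefer to sanity-check these thresholds in code, the sketch below applies the same three rules to figures copied by hand out of a campaign report and your analytics platform. All variable names and sample values are hypothetical; they are not read from any Dynamic Yield API.

// Hypothetical figures, copied manually from one A/A campaign report
// and from your analytics platform.
const report = {
  usersA: 10234,             // users served variation A
  usersB: 10108,             // users served variation B
  purchases: 412,            // purchases attributed in the campaign report
  purchasesInAnalytics: 425  // purchases for the same period in your analytics platform
};

// 1. Variation allocation: each variation should fall between 48% and 52%.
const shareA = report.usersA / (report.usersA + report.usersB);
const allocationOk = shareA >= 0.48 && shareA <= 0.52;

// 2. Data collection: less than 5% discrepancy vs. your analytics platform.
const discrepancy = Math.abs(report.purchases - report.purchasesInAnalytics) / report.purchasesInAnalytics;
const collectionOk = discrepancy < 0.05;

// 3. Probability to Be Best: metrics with a >95% score, tallied by hand from the reports.
const metricsAbove95 = 1;
const probabilityOk = metricsAbove95 <= 2;

console.log({ allocationOk, collectionOk, probabilityOk });

Run the first two checks for each of the 10 campaigns; the >95% tally is typically counted across the whole suite (see the FAQ below).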
FAQ
Why should I run 10 tests?
A 95% Probability to Be Best score is considered reliable; however, this also means that 5% of the time a false winner will be declared in an A/A test. If you run a single test that eventually reaches statistical significance, you will not know whether it is an outlier (a 1-in-20 chance) or a real issue with the platform. Looking at 2 metrics across 10 tests (20 metric results overall) reduces the effect of randomness on the test.
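As a rough back-of-the-envelope sketch of why that works, the snippet below treats every metric check as an independent 5% coin flip (a simplification, since metrics within the same campaign are correlated):

// Rough illustration only: model each metric check as an independent
// 5% chance of producing a false ">95%" result in an A/A test.
const FALSE_WINNER_RATE = 0.05;  // implied by the 95% Probability to Be Best threshold
const METRIC_CHECKS = 20;        // 2 metrics x 10 duplicated campaigns

// Expected number of false winners across the whole suite.
const expected = FALSE_WINNER_RATE * METRIC_CHECKS;  // 1

// Chance of seeing at least one false winner somewhere in the suite.
const atLeastOne = 1 - Math.pow(1 - FALSE_WINNER_RATE, METRIC_CHECKS);  // ~0.64

console.log(`Expected false winners: ${expected}`);
console.log(`Chance of at least one false winner: ${(atLeastOne * 100).toFixed(0)}%`);

In other words, roughly one >95% score somewhere in the suite is expected purely by chance, which is why only 3 or more significant results are treated as a warning sign.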
What to do if an A/A test failed?
It depends on which of the checks was problematic:
- Variation allocation: If the allocation was not within 48%–52%, make sure the overall implementation of Dynamic Yield is correct. As an initial check, verify that the number of users in the dashboard matches the number of users in your analytics platform. If no implementation issue is detected, contact Customer Support.
- Data collection: If there was a discrepancy between Dynamic Yield and your analytics platform of more than 5%, verify that your events are implemented correctly. For details, see Validating Events Implementation.
- Probability to Be Best: If more than 2 metrics reached significance, let the test run for a second week before determining the results. If more than 6 metrics have a variation with a score greater than 95%, contact Customer Support.
What to do after an A/A test succeeded?
Simply archive all of the campaigns and start creating some real A/B tests!