This is a score given to each variation that predicts its chance of outperforming all other variations in the long term. It is used to determine how traffic is distributed among variations when using dynamic allocation, and to determine which variation is the winner of an A/B test.
The algorithm
The calculations are based on an enhanced, quick, and aggressive Bayesian statistical engine. The model uses statistical inference to provide each variation with its probability to be best, along with a new type of confidence interval called the HPDR (highest posterior density region). It also allows for a more reliable declaration of a test winner. For mathematical details about the Bayesian statistical model, read this article.
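The full engine is not described here, but the core idea can be illustrated with a short sketch. The example below is a simplification that assumes a Beta-Bernoulli model over conversion rates with a uniform prior; the function name and inputs are hypothetical, not part of the actual product API. It estimates each variation's probability to be best by Monte Carlo sampling from the posteriors:

```python
import numpy as np

def probability_to_be_best(conversions, visitors, n_samples=100_000, seed=0):
    """Estimate each variation's probability to be best by sampling
    conversion rates from Beta posteriors (Beta(1, 1) prior assumed)."""
    rng = np.random.default_rng(seed)
    conversions = np.asarray(conversions)
    visitors = np.asarray(visitors)
    # Posterior draws: rows are samples, columns are variations.
    samples = rng.beta(1 + conversions, 1 + visitors - conversions,
                       size=(n_samples, len(conversions)))
    # Fraction of draws in which each variation has the highest rate.
    wins = np.bincount(samples.argmax(axis=1), minlength=len(conversions))
    return wins / n_samples

# Example: three variations, each observed over 1,000 visitors.
print(probability_to_be_best([120, 135, 128], [1000, 1000, 1000]))
```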
Probability to Be Best in A/B test
The Probability to Be Best score, together with the minimum test duration (default: 14 days), determines which variation (if any) is the winner of an A/B test. The score is calculated daily and is reset when a new test version is created.
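As a rough illustration of how these two conditions might combine, here is a minimal sketch. Apart from the 14-day minimum duration mentioned above, everything in it (the 0.95 score threshold, the function name, and its inputs) is an assumption for illustration only, not a documented value:

```python
from datetime import date, timedelta

def declare_winner(scores, test_start, today=None,
                   min_days=14, threshold=0.95):
    """Return the winning variation name, or None if no winner yet.
    The 0.95 threshold is an illustrative assumption."""
    today = today or date.today()
    if today - test_start < timedelta(days=min_days):
        return None  # minimum test duration not yet reached
    best, score = max(scores.items(), key=lambda kv: kv[1])
    return best if score >= threshold else None

# Example: daily Probability to Be Best scores keyed by variation name.
print(declare_winner({"Control": 0.03, "Variation A": 0.97},
                     test_start=date(2024, 1, 1), today=date(2024, 1, 20)))
```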
Probability to Be Best in Dynamic Allocation
Each variation begins with a score reflecting the number of variations in the experience (for example, if there are four variations, the scores all start at 25%). As data is collected, the algorithm adjusts the scores based on the relative performance of each variation, and variations with higher scores are served more traffic. The scores are recalculated at least once an hour. To continuously train and monitor the model, 10% of sessions are always served a random variation.
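A minimal sketch of this allocation logic, assuming the scores are used directly as serving weights (the function name and data structures are illustrative, not the actual implementation):

```python
import random

def pick_variation(scores, exploration=0.10, rng=random):
    """Serve a variation: 10% of sessions get a uniformly random variation
    to keep training the model; the rest are weighted by score."""
    names = list(scores)
    if rng.random() < exploration:
        return rng.choice(names)  # random exploration traffic
    return rng.choices(names, weights=[scores[n] for n in names])[0]

# Example: four variations that started at 25% each and have since diverged.
scores = {"A": 0.55, "B": 0.25, "C": 0.15, "D": 0.05}
print(pick_variation(scores))
```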
If a new variation is added to an existing test, the relative scores of the existing variations are maintained, and the new variation is assigned a score based on the total number of variations. For example, if there are two variations with scores of 10% and 90%, and a new variation is added, the new variation's score begins at 33.3% and the original two scores are scaled down proportionally (10% and 90% of the remaining 66.7% = 6.7% and 60%).
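The rescaling arithmetic can be sketched as follows (an illustration only; the function name is hypothetical):

```python
def add_variation(scores, new_name):
    """Add a new variation at 1/n of the total and scale the existing
    scores down proportionally so they still sum to 100%."""
    n = len(scores) + 1
    new_share = 1 / n
    rescaled = {name: score * (1 - new_share) for name, score in scores.items()}
    rescaled[new_name] = new_share
    return rescaled

# Example from the text: two variations at 10% and 90%, then a third is added.
print(add_variation({"A": 0.10, "B": 0.90}, "C"))
# -> approximately {'A': 0.067, 'B': 0.60, 'C': 0.333}
```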
If an existing variation is edited, it is viewed as a new variation and its score is reset.