This is a score given to each variation that predicts its chance of outperforming all other variations in the long term. It is used to determine how traffic is distributed among variations when using dynamic allocation, and to determine which variation is the winner of an A/B test.
The calculations are based on a fast Bayesian statistical engine. The model uses statistical inference to provide each variation with its probability to be best, along with a new type of confidence interval called the HPDR (highest posterior density region). It also allows for a more reliable declaration of a test winner. For mathematical details about the Bayesian statistical model, read this article.
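As a rough illustration of how a "probability to be best" score can be derived with Bayesian inference, the sketch below draws Monte Carlo samples from a Beta posterior per variation and counts how often each variation wins. The Beta(1 + successes, 1 + failures) posterior is a common textbook choice for conversion data and an assumption here, not the product's documented model.

```python
import numpy as np

def probability_to_be_best(successes, trials, n_samples=100_000, seed=0):
    """Estimate each variation's probability to be best via Monte Carlo.

    Assumes a Beta(1 + successes, 1 + failures) posterior per variation;
    the engine's actual model and priors may differ.
    """
    rng = np.random.default_rng(seed)
    draws = np.stack([
        rng.beta(1 + s, 1 + (n - s), size=n_samples)
        for s, n in zip(successes, trials)
    ])  # shape: (variations, samples)
    best = np.argmax(draws, axis=0)  # index of the winning variation per sample
    return np.bincount(best, minlength=len(successes)) / n_samples
```

For example, with 50 conversions out of 1,000 sessions for one variation and 65 out of 1,000 for another, the second variation receives the higher score, and the scores across all variations sum to 1.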
Probability to Be Best in A/B test
The Probability to Be Best score, together with the minimum test duration (default: 14 days), determines which variation (if any) is the winner of an A/B test. The score is calculated daily and is reset when a new test version is created.
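A minimal sketch of this winner rule: a variation wins only once the minimum duration has elapsed and its score clears a threshold. The 95% threshold below is an illustrative assumption, not a documented product value; only the 14-day default comes from the text above.

```python
from datetime import date, timedelta

MIN_DURATION = timedelta(days=14)   # default minimum test duration
WIN_THRESHOLD = 0.95                # assumed score threshold (illustrative)

def pick_winner(start_date, today, scores):
    """Return the winning variation, or None if no winner can be declared.

    `scores` maps each variation name to its probability-to-be-best score.
    """
    if today - start_date < MIN_DURATION:
        return None  # minimum test duration not yet reached
    best = max(scores, key=scores.get)
    return best if scores[best] >= WIN_THRESHOLD else None
```

For instance, a variation scoring 0.97 after 19 days would be declared the winner, while the same score after only 9 days would not.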
Probability to Be Best in Dynamic Allocation
Each variation begins with a score reflecting the number of variations in the experience (for example, if there are four variations, all scores start at 25%). As data is collected, the algorithm adjusts the scores based on the relative performance of each variation. Variations with higher scores are served more traffic. The scores are recalculated at least once an hour. To continuously train and monitor the model, 10% of sessions are always served a random variation.
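The allocation behavior described above can be sketched as follows: 10% of sessions receive a uniformly random variation, and the rest are allocated proportionally to the current scores. This is an illustration of the described behavior, not the product's actual implementation.

```python
import random

def choose_variation(scores, exploration=0.10, rng=random):
    """Pick a variation for an incoming session.

    `scores` maps each variation name to its current score. With
    probability `exploration` (10% per the text above), a uniformly
    random variation is served to keep training the model; otherwise
    variations are chosen in proportion to their scores.
    """
    variations = list(scores)
    if rng.random() < exploration:
        return rng.choice(variations)  # exploration traffic
    return rng.choices(variations, weights=[scores[v] for v in variations])[0]
```

Over many sessions, a variation with a 90% score receives roughly 90% of the non-exploration traffic.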
If a new variation is added to an existing test, the relative scores of the existing variations are maintained, but the new variation is assigned a score based on the total number of variations. For example, if there are two variations with scores 10% and 90%, and a new variation is added, the new variation's score begins at 33.3% and the original two scores are scaled down (10% and 90% of the remaining 66.7% = 6.7% and 60%).
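The rescaling in the example above can be expressed as follows: each new variation starts at 1/k (where k is the new total number of variations), and the existing scores are scaled proportionally to fill the remainder. The `new_1` naming below is purely illustrative.

```python
def add_variation(scores, n_new=1):
    """Rescale existing scores when `n_new` variations are added.

    Each new variation starts at 1/k (k = new total variation count);
    existing scores keep their relative proportions within the remainder.
    """
    k = len(scores) + n_new
    new_share = n_new / k
    rescaled = {v: s * (1 - new_share) for v, s in scores.items()}
    for i in range(n_new):
        rescaled[f"new_{i + 1}"] = 1 / k  # hypothetical placeholder name
    return rescaled
```

With the 10%/90% example, this yields 6.7%, 60%, and 33.3% for the two existing variations and the new one.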
If an existing variation is edited, it is viewed as a new variation and its score is reset.