1)We will be looking at a randomized controlled design, where the volume of data is high and the cost per observation is low.
Let’s assume we are trying to increase the CTR (click-through rate) of the ‘request access’ button.
2)Start with a hypothesis -
Changing the size of the button by 10px will increase the CTR.
3)Describe the unit of the experiment
The unit is the entity about whom you want to make an inference. This is the entity to which you will apply your treatment. In our case, it will be the unique user who visits the page.
4)Comparable groups split
You need to divide the total units into two groups - the treatment group, who will be given the ‘treatment’, and the control group, for calculating the baseline. We will dive later into how big the treatment group should be.
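The split above is usually done deterministically, so the same user always lands in the same group. A minimal sketch (the user ID and 50/50 split are hypothetical assumptions) using a hash-based bucket:

```python
import hashlib

def assign_group(user_id: str, treatment_pct: int = 50) -> str:
    """Deterministically bucket a user into 'treatment' or 'control'.

    Hashing the user ID (a hypothetical identifier) gives a stable,
    roughly uniform split, so the same user always sees the same variant.
    """
    bucket = int(hashlib.md5(user_id.encode()).hexdigest(), 16) % 100
    return "treatment" if bucket < treatment_pct else "control"

# The same user always lands in the same group
assert assign_group("user_42") == assign_group("user_42")
```

Hashing rather than random sampling on each visit keeps the assignment consistent across sessions without storing any state.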
5)Target Variable -
This is the variable you are trying to predict an outcome for. You introduce changes to the page to see the change in the CTR, hence the target variable is the CTR.
6)Product Metrics -
Since this is a very straightforward A/B test, we aren’t considering any product metrics. But in most cases, you will be tracking these. For example, if you are testing a recommender system, just tracking the CTR doesn’t cut it. You need to evaluate whether there has been a positive movement in product metrics like retention, engagement, etc. (refer to the blog I posted for more on this)
7)Experimental Variable -
This is the variable you can change to see how it affects the target variable. In our case, we will be increasing the size of the button.
8)Control Variables -
These are the variables you have about the units of the experiment. For each user, we will have browser info, demographics, device type, etc.
The main use of the control variable is to make sure the control and treatment groups are as similar to each other as possible and are also representative of the population.
They also help us with strategic targeting. For example, we might only be planning to launch for MacBook users in the US; we can use the control variables to filter accordingly.
In most cases, we will have a huge list of control variables. Which ones to choose?
1. List all the control variables available to you.
2. Eliminate the variables that have no logical connection to the target variable.
3. Check the correlation of each remaining variable with the target variable.
4. Check for multicollinearity, i.e., correlation between control variables. The chosen variables should be independent of each other.
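Steps 3 and 4 can be sketched with pandas on toy data (the variable names and thresholds here are illustrative assumptions, not from the original):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1000
# Toy data: 'pages_viewed' is deliberately collinear with 'sessions'
df = pd.DataFrame({"sessions": rng.poisson(5, n)})
df["pages_viewed"] = df["sessions"] * 3 + rng.normal(0, 1, n)
df["account_age"] = rng.integers(1, 365, n)
df["ctr"] = 0.02 + 0.001 * df["sessions"] + rng.normal(0, 0.005, n)

# Step 3: correlation of each candidate control variable with the target
print(df.corr()["ctr"].drop("ctr"))

# Step 4: pairwise correlation between controls flags multicollinearity
controls = df.drop(columns="ctr")
corr = controls.abs().corr().abs()
high = [(a, b) for a in corr for b in corr
        if a < b and corr.loc[a, b] > 0.8]
print("highly correlated pairs:", high)
```

In this toy example `sessions` and `pages_viewed` come out highly correlated, so you would keep only one of them.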
🤬🤬Be wary of confounding variables (they will surely fuck up your experiment)
For example, we might see a high positive correlation between Chrome users and CTR and mistake it for a causal effect, when in reality Google ranks the page higher in search results for those users. Here, the search engine is the confounding variable.
9)Sample Size -
The sample size of the treatment group depends on the baseline conversion rate, the minimum detectable effect (the % change in CTR we would like to detect), and the desired statistical significance level.
This calculator by Optimizely is amazingly useful for this purpose -
To understand why we need a large enough sample size, read -
If you want to detect very small changes (e.g., 0.1%), you need a bigger sample size than if you were looking to detect a 5% change, to make sure the difference in the target variable is not random or due to chance. (0.1% changes also matter; at Google/Facebook scale, a 0.1% change in CTR could mean 💵 💸 )
10)Duration -
There is no set rule; it depends on the business cycle and how long it takes to expose a representative population to the treatment.
For example, if the sample size required for our treatment group is 100,000, ~200,000 unique users visit the website every day, and we do a random 10-90 split of users into treatment and control groups, the experiment would need to run for at least 5 days.
We also want to make sure all types of users get a chance to be included, e.g., users who only visit on weekends. Ideal duration: ~7 days.
This calculator has proved helpful for calculating the duration -
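The back-of-the-envelope arithmetic above can be written out directly (numbers taken from the example: 100,000 required treatment users, ~200,000 daily visitors, 10% routed to treatment):

```python
import math

required_sample = 100_000   # treatment-group sample size needed
daily_visitors = 200_000    # unique users per day
treatment_share = 0.10      # 10-90 split

# Users entering the treatment group per day, and days to fill the quota
days = math.ceil(required_sample / (daily_visitors * treatment_share))
print(days)  # minimum run length in days
```

This gives the minimum; in practice you would round up to a full week so weekend-only users are represented.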
11)Analyzing the results -
You will have the data for the two groups - users visited, users clicked. Start with a basic analysis to see if there is a winner. To check that the difference is not due to chance, we test whether the difference in means is statistically significant. For this purpose, we use the t-test.
If you are tracking other product metrics, you need to check if there is an uplift in those metrics.
For example, in a recommender-system A/B test, we might see the CTR shoot up yet no effect on product metrics. This warrants a qualitative look: the recommender may be surfacing clickbait, which inflates the CTR but doesn’t move the product metrics in any meaningful way.
If you are looking for some free courses to deep dive -
Understanding A/B experimentations -