DS CTR model

Target Label:
1 if the user clicked / viewed the clp, otherwise 0
The model estimates P(click/view)
Training data:
We build the model on user-clp pairs, labeled 1 if clicked / viewed and 0 otherwise
We randomly pick 5% of users and take all the clps they interacted with on a given day to create the training data
We use four categories of features to build the model (user-static / user-clp / user-attribute / clp-level features)
We look at user history over the past [7, 14, 30, 60] days to capture both short-term and long-term interest
PI/Ni : 4.5% in training data
We created the training data using 1 day from each week, with at least a 7-day interval between label dates
A row in training data would look like this:
user_id, clp_id, date, features, y-value
y-value would be 1/0 (click/no click)
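The row construction described above can be sketched roughly as follows. This is a minimal illustration, not the actual pipeline: the function names, the hash-based 5% user sample (with a hypothetical `salt`), and the single `events_{w}d` count feature per lookback window are all assumptions made for the example. The real feature set covers four feature categories that are not listed here.

```python
from datetime import date, timedelta
import hashlib

WINDOWS = (7, 14, 30, 60)  # lookback windows in days, as stated in the text


def label_dates(first, n, gap_days=7):
    """Pick n label dates spaced gap_days apart (one day per week)."""
    return [first + timedelta(days=gap_days * i) for i in range(n)]


def in_sample(user_id, rate=0.05, salt="ctr-train"):
    """Keep roughly `rate` of users via a salted hash (assumed sampling scheme)."""
    digest = hashlib.md5(f"{salt}:{user_id}".encode()).hexdigest()
    return int(digest[:8], 16) / 0x100000000 < rate


def window_counts(event_dates, label_date):
    """Count a user's past events in each lookback window, excluding the label day."""
    features = {}
    for w in WINDOWS:
        start = label_date - timedelta(days=w)
        features[f"events_{w}d"] = sum(1 for d in event_dates if start <= d < label_date)
    return features


def build_rows(interactions, history, label_date, rate=0.05):
    """interactions: (user_id, clp_id, day, clicked_or_viewed) tuples.
    history: dict mapping user_id to a list of past event dates."""
    rows = []
    for user_id, clp_id, day, clicked in interactions:
        if day != label_date or not in_sample(user_id, rate):
            continue
        rows.append({
            "user_id": user_id,
            "clp_id": clp_id,
            "date": day,
            "features": window_counts(history.get(user_id, []), label_date),
            "y": int(bool(clicked)),  # 1 = click / view, 0 = neither
        })
    return rows
```

Only history strictly before the label date feeds the features, so the label day itself cannot leak into them.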
Validation data:
Generated the same way as the training data, with validation set dates > training set dates
Testing data:
We used a few newly created experiment Widget groups and their corresponding clps
3M random signed-up users were used, along with a few internal users
Features are created the same way as for the training data, with test set dates > validation set dates > training set dates
Number of rows in testing data = number of users (3M) × number of clps in the experiment Widget groups
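As a rough sketch of the two ideas above: the date ordering between the sets can be checked explicitly, and the test set is the full cross product of users and experiment clps. The function names and the use of ISO date strings are assumptions for illustration only.

```python
from itertools import product


def check_date_order(train_dates, valid_dates, test_dates):
    """True if every training date < every validation date < every test date."""
    return max(train_dates) < min(valid_dates) and max(valid_dates) < min(test_dates)


def scoring_candidates(user_ids, clp_ids):
    """Every (user, clp) pair to score: len(user_ids) * len(clp_ids) rows."""
    return list(product(user_ids, clp_ids))


# ISO-formatted date strings compare correctly as plain strings.
ok = check_date_order(
    ["2024-01-01", "2024-01-08"],
    ["2024-01-15"],
    ["2024-01-22"],
)
pairs = scoring_candidates(["u1", "u2", "u3"], ["clp_a", "clp_b"])
```

With 3M users the cross product grows quickly, so in practice the pairs would likely be generated and scored in batches rather than materialized as one list.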
Model:
XGBoost classification model with the "binary:logistic" objective. We build three variations of the model using the above features:
Using only user features [Removing clp-level features]
Using user features without views [Removing clp-level features and views feature]
Using all features
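One way to express the three variations is as column selections over the feature groups. All feature names below are hypothetical placeholders, and the rule that a "views feature" is any column whose name contains `views` is an assumption for this sketch. The parameter dictionary uses XGBoost's documented `objective` and `eval_metric` settings and is meant to be passed to a trainer such as `xgboost.train`.

```python
# Hypothetical feature names, grouped by the four categories in the text.
FEATURE_GROUPS = {
    "user_static": ["user_age_days"],
    "user_clp": ["user_clp_clicks_7d", "user_clp_views_7d"],
    "user_attribute": ["user_attr_clicks_30d", "user_attr_views_30d"],
    "clp_level": ["clp_ctr_14d"],
}

USER_GROUPS = ["user_static", "user_clp", "user_attribute"]


def select_features(variant):
    """Return the column list for one of the three model variations."""
    if variant == "all":
        groups = USER_GROUPS + ["clp_level"]
    elif variant in ("user_only", "user_no_views"):
        groups = USER_GROUPS  # clp-level features removed
    else:
        raise ValueError(f"unknown variant: {variant}")
    columns = [c for g in groups for c in FEATURE_GROUPS[g]]
    if variant == "user_no_views":
        # Assumption: view-based features are identified by name.
        columns = [c for c in columns if "views" not in c]
    return columns


# Shared training parameters; "binary:logistic" outputs a probability.
XGB_PARAMS = {"objective": "binary:logistic", "eval_metric": "auc"}
```

Keeping the variants as pure column selections means all three models can be trained from the same training table.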
MODEL EVALUATION:
Quantitative evaluation
1. AUC-ROC
2. AUC-PR
3. Precision@1/3/5
4. Recall@1/3/5
5. MRR
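For reference, the two AUC metrics can be computed without any library as sketched below. This is an illustrative implementation, not necessarily the one used for the reported results: AUC-ROC uses the rank-sum formulation with ties given their average rank, and AUC-PR is approximated by average precision, which is one common estimator and ignores score ties.

```python
def auc_roc(labels, scores):
    """Probability that a random positive outscores a random negative."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative")
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of 1-based ranks i+1 .. j
        rank_sum_pos += avg_rank * sum(lbl for _, lbl in pairs[i:j])
        i = j
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def average_precision(labels, scores):
    """Mean of precision values at the rank of each positive example."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive")
    return total / hits
```

With a low positive rate, as in this training data, the precision-recall view tends to be more sensitive to ranking quality among positives than AUC-ROC.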
Training and backtesting results can be found here
Feature importance of each model can be found here
Backtesting Approach -
We predict on the test set and rank the predictions in descending order of predicted score
Labeled data for each user is used as ground truth
We calculate precision@k and recall@k from the sorted predicted ranking and the ground-truth set for each user
We calculate MRR using the algorithm below
[Figure: MRR algorithm (MRR_algo.png)]
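The backtesting steps can be sketched as below. Since the MRR algorithm image is not reproduced here, this uses the standard definition (the reciprocal of the rank of the first relevant clp, averaged over users), which may differ in detail from the original. The function names, and the choice to skip users with no positive labels, are assumptions.

```python
from collections import defaultdict


def rank_by_user(predictions):
    """predictions: (user_id, clp_id, score) tuples -> {user_id: clps, best first}."""
    by_user = defaultdict(list)
    for user_id, clp_id, score in predictions:
        by_user[user_id].append((score, clp_id))
    return {
        user_id: [clp for _, clp in sorted(items, key=lambda t: t[0], reverse=True)]
        for user_id, items in by_user.items()
    }


def precision_at_k(ranked, relevant, k):
    return sum(1 for clp in ranked[:k] if clp in relevant) / k


def recall_at_k(ranked, relevant, k):
    return sum(1 for clp in ranked[:k] if clp in relevant) / len(relevant)


def reciprocal_rank(ranked, relevant):
    for position, clp in enumerate(ranked, start=1):
        if clp in relevant:
            return 1.0 / position
    return 0.0


def evaluate(predictions, ground_truth, ks=(1, 3, 5)):
    """ground_truth: {user_id: set of clicked / viewed clp_ids}."""
    rankings = rank_by_user(predictions)
    # Assumption: users without any positive label are skipped.
    users = [u for u in rankings if ground_truth.get(u)]
    if not users:
        return {}
    n = len(users)
    report = {"mrr": sum(reciprocal_rank(rankings[u], ground_truth[u]) for u in users) / n}
    for k in ks:
        report[f"precision@{k}"] = sum(
            precision_at_k(rankings[u], ground_truth[u], k) for u in users) / n
        report[f"recall@{k}"] = sum(
            recall_at_k(rankings[u], ground_truth[u], k) for u in users) / n
    return report
```

Note that precision@k divides by k even when a user has fewer than k candidate clps, which lowers the score for small experiment groups.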
Qualitative evaluation
1. We looked at the top clp distribution for each model to analyse skewness and biases in the model. Top clp distribution for each model can be found here
2. We manually looked at rankings for a few internal users
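A simple way to inspect the top clp distribution is to count how often each clp is ranked first across users. The sketch below assumes per-user rankings are available as lists ordered best first; the function name is hypothetical.

```python
from collections import Counter


def top_clp_share(rankings):
    """rankings: {user_id: [clp_id, ...] best first}.
    Returns {clp_id: share of users for whom it is ranked first}, largest first."""
    tops = Counter(ranked[0] for ranked in rankings.values() if ranked)
    total = sum(tops.values())
    if total == 0:
        return {}
    return {clp: count / total for clp, count in tops.most_common()}


shares = top_clp_share({
    "u1": ["clp_a", "clp_b"],
    "u2": ["clp_a", "clp_c"],
    "u3": ["clp_b", "clp_a"],
    "u4": ["clp_a"],
})
```

A single clp taking a very large share of top positions across users may point to a popularity bias, for instance when clp-level features dominate user-level ones.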
OUTPUT PROB DISTRIBUTION
Training sample
[Figure: predicted probability distribution, training sample]
Inference sample
[Figure: predicted probability distribution, inference sample]
