
DS CVR model

Target Label:
1 if the user ordered after clicking, otherwise 0
The model estimates P(order | click)
Training data:
We build the model on USER-CLP pairs, with label 1 if the user ordered after clicking and 0 otherwise
We randomly pick 15% of users and take all the CLPs they clicked on a given day to create the training data
The feature set is the same as V1.0 (CTR model)
We look at user history over the past [7, 14, 30, 60] days to capture both short-term and long-term interest
PI/Ni: 1.1% in training data
We created the training data using 1 day from each week, with at least a 7-day interval between label dates
A row in training data would look like this:
user_id, clp_id, date, features, y-value
y-value would be 1/0 (order/no order)
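The sampling and labeling steps above can be sketched as follows. This is a minimal illustration, not the production pipeline: the function names, the fixed random seed, and the choice to end each history window the day before the label date are assumptions, and the feature columns are left out of the row for brevity.

```python
import random
from datetime import date, timedelta

LOOKBACK_DAYS = [7, 14, 30, 60]  # history windows named in the text
USER_SAMPLE_FRACTION = 0.15      # share of users sampled per label date


def pick_label_dates(first_date, num_weeks):
    """One label date per week, exactly 7 days apart (assumes the same weekday each week)."""
    return [first_date + timedelta(days=7 * i) for i in range(num_weeks)]


def sample_users(user_ids, fraction=USER_SAMPLE_FRACTION, seed=0):
    """Uniformly sample a fixed fraction of users without replacement."""
    rng = random.Random(seed)
    pool = sorted(user_ids)
    return set(rng.sample(pool, round(len(pool) * fraction)))


def history_windows(label_date):
    """Map each lookback length to an inclusive (start, end) date range.

    Assumption: the window ends the day before the label date so that
    label-day activity does not leak into the features.
    """
    end = label_date - timedelta(days=1)
    return {d: (label_date - timedelta(days=d), end) for d in LOOKBACK_DAYS}


def build_rows(clicks, orders, label_date, sampled_users):
    """Build (user_id, clp_id, date, y) rows from clicked pairs on label_date.

    clicks and orders are sets of (user_id, clp_id) pairs seen on label_date.
    """
    rows = []
    for user_id, clp_id in sorted(clicks):
        if user_id in sampled_users:
            y = 1 if (user_id, clp_id) in orders else 0
            rows.append((user_id, clp_id, label_date, y))
    return rows
```

Only clicked pairs become rows, so the label is conditional on a click, which matches the target definition.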
Validation data:
Generated the same way as the training data, with validation set dates later than training set dates
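A date-based split like the one described could look like the sketch below. The cutoff arguments and the dictionary row shape are hypothetical; the only property taken from the text is that every validation date is later than every training date.

```python
from datetime import date


def temporal_split(rows, train_end, valid_end):
    """Split rows by label date so that validation dates follow training dates.

    rows: iterable of dicts with a 'date' key (hypothetical row shape).
    """
    train = [r for r in rows if r["date"] <= train_end]
    valid = [r for r in rows if train_end < r["date"] <= valid_end]
    return train, valid
```

Splitting by date rather than at random keeps the validation set strictly in the future relative to training, which mirrors how the model is used at inference time.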
Testing data:
We used a few newly created experiment Widget groups and their corresponding CLPs
3M randomly selected signed-up users were used, along with a few internal users
Features were created the same way as for the training data, with test set dates > validation set dates > training set dates
Number of rows in testing data = number of users (3M) × number of CLPs in the experiment Widget groups
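The test set is a full cross join of users and experiment CLPs, which is why its size is a simple product. A small sketch, with hypothetical function names and an illustrative CLP count of 20 (the actual number of CLPs is not stated here):

```python
from itertools import product


def build_test_pairs(user_ids, clp_ids):
    """Yield every (user_id, clp_id) pair lazily; the full set is large at 3M users."""
    return product(user_ids, clp_ids)


def expected_test_rows(num_users, num_clps):
    """Row count of the cross join."""
    return num_users * num_clps


# Illustrative only: 3M users and a hypothetical 20 CLPs give 60M rows to score.
example_rows = expected_test_rows(3_000_000, 20)
```

Unlike the training data, these pairs are not restricted to clicked CLPs, so every user gets a score for every CLP in the experiment groups.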
Model:
XGBoost classification model with the "binary:logistic" objective. We built the following variant of the model using the above features:
User features without views (CLP-level features and the views feature removed)
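A rough sketch of this setup with the XGBoost Python package is shown below. Only the `binary:logistic` objective comes from the text. The hyperparameter values, the `aucpr` evaluation metric, the number of rounds, and the feature naming convention (`clp_` prefix, `view` substring) are placeholders chosen for illustration.

```python
PARAMS = {
    "objective": "binary:logistic",  # loss named in the text
    "eval_metric": "aucpr",          # assumption: monitor AUC-PR on validation data
    "max_depth": 6,                  # assumption: placeholder value
    "eta": 0.1,                      # assumption: placeholder value
}


def select_user_features(feature_names):
    """Drop CLP-level and views features (hypothetical naming convention)."""
    return [
        name for name in feature_names
        if not name.startswith("clp_") and "view" not in name
    ]


def train_cvr_model(X_train, y_train, X_valid, y_valid, num_rounds=200):
    """Train a booster with early stopping on the validation set."""
    import xgboost as xgb  # imported lazily so the helpers above work without it

    dtrain = xgb.DMatrix(X_train, label=y_train)
    dvalid = xgb.DMatrix(X_valid, label=y_valid)
    return xgb.train(
        PARAMS,
        dtrain,
        num_boost_round=num_rounds,
        evals=[(dvalid, "valid")],
        early_stopping_rounds=20,  # assumption: placeholder patience
    )
```

With this objective, `booster.predict` returns probabilities in [0, 1], which can be used directly as ranking scores.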
MODEL EVALUATION:
Quantitative evaluation
1. AUC-ROC
2. AUC-PR
3. Precision@1/3/5
4. Recall@1/3/5
5. MRR
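For reference, the two AUC metrics can be computed from labels and scores as sketched below, in plain Python. This is an illustration rather than the evaluation code actually used: AUC-ROC is computed from the rank-sum statistic with average ranks for ties, and AUC-PR is approximated by average precision, which is one common estimator and ignores score ties.

```python
def auc_roc(labels, scores):
    """AUC-ROC via the rank-sum (Mann-Whitney U) statistic, with average ranks for ties."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        rank_sum_pos += avg_rank * sum(label for _, label in pairs[i:j])
        i = j
    n_pos = sum(labels)
    n_neg = n - n_pos
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def average_precision(labels, scores):
    """Average precision: mean of precision at each positive, in descending score order."""
    ordered = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    hits = 0
    total = 0.0
    for position, (_, label) in enumerate(ordered, start=1):
        if label == 1:
            hits += 1
            total += hits / position
    return total / hits if hits else 0.0
```

With a positive rate near 1%, AUC-PR is usually the more sensitive of the two, since AUC-ROC can look high even when the top of the ranking contains few positives.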
Training and backtesting results can be found here
Feature importance of each model can be found here
Backtesting approach:
We predict on the test set and rank each user's CLPs by predicted score, in descending order
The labeled data for each user serves as the ground truth
We calculate precision@k and recall@k for each user from the sorted predicted ranking and the ground truth set
We calculate MRR using the algorithm below
a.jpeg
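The backtesting steps above can be sketched in code as follows. The original MRR algorithm is only available as an image, so this sketch assumes the standard definition (the mean over users of 1 / rank of the first relevant CLP). The input shapes and function names are hypothetical, and users without any ground-truth CLP are skipped, which is also an assumption.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked CLPs that are in the ground truth set."""
    return sum(1 for clp in ranked[:k] if clp in relevant) / k


def recall_at_k(ranked, relevant, k):
    """Fraction of the ground truth set that appears in the top-k ranked CLPs."""
    return sum(1 for clp in ranked[:k] if clp in relevant) / len(relevant)


def reciprocal_rank(ranked, relevant):
    """1 / position of the first relevant CLP, or 0 if none is ranked."""
    for position, clp in enumerate(ranked, start=1):
        if clp in relevant:
            return 1.0 / position
    return 0.0


def backtest(predictions, ground_truth, ks=(1, 3, 5)):
    """Average ranking metrics over users.

    predictions: {user_id: {clp_id: score}}
    ground_truth: {user_id: set of clp_ids the user ordered from}
    """
    users = [u for u in predictions if ground_truth.get(u)]
    totals = {"mrr": 0.0}
    for k in ks:
        totals[f"precision@{k}"] = 0.0
        totals[f"recall@{k}"] = 0.0
    for user in users:
        scores = predictions[user]
        ranked = sorted(scores, key=scores.get, reverse=True)
        relevant = ground_truth[user]
        totals["mrr"] += reciprocal_rank(ranked, relevant)
        for k in ks:
            totals[f"precision@{k}"] += precision_at_k(ranked, relevant, k)
            totals[f"recall@{k}"] += recall_at_k(ranked, relevant, k)
    return {name: value / len(users) for name, value in totals.items()} if users else {}
```

Note that precision@k divides by k even when a user has fewer than k candidate CLPs, so it is bounded by the number of candidates divided by k in that case.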
Qualitative evaluation
1. We looked at the top-CLP distribution for each model to analyse skewness and biases in the model. The top-CLP distribution for each model can be found here
b.jpeg
GENDER COHORT
c.png
d.png
ORDER-STAGE COHORT
e.png
f.png
g.png
h.png
2. We manually looked at rankings for a few internal users
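The top-CLP distribution from item 1 could be computed along these lines. This is a sketch under assumptions: "top CLP" is taken to mean the highest-scored CLP per user, and the input shape and function name are hypothetical. A cohort view (gender, order stage) would apply the same function to the subset of users in each cohort.

```python
from collections import Counter


def top_clp_distribution(predictions):
    """Share of users for whom each CLP is ranked first.

    predictions: {user_id: {clp_id: score}}
    """
    top = Counter(
        max(scores, key=scores.get)
        for scores in predictions.values()
        if scores
    )
    total = sum(top.values())
    return {clp: count / total for clp, count in top.most_common()}
```

A distribution concentrated on one or two CLPs suggests the model is ranking by overall popularity rather than by user-specific signal.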
OUTPUT PROB DISTRIBUTION
Training sample
Inference sample
j.png