Explore

DS CTR model

Target Label:

1 if clicked/viewed, otherwise 0

The model predicts P(click/view)

Training data:

We build the model on user-clp pairs, with label 1 if clicked/viewed and 0 otherwise

For a given day, we randomly pick 5% of users and take all the clps they interacted with to create the training data

We use four categories of features to build the model: user-static, user-clp, user-attribute and clp-level features

Feature description

We look at user history over the past [7, 14, 30, 60] days to capture both short-term and long-term interest
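To illustrate the lookback windows, here is a minimal sketch of how one count feature per window might be derived from a user's event history. The function name `window_counts` and the feature names (`clicks_7d` and so on) are hypothetical, and the sketch assumes the window covers the days strictly before the label date.

```python
from datetime import date, timedelta

LOOKBACK_DAYS = (7, 14, 30, 60)  # the windows named in the text


def window_counts(event_dates, label_date, windows=LOOKBACK_DAYS):
    """Count events in the w days strictly before label_date, for each window w."""
    counts = {}
    for w in windows:
        start = label_date - timedelta(days=w)
        # The label date itself is excluded so the feature cannot leak the label.
        counts[f"clicks_{w}d"] = sum(1 for d in event_dates if start <= d < label_date)
    return counts


# Hypothetical click history for one user-clp pair.
history = [date(2024, 2, 28), date(2024, 2, 20), date(2024, 2, 5),
           date(2024, 1, 10), date(2024, 3, 1)]
features = window_counts(history, date(2024, 3, 1))
```

Short windows react to recent behaviour, while the 30 and 60 day windows change slowly, which is one way to expose both kinds of interest to the model.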

PI/Ni : 4.5% in training data

We created the training data using one day from each week, with at least a 7-day interval between label dates

A row in training data would look like this:

user_id, clp_id, date, features, y-value

y-value would be 1/0 (click/no click)
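The row construction described above could be sketched roughly as follows. This is an illustration under assumptions: the helper names, the input tuple layout and the hash-based 5% sample are all hypothetical (the text only says users are picked randomly), and the features are left empty because the feature pipeline is described elsewhere.

```python
import hashlib
from datetime import date, timedelta


def in_sample(user_id, rate=0.05):
    """Deterministic stand-in for random sampling: hash the user id into [0, 1)."""
    digest = hashlib.md5(str(user_id).encode("utf-8")).hexdigest()
    return int(digest[:8], 16) / 16**8 < rate


def label_dates(first, n_weeks):
    """One label date per week, spaced exactly 7 days apart."""
    return [first + timedelta(days=7 * i) for i in range(n_weeks)]


def build_rows(interactions, dates, rate=0.05):
    """interactions: iterable of (user_id, clp_id, date, clicked) tuples."""
    wanted = set(dates)
    rows = []
    for user_id, clp_id, day, clicked in interactions:
        if day in wanted and in_sample(user_id, rate):
            rows.append({
                "user_id": user_id,
                "clp_id": clp_id,
                "date": day,
                "features": {},  # placeholder, filled by the feature pipeline
                "y": 1 if clicked else 0,
            })
    return rows


dates = label_dates(date(2024, 1, 1), 3)
```

Hashing the user id keeps the same users in the sample across label dates. If a fresh random draw per day is wanted instead, the sampling step would need to change.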

Validation data:

Generated the same way as the training data, with validation set dates later than training set dates
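A date-based split of this kind can be sketched as below. The function name and the two cut-off parameters are hypothetical, and rows are assumed to be dicts with a `date` key. The same cut-off logic extends to a later test period.

```python
from datetime import date


def temporal_split(rows, train_end, valid_end):
    """Split rows by date so that train dates < validation dates < test dates."""
    train = [r for r in rows if r["date"] <= train_end]
    valid = [r for r in rows if train_end < r["date"] <= valid_end]
    test = [r for r in rows if r["date"] > valid_end]
    return train, valid, test


# Hypothetical rows, one per label date.
rows = [{"date": date(2024, 1, 1)}, {"date": date(2024, 1, 8)},
        {"date": date(2024, 1, 15)}, {"date": date(2024, 1, 22)}]
train, valid, test = temporal_split(rows, date(2024, 1, 8), date(2024, 1, 15))
```

Splitting on time rather than at random keeps later behaviour out of the training rows, so the validation score is closer to what the model would see in production.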

Testing data:

We used a few newly created experiment Widget groups and their corresponding clps

3M random signed-up users were used, along with a few internal users

Features were created the same way as for the training data, with test set dates > validation set dates > training set dates

Number of rows in testing data = number of users (3M) × number of clps in the experiment Widget groups
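The row count follows from scoring every user against every candidate clp. A small sketch, with made-up ids and a hypothetical helper name:

```python
from itertools import product


def build_test_pairs(user_ids, clp_ids):
    """Every (user, clp) combination: one row to score per pair."""
    return list(product(user_ids, clp_ids))


users = ["u1", "u2", "u3"]   # stands in for the 3M sampled users
clps = ["clp_a", "clp_b"]    # stands in for the clps in the experiment Widget groups
pairs = build_test_pairs(users, clps)
```

At 3M users a materialized list grows quickly, so in practice the pairs would more likely be produced lazily or as a cross join in the data warehouse.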

Model:

XGBoost classification model with the “binary:logistic” objective. We build three variations of the model using the above features:

Using only user features [Removing clp-level features]

Using user features without views [Removing clp-level features and views feature]

Using all features
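The three variations differ only in which feature columns are passed to the model. A minimal sketch of that selection, assuming (hypothetically) that clp-level columns start with `clp_` and that views features contain `views` in their name; the real column names are not given here:

```python
# The objective comes from the text; any other hyperparameters are not stated.
XGB_PARAMS = {"objective": "binary:logistic"}


def select_features(columns, variant):
    """Return the feature columns for one of the three model variations."""
    if variant == "all":
        return list(columns)
    # Both reduced variations drop clp-level features.
    user_cols = [c for c in columns if not c.startswith("clp_")]
    if variant == "user_only":
        return user_cols
    if variant == "user_no_views":
        return [c for c in user_cols if "views" not in c]
    raise ValueError(f"unknown variant: {variant}")


# Hypothetical column names covering the four feature categories.
columns = ["user_static_age_days", "user_clp_clicks_7d", "user_clp_views_7d",
           "user_attr_clicks_30d", "clp_ctr_7d"]
user_only = select_features(columns, "user_only")
user_no_views = select_features(columns, "user_no_views")
```

`XGB_PARAMS` would be passed to the XGBoost training call together with the selected columns, so each variation is trained with the same objective and only the inputs change.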

MODEL EVALUATION:

Quantitative evaluation

1. AUC-ROC

2. AUC-PR

3. Precision@1/3/5

4. Recall@1/3/5

5. MRR

Training and backtesting results can be found here: neptune.ai

Feature importance of each model can be found here: feature importance (neptune.ai)

Backtesting approach:

We generate predictions for the test set and rank each user's clps in descending order of predicted score

The labeled data for each user serves as the ground truth

We calculate precision@k and recall@k for each user from the sorted predicted ranking and the ground-truth set

We calculate MRR using the algorithm in the attached image (MRR_algo.png)
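For readers without access to the image, the per-user metrics can be sketched as below. This uses the standard textbook definitions, which is an assumption: the algorithm in MRR_algo.png may differ in details such as tie handling or how users with no positive labels are treated (here they are skipped). All function names are hypothetical.

```python
from statistics import mean


def rank_by_score(scores):
    """scores: {clp: predicted score} -> clps sorted by descending score."""
    return [clp for clp, _ in sorted(scores.items(), key=lambda kv: kv[1], reverse=True)]


def precision_at_k(ranked, relevant, k):
    """Share of the top-k positions that hold a relevant clp (denominator is k)."""
    return sum(1 for clp in ranked[:k] if clp in relevant) / k


def recall_at_k(ranked, relevant, k):
    """Share of the relevant clps that appear in the top k."""
    return sum(1 for clp in ranked[:k] if clp in relevant) / len(relevant)


def reciprocal_rank(ranked, relevant):
    """1 / position of the first relevant clp, or 0 if none is ranked."""
    for position, clp in enumerate(ranked, start=1):
        if clp in relevant:
            return 1.0 / position
    return 0.0


def evaluate(predictions, ground_truth, ks=(1, 3, 5)):
    """predictions: {user: {clp: score}}, ground_truth: {user: set of clicked clps}."""
    # Assumption: users without any positive label are left out of the averages.
    users = [u for u in predictions if ground_truth.get(u)]
    if not users:
        return {}
    ranked = {u: rank_by_score(predictions[u]) for u in users}
    report = {}
    for k in ks:
        report[f"precision@{k}"] = mean(precision_at_k(ranked[u], ground_truth[u], k) for u in users)
        report[f"recall@{k}"] = mean(recall_at_k(ranked[u], ground_truth[u], k) for u in users)
    report["mrr"] = mean(reciprocal_rank(ranked[u], ground_truth[u]) for u in users)
    return report


# Toy example with two users and three clps.
predictions = {"u1": {"a": 0.9, "b": 0.5, "c": 0.1},
               "u2": {"a": 0.2, "b": 0.8, "c": 0.6}}
ground_truth = {"u1": {"b"}, "u2": {"b", "c"}}
report = evaluate(predictions, ground_truth, ks=(1, 3))
```

In the toy example the first relevant clp sits at rank 2 for `u1` and rank 1 for `u2`, so the mean reciprocal rank is (1/2 + 1) / 2.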

MRR_algo.png

Qualitative evaluation

1. We looked at the top clp distribution for each model to analyse skewness and biases in the model. The top clp distribution for each model can be found here:

https://docs.google.com/spreadsheets/d/1a-gnxZhAlfOxmi1ox8l83Mw2qvrB6gag8WCwsQgJ1hM/edit?usp=sharing

2. We manually looked at the rankings for a few internal users
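One way to compute a top clp distribution is to count, for each clp, the share of users for whom it is ranked first. The sketch below assumes that reading of "top clp" and a hypothetical `{user: {clp: score}}` prediction layout; the linked spreadsheet may use a different cut-off, such as top 3.

```python
from collections import Counter


def top_clp_distribution(predictions):
    """predictions: {user: {clp: score}} -> {clp: share of users where it ranks first}."""
    top = Counter(max(scores, key=scores.get)
                  for scores in predictions.values() if scores)
    total = sum(top.values())
    return {clp: count / total for clp, count in top.most_common()}


# Toy predictions for three users.
predictions = {"u1": {"a": 0.9, "b": 0.2},
               "u2": {"a": 0.7, "b": 0.4},
               "u3": {"a": 0.1, "b": 0.6}}
distribution = top_clp_distribution(predictions)
```

A distribution concentrated on one or two clps suggests the model is leaning on popularity rather than on user-specific signals.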

OUTPUT PROB DISTRIBUTION

Training sample

Inference sample
