Practical Statistics for Data Scientists
5. Classification

Evaluating classification models

Accuracy

The percent/proportion of cases classified correctly

accuracy = (true positives + true negatives) / total number of cases
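
A minimal R sketch, assuming hypothetical 0/1 vectors pred (predicted class) and actual (true class):

# proportion of records whose predicted class matches the actual class
accuracy <- mean(pred == actual)
# equivalently, from the 2x2 confusion matrix built in the next section:
# sum(diag(conf_mat)) / sum(conf_mat)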

Confusion matrix

A tabular display of the record counts by their predicted and actual classification status

| | Predicted 1 | Predicted 0 |
|---|---|---|
| Actual 1 | true positive (TP) | false negative (FN) |
| Actual 0 | false positive (FP) | true negative (TN) |
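
A sketch of building one in base R, assuming the same hypothetical pred and actual vectors, with the class of interest (1) listed first so the indexing in the R snippets below lines up:

# rows = actual class, columns = predicted class, class 1 first
conf_mat <- table(actual    = factor(actual, levels = c(1, 0)),
                  predicted = factor(pred,   levels = c(1, 0)))
conf_mat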

The rare class problem

When the class of interest (the 1s) is rare, overall accuracy can be misleading; depending on the relative costs of the two kinds of error, a trade-off must be made between false positives and false negatives.
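
A rough illustration with hypothetical data: out of 1,000 records only 10 are 1s, so a model that always predicts 0 still scores 99% accuracy while missing every 1.

actual <- c(rep(1, 10), rep(0, 990))             # rare positive class
pred   <- rep(0, 1000)                           # naive model: always predict 0
mean(pred == actual)                             # accuracy = 0.99, yet...
sum(pred == 1 & actual == 1) / sum(actual == 1)  # recall = 0 -- every 1 is missed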

Precision, recall, and specificity

| Term | Description | Interpretation | Formula | R code |
|---|---|---|---|---|
| Specificity | The percent/proportion of 0s correctly classified | Measures a model's ability to predict a negative outcome | TN / (TN + FP) | conf_mat[2,2]/sum(conf_mat[2,]) |
| Precision | The percent/proportion of predicted 1s that are actually 1s | The accuracy of a predicted positive outcome | TP / (TP + FP) | conf_mat[1,1]/sum(conf_mat[,1]) |
| Sensitivity / Recall | The percent/proportion of 1s correctly classified | Measures the strength of the model in predicting a positive outcome | TP / (TP + FN) | conf_mat[1,1]/sum(conf_mat[1,]) |

Receiver Operating Characteristics (ROC) curve

A plot of the trade-off between sensitivity (recall) and specificity as the classification cutoff is varied.
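
A rough base-R sketch of tracing the curve, assuming hypothetical vectors prob (predicted probabilities of class 1) and actual (0/1 labels); packages such as pROC or ROCR provide the same thing ready-made.

cutoffs <- sort(unique(prob), decreasing = TRUE)
roc <- t(sapply(cutoffs, function(p) {
  pred <- as.numeric(prob >= p)
  c(specificity = sum(pred == 0 & actual == 0) / sum(actual == 0),
    sensitivity = sum(pred == 1 & actual == 1) / sum(actual == 1))
}))
# common convention: false positive rate (1 - specificity) on the x-axis
plot(1 - roc[, "specificity"], roc[, "sensitivity"], type = "l",
     xlab = "1 - specificity", ylab = "sensitivity (recall)")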

Precision-recall curve

A plot of the trade-off between precision and recall as the classification cutoff is varied; often more informative than the ROC curve when 1s are rare.

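A sketch using the same hypothetical prob and actual vectors, sweeping the cutoff and plotting precision against recall:

cutoffs <- sort(unique(prob), decreasing = TRUE)
pr <- t(sapply(cutoffs, function(p) {
  pred <- as.numeric(prob >= p)
  c(recall    = sum(pred == 1 & actual == 1) / sum(actual == 1),
    precision = sum(pred == 1 & actual == 1) / sum(pred == 1))
}))
plot(pr[, "recall"], pr[, "precision"], type = "l",
     xlab = "recall", ylab = "precision")
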
Area under the curve (AUC)

The total area under the ROC curve; a single number summarizing performance across all cutoffs (1 = perfect classifier, 0.5 = no better than random guessing).

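A compact sketch (again with hypothetical prob and actual): the AUC equals the probability that a randomly chosen 1 is scored above a randomly chosen 0, so it can be estimated directly without drawing the curve.

pos <- prob[actual == 1]
neg <- prob[actual == 0]
# ties count as half
auc <- mean(outer(pos, neg, ">") + 0.5 * outer(pos, neg, "=="))
auc
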
Lift

A measure of how effective the model is at identifying (comparatively rare) 1s at different probability cutoffs
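
A rough sketch of lift by decile, with the same hypothetical prob and actual: sort records by predicted probability, then compare the rate of 1s in each slice with the overall rate.

ord    <- order(prob, decreasing = TRUE)                 # best-scored records first
decile <- ceiling(seq_along(ord) / (length(ord) / 10))   # split into 10 equal slices
gain   <- tapply(actual[ord], decile, mean)              # rate of 1s within each decile
lift   <- gain / mean(actual)                            # >1 means the model finds 1s faster than chance
lift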
