Practical Statistics for Data Scientists

5. Classification

Evaluating classification models

Accuracy

The percent/proportion of cases classified correctly
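As a minimal sketch of this definition, accuracy can be computed as the share of records where the predicted label matches the actual label. The vectors `actual` and `predicted` below are made-up illustration data, not taken from the book.

```r
# Hypothetical labels: 1 = positive class, 0 = negative class
actual    <- c(1, 0, 0, 1, 0, 0, 0, 1, 0, 0)
predicted <- c(1, 0, 0, 0, 0, 1, 0, 1, 0, 0)

# Accuracy is the proportion of records where prediction and truth agree
accuracy <- mean(predicted == actual)
print(accuracy)
```

Here 8 of the 10 records are classified correctly, so the proportion is 0.8.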

Confusion matrix

A tabular display of the record counts by their predicted and actual classification status
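One way to build such a table in base R is `table()`. The sketch below uses made-up labels and puts the positive class (1) first in both dimensions, an ordering chosen to match the `conf_mat[1,1]` indexing used in the R expressions later in this section; the variable name `lvls` is illustrative.

```r
# Hypothetical labels: 1 = positive class, 0 = negative class
actual    <- c(1, 0, 0, 1, 0, 0, 0, 1, 0, 0)
predicted <- c(1, 0, 0, 0, 0, 1, 0, 1, 0, 0)

# Order the levels so that class 1 comes first: cell [1,1] then holds
# the true positives and cell [2,2] the true negatives
lvls <- c(1, 0)
conf_mat <- table(
  actual    = factor(actual, levels = lvls),
  predicted = factor(predicted, levels = lvls)
)
print(conf_mat)
```

Rows are actual classes and columns are predicted classes, so the off-diagonal cells count the false negatives (`[1,2]`) and false positives (`[2,1]`).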

The rare class problem

When 1s are rare, classifying every record as 0 gives high accuracy but identifies no 1s, so the trade-off between false positives and false negatives has to be chosen according to their relative costs
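A small made-up example shows why accuracy alone can mislead when one class is rare. The data and the always-predict-0 "model" below are assumptions for illustration only.

```r
# Hypothetical data: 1,000 records, only 10 of which belong to class 1
actual <- c(rep(1, 10), rep(0, 990))

# A trivial "model" that predicts the majority class for every record
predicted <- rep(0, length(actual))

# Accuracy looks excellent because almost all records are 0s
accuracy <- mean(predicted == actual)

# Recall shows the problem: none of the actual 1s are found
recall <- sum(predicted == 1 & actual == 1) / sum(actual == 1)

print(c(accuracy = accuracy, recall = recall))
```

The model is 99% accurate yet useless for finding 1s, which is why the cost of a missed 1 has to be weighed against the cost of a false alarm.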

Precision, recall, and specificity

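| Term | Description | Interpretation | Formula | R code |
| --- | --- | --- | --- | --- |
| Specificity | The percent/proportion of 0s correctly classified | Measures a model's ability to predict a negative outcome | TN / (TN + FP) | conf_mat[2,2]/sum(conf_mat[2,]) |
| Precision | The percent/proportion of predicted 1s that are actually 1s | The accuracy of a predicted positive outcome | TP / (TP + FP) | conf_mat[1,1]/sum(conf_mat[,1]) |
| Sensitivity/Recall | The percent/proportion of 1s correctly classified | Measures the strength of the model to predict a positive outcome | TP / (TP + FN) | conf_mat[1,1]/sum(conf_mat[1,]) |

TP, FP, TN, and FN are the counts of true positives, false positives, true negatives, and false negatives. The R code assumes conf_mat has actual classes in rows and predicted classes in columns, with class 1 first.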
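The three R expressions above can be tried on a small confusion matrix. The counts below are made up for illustration; the layout (actual classes in rows, predicted classes in columns, class 1 first) is inferred from how the expressions index `conf_mat`.

```r
# Hypothetical counts, laid out as actual (rows) by predicted (columns)
conf_mat <- matrix(
  c(30,  10,    # actual 1: 30 true positives,  10 false negatives
    20, 940),   # actual 0: 20 false positives, 940 true negatives
  nrow = 2, byrow = TRUE,
  dimnames = list(actual = c("1", "0"), predicted = c("1", "0"))
)

# Share of predicted 1s that are actually 1s
precision <- conf_mat[1, 1] / sum(conf_mat[, 1])

# Share of actual 1s that were classified as 1s
recall <- conf_mat[1, 1] / sum(conf_mat[1, ])

# Share of actual 0s that were classified as 0s
specificity <- conf_mat[2, 2] / sum(conf_mat[2, ])

print(c(precision = precision, recall = recall, specificity = specificity))
```

With these counts, precision is 30/50, recall is 30/40, and specificity is 940/960.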

Receiver Operating Characteristics (ROC) curve

A plot of the trade-off between sensitivity and specificity as the classification cutoff varies
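The points of an ROC curve can be sketched in base R by sweeping a cutoff over the predicted probabilities and recomputing sensitivity and specificity each time. The scores and labels below are made up, and treating a record as a predicted 1 when its probability is at or above the cutoff is an assumption of this sketch.

```r
# Hypothetical predicted probabilities of class 1 and the true labels
prob   <- c(0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2)
actual <- c(1,   1,   0,   1,   0,    0,   1,   0)

# Candidate cutoffs: every distinct score, plus Inf so that the first
# point corresponds to predicting no 1s at all
cutoffs <- c(Inf, sort(unique(prob), decreasing = TRUE))

# Sensitivity: share of actual 1s with a score at or above the cutoff
sensitivity <- sapply(cutoffs, function(ct) {
  sum(prob >= ct & actual == 1) / sum(actual == 1)
})

# Specificity: share of actual 0s with a score below the cutoff
specificity <- sapply(cutoffs, function(ct) {
  sum(prob < ct & actual == 0) / sum(actual == 0)
})

roc <- data.frame(cutoff = cutoffs, sensitivity, specificity)
print(roc)
```

Lowering the cutoff raises sensitivity and lowers specificity. A call such as `plot(1 - roc$specificity, roc$sensitivity, type = "l")` would draw the curve from these points.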

Precision-recall curve

Area under the curve (AUC)
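AUC summarizes an ROC curve as a single number: the area under it, where 1 corresponds to a perfect classifier and 0.5 to one that does no better than chance. A minimal sketch, assuming made-up scores and the trapezoidal rule (one of several ways to compute the area):

```r
# Hypothetical predicted probabilities of class 1 and the true labels
prob   <- c(0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2)
actual <- c(1,   1,   0,   1,   0,    0,   1,   0)

# Sweep the cutoff from strictest (Inf) to most lenient
cutoffs <- c(Inf, sort(unique(prob), decreasing = TRUE))

# True positive rate (sensitivity) and false positive rate (1 - specificity)
tpr <- sapply(cutoffs, function(ct) sum(prob >= ct & actual == 1) / sum(actual == 1))
fpr <- sapply(cutoffs, function(ct) sum(prob >= ct & actual == 0) / sum(actual == 0))

# Trapezoidal rule: width of each step in fpr times the average tpr over it
auc <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
print(auc)
```

For these data the result is 0.75, which matches the share of (1, 0) pairs in which the 1 receives the higher score (12 of 16).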

Lift

A measure of how effective the model is at identifying (comparatively rare) 1s at different probability cutoffs
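One common way to express lift, used here as an assumption, is the rate of 1s among the records selected at a given cutoff divided by the overall rate of 1s. The data and the helper `lift_at()` below are hypothetical.

```r
# Hypothetical predicted probabilities of class 1 and the true labels
prob   <- c(0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2, 0.15, 0.1)
actual <- c(1,   1,   0,   1,   0,    0,   0,   0,   0,    0)

# Rate of 1s if records were selected at random
base_rate <- mean(actual)

# Lift at a cutoff: rate of 1s among records scored at or above the
# cutoff, relative to the base rate
lift_at <- function(cutoff) {
  selected <- prob >= cutoff
  mean(actual[selected]) / base_rate
}

cutoffs <- c(0.8, 0.6, 0.4, 0.1)
lift <- sapply(cutoffs, lift_at)
print(data.frame(cutoff = cutoffs, lift = lift))
```

A lift of 2.5 at the 0.6 cutoff means the selected records contain 1s at 2.5 times the overall rate; at the lowest cutoff every record is selected, so the lift falls to 1.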
