This dataset captures the results of a series of direct marketing campaigns run by “BANK”, an international banking institution. The campaigns were based on phone calls. Often, more than one contact with the same client was required to determine whether the client would subscribe ('yes') or not ('no') to the product (a bank term deposit).
Run a classifier that predicts term deposit subscription, using 90:10, 80:20, and 70:30 train/test splits.
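The three split ratios can be sketched as follows. This is a minimal illustration in plain Python, not the code used for the analysis: the helper `train_test_split`, the fixed `seed`, and the placeholder `rows` are all assumptions introduced here.

```python
import random


def train_test_split(rows, test_fraction, seed=42):
    """Shuffle a copy of rows and split it into (train, test).

    Hypothetical helper for illustration; seed fixes the shuffle so the
    split is reproducible.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    # The first n_test shuffled rows form the test set, the rest the training set.
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder rows standing in for the real client records (assumption).
rows = list(range(1000))
for test_fraction in (0.10, 0.20, 0.30):
    train, test = train_test_split(rows, test_fraction)
    print(f"test fraction {test_fraction:.2f}: {len(train)} train, {len(test)} test")
```

A plain random shuffle does not guarantee that the share of 'yes' cases is identical in the training and test sets; a stratified split would be an alternative when the classes are imbalanced.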
The summary shows the basic characteristics of the dataset. The subscription outcome is imbalanced: about 90% of the records are 'no' and only about 10% are 'yes'. An accuracy near 90% therefore does not indicate a good model, because predicting 'no' for every client would already be correct about 90% of the time.
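The baseline argument can be checked with a few lines of Python. This is a sketch under stated assumptions: the function `majority_baseline_accuracy` is a hypothetical helper, and the `labels` list is illustrative data built to match the roughly 90/10 class balance, not the real dataset.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Return the most frequent label and the accuracy of always predicting it."""
    counts = Counter(labels)
    majority_label, majority_count = counts.most_common(1)[0]
    return majority_label, majority_count / len(labels)


# Illustrative labels with a 90/10 class balance (assumption, not the real data).
labels = ["no"] * 90 + ["yes"] * 10
label, accuracy = majority_baseline_accuracy(labels)
print(f"always predicting {label!r} gives accuracy {accuracy:.0%}")
```

Any model evaluated on this dataset should be compared against this baseline rather than against 50%.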
This plot shows that the model performs best at cp = 0.006 (cp is the complexity parameter), so we use cp = 0.006 when pruning the decision tree.
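One common way to read such a plot is to pick the cp value with the lowest cross-validated error. The sketch below assumes that this is what "performs best" means here. The function `best_cp`, the `xerror` field name, and every number in `cp_table` are hypothetical, invented only so that the minimum falls at cp = 0.006; they are not the values from the original plot.

```python
def best_cp(cp_table):
    """Return the cp of the row with the lowest cross-validated error."""
    return min(cp_table, key=lambda row: row["xerror"])["cp"]


# Made-up (cp, xerror) rows for illustration only.
cp_table = [
    {"cp": 0.050, "xerror": 0.95},
    {"cp": 0.020, "xerror": 0.82},
    {"cp": 0.006, "xerror": 0.74},
    {"cp": 0.001, "xerror": 0.78},
]
print(best_cp(cp_table))
```

Smaller cp values allow a larger tree, so the error typically falls and then rises again as the tree starts to overfit; the chosen cp sits near the bottom of that curve.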
The result ranks the variables by importance in this model: duration > nr.employed > pdays > ........
This is the decision tree plot after pruning.
From this plot, we see that the tree first splits the data on nr.employed (shown as nr.emplo in the plot), then on duration and pdays.
This is the confusion matrix of the predictions on the test data. The accuracy is 92.09%, only slightly above the roughly 90% that always predicting 'no' would achieve.
Precision = TP / (TP + FP) = 65.38%
Recall = TP / (TP + FN) = 54.33%
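The formulas above can be computed directly from the four cells of a confusion matrix, treating 'yes' as the positive class. The sketch below is illustrative: `classification_metrics` is a hypothetical helper, and the counts passed to it are made up, since the actual confusion matrix cells are not listed in this report.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, and recall from confusion matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        # Guard against division by zero when no positives are predicted or present.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical counts for illustration; not the report's confusion matrix.
metrics = classification_metrics(tp=50, fp=25, fn=40, tn=885)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

With imbalanced classes, accuracy is dominated by the true negatives, which is why precision and recall on the 'yes' class are reported alongside it.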
The recall shows that the model misses many actual subscribers: about 45.67% of the true 'yes' cases (100% - 54.33%) are predicted as 'no'.
This plot shows that the model performs best at cp = 0.0069, so we use cp = 0.0069 when pruning the decision tree.
The result ranks the variables by importance in this model: duration > nr.employed > pdays > ........; in other words, duration has the greatest influence on the result.
This is the decision tree plot after pruning.
From this plot, we see that the tree first splits the data on nr.employed (shown as nr.emplo in the plot), then on different ranges of duration and pdays.
This is the confusion matrix of the predictions on the test data. The accuracy is 91.49%, only slightly above the roughly 90% that always predicting 'no' would achieve.
Precision = TP / (TP + FP) = 64.77%
Recall = TP / (TP + FN) = 53.98%
As before, the recall shows that the model misses many actual subscribers: about 46.02% of the true 'yes' cases (100% - 53.98%) are predicted as 'no'.