Using a linear regression model and an example female student with a 3.0 GPA, an SAT score of 1050-1100, and a distance of 100 miles to campus, the predicted probability that this student applies is 10.58%.
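The prediction step described above can be sketched as follows. This is only an illustration under stated assumptions: the `toy` data frame, its column names, and its simulated values are hypothetical and are not the application data, so the printed number will not match 10.58%.

```r
# Hypothetical toy data (NOT the real application data set)
set.seed(1)
n <- 200
toy <- data.frame(
  gender   = factor(sample(c("F", "M"), n, replace = TRUE)),
  gpa      = round(runif(n, 2, 4), 1),
  distance = round(runif(n, 0, 500)),
  applied  = rbinom(n, 1, 0.1)   # 1 = applied, 0 = prospect or suspect
)

# Linear regression of the 0/1 application flag on the predictors
fit_lm <- lm(applied ~ gender + gpa + distance, data = toy)

# One example student: female, 3.0 GPA, 100 units from campus
new_student <- data.frame(
  gender   = factor("F", levels = levels(toy$gender)),
  gpa      = 3.0,
  distance = 100
)

# Fitted value for this student, read as an application probability
pred <- unname(predict(fit_lm, newdata = new_student))
print(round(pred * 100, 2))
```

The same pattern applies to the real model: pass a one-row data frame whose column names and factor levels match the training data to `predict()`.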
2. Create an appropriate split and validate your model using a ratio of correct predictions vs total predictions.
#Question 1.2: Split and validate application model
library(rpart)
library(rpart.plot)
library(rattle)  #provides fancyRpartPlot

#Create the training and test data samples
set.seed(100)
split = 0.9
trainingRowIndex1 = sample(1:nrow(application), floor(split * nrow(application)))
trainingData1 = application[trainingRowIndex1, ]
testData1 = application[-trainingRowIndex1, ]

#Develop the model on training data and plot
model.app90 = rpart(status ~ ., data = trainingData1, method = "class")
model.app90
rpart.plot(model.app90)
fancyRpartPlot(model.app90)

#Calculate prediction accuracy
prediction.app = predict(model.app90, testData1, type = "class")
summary(prediction.app)
pd1 = data.frame(actual = testData1$status, Prediction = prediction.app)
pd1 = table(pd1)
#Accuracy = correct predictions (diagonal of the table) / total predictions
accuracy.app = paste(round(sum(diag(pd1)) / sum(pd1) * 100, 2), "%")
accuracy.app
1. In the 10% test data, 3,439 records are prospects or suspects and 39 have applied.
2. The prediction accuracy of the model on the test data is 98.88%.
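The reported accuracy is consistent with 3,439 correct predictions out of 3,478 test records. The sketch below reproduces that arithmetic. The full confusion table is not shown in the text, so the layout used here (all 39 applicants predicted as non-applicants) is an assumption made only for illustration.

```r
# Hypothetical confusion table built from the counts in the text.
# Assumption: no applicant is predicted correctly (layout not shown in the source).
conf_tab <- matrix(
  c(3439, 39,   # predicted 0: actual 0, actual 1
       0,  0),  # predicted 1: actual 0, actual 1
  nrow = 2,
  dimnames = list(actual = c("0", "1"), predicted = c("0", "1"))
)

# Accuracy = correct predictions / total predictions
accuracy <- round(sum(diag(conf_tab)) / sum(conf_tab) * 100, 2)

# Baseline: always predict the majority class (prospect or suspect)
baseline <- round(max(rowSums(conf_tab)) / sum(conf_tab) * 100, 2)

print(c(accuracy = accuracy, baseline = baseline))
```

Under this assumed layout the model's accuracy equals the majority-class baseline, so it may be worth inspecting the full table alongside the single accuracy figure.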
3. Interpret your model and give actionable recommendations to the marketing department.
Interpretation of the linear regression:
1. Among the chosen predictors, a few factors have a strong relationship with application status: gender, an SAT score from 1360 to 1530, distance to campus, household income, and in-state status.
2. For each one-unit increase in distance to campus, the application value decreases by 0.0051.
3. For each one-unit increase in household income, the application value decreases by 1.201 × 10^5.
4. If the student is in state, the application value increases by 0.494.
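To make the coefficient readings above concrete, the following sketch multiplies each reported coefficient by a change in its predictor, holding the other predictors fixed. The variable names are hypothetical, and the 100-unit change is only an example.

```r
# Coefficients as reported in the interpretation above
coef_distance <- -0.0051  # change in application value per unit of distance
coef_instate  <-  0.494   # change in application value for an in-state student

# Effect of living 100 units farther from campus, other predictors held fixed
effect_distance_100 <- coef_distance * 100

# Effect of switching from out of state (0) to in state (1)
effect_instate <- coef_instate * (1 - 0)

print(c(distance_100 = effect_distance_100, in_state = effect_instate))
```

In a linear model each effect is simply the coefficient times the change in the predictor, so a 100-unit increase in distance lowers the application value by 0.51.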
Interpretation of the split model:
1. The root node, at the top, shows that only 1.0% have applied to this school, while 99% are prospects or suspects.
2. The number above these proportions indicates how the node is voting (1 = applicant), and the number below indicates the proportion of the population that resides in this node.
3. If the student has applied, move right; if he or she is a prospect or suspect, move left.
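The three numbers shown in each node of the plot can also be read directly from a fitted `rpart` object. The sketch below uses the `kyphosis` data set that ships with the rpart package, not the application data, purely to show where the values live; the variable names are illustrative.

```r
library(rpart)

# Classification tree on the built-in kyphosis data (stand-in for the application data)
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")

class_levels <- attr(fit, "ylevels")
n_classes <- length(class_levels)

# Row 1 of fit$frame is the root node
root_vote <- class_levels[fit$frame$yval[1]]   # class the node votes for

# yval2 columns: fitted class, class counts, class proportions, node probability
root_probs <- fit$frame$yval2[1, (2 + n_classes):(1 + 2 * n_classes)]
names(root_probs) <- class_levels

# Share of all observations that fall in this node (100% at the root)
root_share <- fit$frame$n[1] / nrow(kyphosis)

print(root_vote)
print(round(root_probs, 3))
print(root_share)
```

The same fields on the application model would give the vote, the class proportions, and the population share for any node by changing the row index.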
Recommendations for the marketing department:
1. Target students who live close to the campus, and research nearby neighborhoods and communities.
2. Target students whose families have relatively low incomes.
3. Target students who live in the same state as the school.
Limitations and further research:
1. Incomplete Data: Because some data are missing, more research on student applications could be carried out.
2. Research Range: Similar research on students who have already been admitted could be carried out for a better explanation.
3. Multiple Factors: Other predictive factors, such as university reputation and rank, could be added in further research because other issues might affect students’ decisions.