1. After building a custom dictionary, the top features show that customers most often mention their account, credit, report, payment, information, and loan.
2. These top words sound neutral and descriptive on their own, but they could help the company route complaints to the relevant departments.
3. Potential problems surface through negative words such as “dispute” and “never”.
4. The next step is to look at longer word sequences (bigrams) and at the contexts in which the negative words appear.
3.1 More Words:
# update for bigrams using tokens
toks.1 <- tokens(newscorpus) # create tokens
toks.2 <- tokens_remove(toks.1, c(swlist, stopwords("english"))) # remove stop words from tokens
toks.3 <- tokens_ngrams(toks.2, n = 2) # build bigrams (n = 2)
dfm.ngram2<- dfm(toks.3, verbose=TRUE)
topfeatures(dfm.ngram2, n=50)
Credit, report, and cards still dominate the top features. However, the bigrams surface more informative terms such as “Wells Fargo”, “credit score”, and “never received”.
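To make the bigram step concrete, here is a minimal base R sketch of what happens to a single sentence. The sentence and the two-word stop list are hypothetical stand-ins, not taken from the complaint data, and the code only imitates the idea behind `tokens_remove` and `tokens_ngrams` rather than reproducing them.

```r
# Hypothetical example sentence, already lowercased and split into words
tokens <- c("i", "never", "received", "my", "credit", "score")

# Hypothetical stop list (the analysis above uses swlist plus the English stop words)
stop_words <- c("i", "my")

# Drop stop words, then pair each remaining word with its successor
kept <- tokens[!tokens %in% stop_words]
bigrams <- paste(head(kept, -1), tail(kept, -1), sep = "_")

print(bigrams)
```

In this sketch "received" and "credit" become neighbours only because "my" was removed between them, so some bigrams may join words that were not adjacent in the original complaint. This is worth keeping in mind when reading the top bigram features.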
3.2 Negative Word Context
#exploration in context
kwic(newscorpus, "dispute", window = 2) # keywords in context
kwic(newscorpus, "never", window = 3)
“Dispute” context:
“Never” context:
The contexts of “dispute” point to problems such as Equifax issues, request letters, and confirmation information, which the company needs to pay more attention to.
The contexts of “never” point to complaints about missing mention information, bankruptcy, disbursements, missing responses, and utility issues, which the company should find a way to resolve.
4. What type of product issues & complaints are the most frequent?
##### Question 1.4: What type of product issues & complaints are the most frequent? ####
i=p=complaints.0 %>%
group_by(Issue) %>%
summarise(
n= n()
)
i_sort=arrange(i,desc(n))
#Barplot
i_sort %>%
head(8) %>%
ggplot(aes(x = fct_reorder(Issue, n,.desc=FALSE), y = n, fill = Issue)) +
geom_col() +
geom_text(aes(label = n), vjust = -0.25) +
theme(legend.position = "none")+
xlab("Issue") + ylab("Count")+
coord_flip()
For product issues and complaints, the most frequent Issue is “Incorrect information on report”.
i_sort_10=i_sort %>%
head(10)
p2=complaints.0 %>%
group_by(Product,Issue) %>%
summarise(
n= n()
) %>%
arrange(desc(n))
p2=p2 %>% filter(Issue %in% i_sort_10$Issue)
p2 %>%
ggplot(aes(x = fct_reorder(Issue, n,.desc=TRUE), y = n, fill = Product)) +
geom_col() +
theme(legend.position = "bottom")+
xlab("Issue") + ylab("Count")+
ggtitle("Product VS top 10 Issues")+
coord_flip()
Within the most frequent issue, “Incorrect information on report”, the most common product is “Credit reporting, credit repair services, or other personal consumer reports”.
5. Complete a sentiment analysis for all the types of complaint submissions observed during this year.
#####Question 1.5: Complete a sentiment analysis for all the types of complaint submissions observed during this year. ####
1. The analysis yields 23250 positive values and 640 negative values.
2. After reducing sparsity and specifying the features, the word cloud was created; as the figure shows, the most frequent terms are “credit_card”, “credit_reporting”, and “credit_bureaus”.
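The code behind the positive and negative counts in item 1 is not shown here. As a rough illustration only, a lexicon-based tally could be sketched in base R as follows. The word lists and the three complaint texts are hypothetical, and this simple word-count scoring is an assumption, not necessarily the method that produced the figures above.

```r
# Hypothetical mini-lexicon; a real analysis would use a full sentiment dictionary
positive_words <- c("thank", "resolved", "helpful")
negative_words <- c("dispute", "never", "incorrect")

# Hypothetical complaint narratives standing in for the real corpus
complaints_text <- c(
  "I filed a dispute and never received a response",
  "The agent was helpful and the issue was resolved",
  "Incorrect information on my credit report"
)

# Lowercase, split on non-letters, and drop empty strings
tokenize <- function(x) {
  toks <- unlist(strsplit(tolower(x), "[^a-z]+"))
  toks[nzchar(toks)]
}

# Score = number of positive words minus number of negative words
score_one <- function(x) {
  toks <- tokenize(x)
  sum(toks %in% positive_words) - sum(toks %in% negative_words)
}

scores <- vapply(complaints_text, score_one, integer(1), USE.NAMES = FALSE)
labels <- ifelse(scores > 0, "positive",
                 ifelse(scores < 0, "negative", "neutral"))

print(table(labels))
```

A scheme like this counts words without regard to negation ("not helpful" would still score as positive), so the totals are best read as a coarse signal and not as a precise measure of customer sentiment.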