Customer Complaints

1. How many complaints have been generated?

#### Question 1.1: How many complaints have been generated? ####

# Load and clean the data
url6 <- '/Users/mofei/Desktop/IE9113 DA/Assignment & R Code/Assignment 6/Consumer_Complaints.csv'
complaints <- read.csv(url6, header = TRUE, stringsAsFactors = TRUE, na.strings = 'NA')
dim(complaints)
> dim(complaints)
[1] 257341     18
257341 complaints have been generated.

2. How many are unique or recurring?

#### Question 1.2: How many are unique or recurring? ####

# When Product, Sub-product, Issue, Sub-issue, and Company are all the same,
# we regard the complaints as recurring.

# Select the Product, Sub-product, Issue, Sub-issue, and Company columns
complaints.0 <- complaints[, c(2, 3, 4, 5, 8)]

# Count the number of duplicate rows (requires dplyr)
library(dplyr)
p <- complaints.0 %>%
  group_by(Product, Sub.product, Issue, Sub.issue, Company) %>%
  summarise(n = n())

# Arrange by count, then split into unique and recurring groups
p_sort <- arrange(p, desc(n))
p_unique <- p_sort %>% filter(n == 1)
p_recurred <- p_sort %>% filter(n != 1)

# Sum the unique and recurring row counts
n_unique <- sum(p_unique$n)
n_recurred <- sum(p_recurred$n)
n_unique
n_recurred
> n_unique
[1] 19933
> n_recurred
[1] 237408
There are 237408 recurring complaints and 19933 unique complaints.
[Figure: most frequent recurring complaint groups]
Among the recurring complaints, most concern credit reporting problems, in particular complaints about incorrect information.
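To see which groups recur most often, a minimal sketch using the p_recurred table built above:

# Sketch: the ten most frequent recurring complaint groups
head(p_recurred, 10)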

3. Using "Consumer.complaint.narrative", what can you say about the type of complaints in this report?

############################ Preprocessing ############################

# Create a corpus with metadata (requires quanteda)
require(quanteda)
help(corpus)
names(complaints)

# as.character() is needed because stringsAsFactors = TRUE made the
# narrative column a factor, and corpus() expects character text
newscorpus <- corpus(as.character(complaints$Consumer.complaint.narrative),
                     docnames = complaints$Complaint.ID,
                     docvars = data.frame(Date.received = complaints$Date.received,
                                          Product = complaints$Product,
                                          Sub.product = complaints$Sub.product,
                                          Issue = complaints$Issue,
                                          Sub.issue = complaints$Sub.issue,
                                          Company = complaints$Company,
                                          State = complaints$State,
                                          Submitted.via = complaints$Submitted.via))
names(newscorpus)
summary(newscorpus)  # summary of corpus
# Create a document-feature matrix from the corpus, dropping English stop words
help(dfm)
dfm.simple <- dfm(newscorpus, remove = stopwords("english"), verbose = TRUE)
topfeatures(dfm.simple, n=50)
[Output: top 50 features of dfm.simple]
# Create a custom stop-word list ("xxxx"/"xx" are the dataset's redaction tokens)
swlist <- c("xxxx", "xx", "called", "also", "can", "pay", "paid", "said",
            "call", "made", "days", "now", "s", "still", "date", "told",
            "one", "make", "tri")
dfm <- dfm(newscorpus, remove = c(swlist, stopwords("english")), verbose = TRUE, stem = FALSE)
topfeatures(dfm, n=50)
[Output: top 50 features after applying the custom stop-word list]
1. Looking at the top features after applying the custom stop-word list, customers write mostly about their account, credit, report, payment, information, and loan.

2. The top words may sound neutral and descriptive when customers state their complaints, but they could guide the company in routing complaints to the right departments (see the sketch after this list).

3. Potential problems can be surfaced through negative words such as “dispute” and “never”.

4. The next step is to move from single words, and negative words in particular, to their related contexts.
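As a sketch of the routing idea in point 2 (my own illustration, not part of the original analysis; the department names and keyword lists are assumptions), quanteda's dictionary() and dfm_lookup() can tag each narrative with a candidate department:

# Hypothetical routing dictionary; departments and keywords are made up
route.dict <- dictionary(list(
  credit_reporting = c("credit", "report*", "score"),
  lending          = c("loan", "mortgage", "payment"),
  accounts         = c("account", "bank", "deposit")
))
dfm.routed <- dfm_lookup(dfm, dictionary = route.dict)
topfeatures(dfm.routed)  # complaint volume per candidate department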

3.1 More Words:

# Update for bigrams using tokens
toks.1 <- tokens(newscorpus)                                      # create tokens
toks.2 <- tokens_remove(toks.1, c(swlist, stopwords("english")))  # remove stop words
toks.3 <- tokens_ngrams(toks.2, n = 2)                            # bigrams (n = 2)
dfm.ngram2 <- dfm(toks.3, verbose = TRUE)
topfeatures(dfm.ngram2, n=50)
[Output: top 50 bigrams of dfm.ngram2]
Credit, report, and card terms still dominate the top features, but more informative bigrams also appear, such as “wells_fargo”, “credit_score”, and “never_received”.
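To dig into one of these, a minimal sketch (my own addition) uses kwic() with phrase() on the raw tokens to pull narratives mentioning a specific company:

# Sketch: contexts around the "wells fargo" bigram (matching is case-insensitive)
kwic(toks.1, phrase("wells fargo"), window = 4)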

3.2 Negative Word Context

# Exploration in context: keywords in context (KWIC)
kwic(newscorpus, "dispute", 2)
kwic(newscorpus, "never", window = 3)
“Dispute” context:
[Output: KWIC results for “dispute”]
“Never” context:
[Output: KWIC results for “never”]
From the word “dispute”, the contexts surface problems the company needs to pay more attention to, such as Equifax issues, dispute request letters, and confirmation of information.
From the word “never”, the contexts surface complaints the company should find a way to resolve, such as missing information, bankruptcy, disbursements, missing responses, and utility issues.
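To quantify where these negative words concentrate, a sketch (my own extension, assuming the objects above) joins the KWIC hits back to the Product docvar:

# Sketch: count "dispute" mentions by Product via the KWIC doc names
k <- kwic(tokens(newscorpus), "dispute", window = 2)
prod <- docvars(newscorpus, "Product")[match(k$docname, docnames(newscorpus))]
sort(table(prod), decreasing = TRUE)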

4. What type of product issues & complaints are the most frequent?

#### Question 1.4: What type of product issues & complaints are the most frequent? ####

# Count complaints per Issue
i <- complaints.0 %>% group_by(Issue) %>% summarise(n = n())
i_sort <- arrange(i, desc(n))

# Barplot of the top 8 issues (requires ggplot2 and forcats)
library(ggplot2)
library(forcats)
i_sort %>% head(8) %>%
  ggplot(aes(x = fct_reorder(Issue, n, .desc = FALSE), y = n, fill = Issue)) +
  geom_col() +
  geom_text(aes(label = n), vjust = -0.25) +
  theme(legend.position = "none") +
  xlab("Issue") + ylab("Count") +
  coord_flip()
For product issues and complaints, the most frequent Issue is “Incorrect information on report”.
[Figure: top 8 issues by complaint count]
# Keep the top 10 issues and cross-tabulate them with Product
i_sort_10 <- i_sort %>% head(10)
p2 <- complaints.0 %>% group_by(Product, Issue) %>% summarise(n = n()) %>% arrange(desc(n))
p2 <- p2 %>% filter(Issue %in% i_sort_10$Issue)

# Stacked barplot of Product vs the top 10 issues
p2 %>%
  ggplot(aes(x = fct_reorder(Issue, n, .desc = TRUE), y = n, fill = Product)) +
  geom_col() +
  theme(legend.position = "bottom") +
  xlab("Issue") + ylab("Count") +
  ggtitle("Product vs top 10 Issues") +
  coord_flip()
[Figure: Product vs top 10 Issues]
For product issues and complaints, the most frequent product within “Incorrect information on report” is “Credit reporting, credit repair services, or other personal consumer reports”.
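As a quick check (my own sketch, assuming the exact issue label quoted above), dplyr's slice_max() confirms the dominant product within the top issue:

# Sketch: the most frequent Product within the top Issue
p2 %>%
  filter(Issue == "Incorrect information on report") %>%
  ungroup() %>%
  slice_max(n, n = 1)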

5. Complete a sentiment analysis for all the types of complaint submissions observed during this year.
#### Question 1.5: Complete a sentiment analysis for all the types of complaint submissions observed during this year. ####

#### Sentiment analysis ####
mydict2 <- dictionary(list(
  negative = c("don", "didn", "detriment*", "bad*", "awful*", "terrib*", "horribl*"),
  positive = c("best", "love", "good", "great", "super*", "excellent", "yay")
))
dfm.sentiment2 <- dfm(newscorpus, remove = c(swlist, stopwords("english")), verbose = TRUE,
                      dictionary = mydict2, stem = FALSE)

> topfeatures(dfm.sentiment2)
positive negative
   23252     5640
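The question asks about the types of complaint submissions, so a sketch (my own extension, assuming the objects above) groups the sentiment counts by the Submitted.via docvar with dfm_group():

# Sketch: sentiment counts per submission channel
dfm.by.channel <- dfm_group(dfm.sentiment2, groups = docvars(dfm.sentiment2, "Submitted.via"))
convert(dfm.by.channel, to = "data.frame")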

# Evaluate sparsity (requires tm)
require(tm)
dfm.tm2 <- convert(dfm.ngram2, to = "tm")
dfm.sparse2 <- removeSparseTerms(dfm.tm2, 0.7)
dfm.sparse2

# Find associated terms, specifying a correlation limit of 0.5
findAssocs(dfm.tm2, c("life", "colleg*", "stori"), corlimit = 0.5)

#### Word cloud #### (requires wordcloud and RColorBrewer)
library(wordcloud)
library(RColorBrewer)
set.seed(100)
freq2 <- topfeatures(dfm.ngram2, n = 200)
wordcloud(words = names(freq2), freq = freq2, min.freq = 0.5, max.words = 500,
          random.order = FALSE, rot.per = 0, scale = c(6, 0.3),
          colors = brewer.pal(8, "Dark2"))  # "size" dropped: not a wordcloud() argument
[Figure: word cloud of the top 200 bigrams]
1. The dictionary-based analysis finds 23252 positive and 5640 negative term hits.

2. After removing sparse terms, the word cloud (figure above) shows the most frequent bigrams: “credit_card”, “credit_reporting”, and “credit_bureaus”.
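A per-complaint view is also possible; this sketch (my own addition) computes a net sentiment score from the dictionary counts:

# Sketch: per-complaint net sentiment from the dictionary counts
sent.df <- convert(dfm.sentiment2, to = "data.frame")
sent.df$net <- sent.df$positive - sent.df$negative
summary(sent.df$net)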
