Skip to content

Customer Complaints

1. How many complaints have been generated?
#### Question 1 ####
####Question 1.1: How many complaints have been generated? ####
#Load and clean the data
url6 <- '/Users/mofei/Desktop/IE9113 DA/Assignment & R Code/Assignment 6/Consumer_Complaints.csv'
complaints = read.csv(url6,header=TRUE, stringsAsFactors=TRUE,na.strings='NA')
dim(complaints)

> dim(complaints)
[1] 257341 18
257341 complaints have been generated.

2. How many are unique or recurring?
####Question 1.2: How many are unique or recurring? ####
####When Product, Sub Product, Issue, Sub Issue, Company are the same, we regarded them as recurring

#Select Product, Sub Product, Issue, Sub Issue, Company Column
complaints.0 = complaints[,c(2,3,4,5,8)]

#Count number of duplicate rows
p=complaints.0 %>%
group_by(Product,Sub.product,Issue,Sub.issue,Company) %>%
summarise(
n= n()
)

#Arrenge
p_sort=arrange(p,desc(n))
p_unique=p_sort %>% filter(n == 1)
p_recurred=p_sort %>% filter(n != 1)

#Sum unique and recurred number of rows
n_unique=sum(p_unique$n)
n_recurred=sum(p_recurred$n)
n_unique
n_recurred

> n_unique
[1] 19933
> n_recurred
[1] 237408

There are 237408 recurring data and 19933 unique data.

1650581096.png
For recurred complaints, most are Credit reporting problems, especially, incorrectly information complaints.


3. Using "Consumer.complaint.narrative", what can you say about the type of complaints in this report?
##########################
### Preprocessing
##########################
#create a corpus
require (quanteda)
help(corpus)
names(complaints)
#create a corpus with metadata
newscorpus<- corpus(complaints$Consumer.complaint.narrative,
docnames=complaints$Complaint.ID,
docvar=data.frame(Date.received=complaints$Date.received,
Product=complaints$Product,
Sub.product=complaints$Sub.product,
Issue=complaints$Issue,
Sub.issue=complaints$Sub.issue,
Company=complaints$Company,
State=complaints$State,
Submitted.via=complaints$Submitted.via))
names(newscorpus)
summary(newscorpus) #summary of corpus

#create document feature matrix from clean corpus + stem
help(dfm)
dfm.simple<- dfm(newscorpus,
remove = stopwords("english"),
verbose=TRUE)

topfeatures(dfm.simple, n=50)
Screen Shot 2022-04-19 at 15.13.54.png
# create a custom dictionary
swlist = c("xxxx", "xx","called", "also", "can", "pay", "paid", "said","call", "made", "days", "now", "s", "still", "date","told","one","make","tri")

dfm<- dfm(newscorpus,
remove = c(swlist,stopwords("english")),
verbose=TRUE,
stem=FALSE)

topfeatures(dfm, n=50)
Screen Shot 2022-04-19 at 15.27.06.png
1. From the top features after creating a custom dictionary, it has been noticed that customers mentioned a lot about their account, credit, report, payment, information and loan.

2. The top words might be sound neutral and descriptive when customers state their complaints, but could guide the company to divert to specific departments.

3. The potential problems could be found with negative words like “dispute” and “never”.

4. The next step could be carried out by going through from more words and negative words to their related contexts.

3.1 More Words:
#update for bigrans using tokens
toks.1<-tokens(newscorpus) #creates tokens
toks.2<-tokens_remove(toks.1, c(swlist,stopwords("english"))) #remove stop words from tokens
toks.3 <-tokens_ngrams(toks.2, n=2) # ngram =2
dfm.ngram2<- dfm(toks.3, verbose=TRUE)

topfeatures(dfm.ngram2, n=50)
Screen Shot 2022-04-19 at 15.41.06.png
Still, credit, report and cards are the top feature words. However, there are some more informative words like “Wells Fargo”, “credit score”, and “never received”.
3.2 Negative Word Context
#exploration in context
kwic(newscorpus, "dispute", 2) #Keywords in context

kwic(newscorpus , "never", window = 3)
“Dispute” context:
Screen Shot 2022-04-19 at 15.56.31.png
“Never” context:
Screen Shot 2022-04-19 at 15.44.22.png
From the word “dispute”, it could be found that problems like equifax issue, request letter, conformation information, and etc are the company need to pay more attention to.
From the word “never”, some complaints about missing mention information, bankruptcy, disbursements, missing response, utility issue, and etc are the company should find a way to solve.

4. What type of product issues & complaints are the most frequent?
##### Question 1.4: What type of product issues & complaints are the most frequent? ####
i=p=complaints.0 %>%
group_by(Issue) %>%
summarise(
n= n()
)

i_sort=arrange(i,desc(n))

#Barplot
i_sort %>%
head(8) %>%
ggplot(aes(x = fct_reorder(Issue, n,.desc=FALSE), y = n, fill = Issue)) +
geom_col() +
geom_text(aes(label = n), vjust = -0.25) +
theme(legend.position = "none")+
xlab("Issue") + ylab("Count")+
coord_flip()

For product issues and complaints, the most frequent Issue is “Incorrect information on report”.
1650584485(1).jpg
i_sort_10=i_sort %>%
head(10)

p2=complaints.0 %>%
group_by(Product,Issue) %>%
summarise(
n= n()
) %>%
arrange(desc(n))

p2=p2 %>% filter(Issue %in% i_sort_10$Issue)


p2 %>%
ggplot(aes(x = fct_reorder(Issue, n,.desc=TRUE), y = n, fill = Product)) +
geom_col() +
theme(legend.position = "bottom")+
xlab("Issue") + ylab("Count")+
ggtitle("Product VS top 10 Issues")+
coord_flip()
1650585902.jpg
For product issues and complaints, the most frequent product in “Incorrect information on report” is “Credit reporting, credit repair services, or other personal consumer reports”


5. Complete a sentiment analysis for all the types of complaint submissions observed during this year.
#####Question 1.5: Complete a sentiment analysis for all the types of complaint submissions observed during this year. ####
#####Sentiment Analysis####
mydict2 <- dictionary(list(negative = c("don","didn","detriment*", "bad*", "awful*", "terrib*", "horribl*"),
positive = c("best","love","good", "great", "super*", "excellent", "yay")))

dfm.sentiment2 <- dfm(newscorpus,
remove = c(swlist,stopwords("english")),
verbose=TRUE,
dictionary = mydict2,
stem=FALSE)
> topfeatures(dfm.sentiment2)
posit negat
23252 5640

# to evaluate sparcity
require(tm)
dfm.tm2<-convert(dfm.ngram2, to="tm")
dfm.sparse2<-removeSparseTerms(dfm.tm2,0.7)
dfm.sparse2

#####specifying a correlation limit of 0.5
findAssocs(dfm.tm2,c("life","colleg*","stori"),corlimit=0.5)

####word cloud#####
set.seed(100)
freq2<-topfeatures(dfm.ngram2, n=200)
wordcloud(words = names(freq2), freq=freq2, min.freq = 0.5,
max.words=500, random.order=FALSE, rot.per=0,
scale=c(6,0.3),size = 2,
colors=brewer.pal(8, "Dark2"))

Screen Shot 2022-04-19 at 19.15.15.png
1. By the analysis, there are 23250 positive numbers and 640 negative values.

2. After sparcity and specifying, the word cloud has been created as the figure shows with the most frequent word “credit_card”, “credit_reporting”. and “credit_bureaus”.

Want to print your doc?
This is not the way.
Try clicking the ⋯ next to your doc name or using a keyboard shortcut (
CtrlP
) instead.