Steve Jobs Commencement Speech

Question Statement:

Install the readtext package with install.packages("readtext"), load the .txt file with the function readtext(), and analyze the speech's top features, words in context, word correlations, and a wordcloud visualization.

1. Top Features of the Speech


install.packages("readtext")
library(readtext)
speech_url <- "https://raw.githubusercontent.com/jcbonilla/BusinessAnalytics/master/BAData/JobsStandfordSpeech.txt"
speech <- readtext(speech_url)

head(speech$text)
speech$text <- gsub("'", "", speech$text) # remove apostrophes
speech$text <- gsub("[[:punct:]]", " ", speech$text) # replace punctuation with space
speech$text <- gsub("[[:cntrl:]]", " ", speech$text) # replace control characters with space
speech$text <- gsub("^[[:space:]]+", "", speech$text) # remove whitespace at beginning of documents
speech$text <- gsub("[[:space:]]+$", "", speech$text) # remove whitespace at end of documents
speech$text <- gsub("[^a-zA-Z -]", " ", speech$text) # keep only letters, spaces, and hyphens
speech$text <- tolower(speech$text) # force to lowercase
head(speech$text)

library(quanteda)
speechcorpus <- corpus(speech$text)
# explore the corpus
names(speechcorpus)
summary(speechcorpus) # summary of corpus

# remove= and stem= belong to the dfm() interface of quanteda releases before 3.0
dfm.speech <- dfm(speechcorpus,
                  remove = stopwords("english"),
                  verbose = TRUE,
                  stem = TRUE)

topfeatures(dfm.speech, n=50)
# create a custom stopword list (stemmed forms and contraction fragments)
custom_stopwords <- c("s", "t","go","now","like","ever","just","even","someth","next","get","got","let"
,"ve","later","never","month","don","didn","know","put","make","thing","made","everthing"
,"turn","day","first","one","today","live","best","great","decid","start","year","can"
,"everyth","everi","way","clear")
dfm_stem <- dfm(dfm.speech,
                remove = c(custom_stopwords, stopwords("english")),
                verbose = TRUE,
                stem = TRUE)
topfeatures(dfm_stem, n=50)
Top 10 features (stemmed): life, colleg, drop, love, appl, graduat, work, stori, comput, death
From the top features, we can guess that in the commencement speech Steve Jobs told stories from his life, such as dropping out of college and working at Apple building computers. He also told stories about love, death, and his family.
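
The dfm() call above relies on the remove= and stem= arguments, which quanteda deprecated in version 3.0 in favor of building tokens first. The block below is a minimal sketch of the tokens-based equivalent, not the author's original method. The sample_text string is a made-up stand-in for speech$text so that the example runs without downloading the file.

```r
library(quanteda)

# Hypothetical sample text standing in for the cleaned speech$text
sample_text <- "you have to trust that the dots will somehow connect in your future and keep looking until you find the work you love"

# Tokenize, drop English stopwords, stem, then build the document-feature matrix
toks <- tokens(sample_text, remove_punct = TRUE)
toks <- tokens_remove(toks, pattern = stopwords("english"))
toks <- tokens_wordstem(toks)
dfm_sample <- dfm(toks)

topfeatures(dfm_sample, n = 5)
```

Under this approach the custom stopword list would be passed to a second tokens_remove() call before stemming, or to dfm_remove() afterwards if it contains stemmed forms such as "someth".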

2. Words in Context

#exploration in context
kwic(speechcorpus, "dots", 2)
kwic(speechcorpus, "drop", 3)
kwic(speechcorpus, "fired", 3)
kwic(speechcorpus, "death", 3)
Exploring the words in context confirmed our guess that Steve Jobs told the stories of dropping out of college and getting fired from Apple. We also learned something new: he talked about connecting the dots and facing death.
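
One caveat about the searches above: kwic() treats its pattern as a glob by default, so "drop" matches only the exact token "drop" and misses "dropped" or "dropping". The sketch below illustrates the difference on a made-up sentence (sample_text is hypothetical, not a quote from the speech file) and uses the tokens-first form that newer quanteda releases expect.

```r
library(quanteda)

# Hypothetical sentence used only to illustrate pattern matching
sample_text <- "i decided to drop out and after i dropped out i could stop taking the required classes"
toks <- tokens(sample_text)

# Exact token match: finds "drop" only
kw_exact <- kwic(toks, pattern = "drop", window = 3)

# Glob match: finds "drop" and "dropped"
kw_glob <- kwic(toks, pattern = "drop*", window = 3)

nrow(kw_exact)
nrow(kw_glob)
```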

3. Words Correlations

#Text Correlation
library(tm)
speech.tm<-convert(dfm.speech, to="tm")
findAssocs(speech.tm, c("colleg","work","appl","drop"), corlimit=0)
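findAssocs returns no results for "colleg", "work", "appl", or "drop". This does not by itself show that these words are uncorrelated with the other words in the speech: findAssocs measures how term counts vary together across documents, and this corpus contains a single document, so no correlation can be computed. Splitting the speech into paragraphs or sentences before building the matrix would be needed to get meaningful associations.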
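
Why a single document produces no associations can be seen with base R alone. The sketch below uses made-up term counts (the numbers are illustrative, not taken from the speech) to compare a one-row document-term matrix with a four-row one, as would result from treating each paragraph as its own document.

```r
# Toy document-term counts; terms and numbers are invented for illustration
terms <- c("colleg", "drop", "work")

# One document: a single row, as in the corpus built from speech$text
one_doc <- matrix(c(4, 2, 3), nrow = 1,
                  dimnames = list("speech", terms))

# Several documents: e.g. one row per paragraph
many_docs <- matrix(c(2, 1, 0,
                      1, 1, 2,
                      0, 0, 3,
                      1, 0, 1),
                    nrow = 4, byrow = TRUE,
                    dimnames = list(paste0("para", 1:4), terms))

# With one observation per term, correlations are undefined (NA)
cor_one <- suppressWarnings(cor(one_doc))

# With several observations per term, correlations can be computed
cor_many <- cor(many_docs)

cor_one
cor_many
```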

4. Word Cloud

install.packages("wordcloud")
install.packages("RColorBrewer")
library(wordcloud)
library(RColorBrewer)
set.seed(142) # keeps the cloud's shape fixed
palette_set1 <- brewer.pal(8, "Set1") # 8-color "Set1" palette
freq <- topfeatures(dfm_stem, n = 100)

wordcloud(names(freq),
          freq, max.words = 100,
          scale = c(3, .1),
          colors = palette_set1)