DA_Assignment 6

Steve Jobs Commencement Speech


Yuting Tian

Xinyu Wu

Yu Zhu

Question Statement:

Using install.packages("readtext") and the function readtext(), load the .txt file and analyze its top features, words in context, and word correlations, then create a word cloud visualization.

1. Top Features of the Speech

install.packages("readtext") # run once
library(readtext)
url <- "https://raw.githubusercontent.com/jcbonilla/BusinessAnalytics/master/BAData/JobsStandfordSpeech.txt"
speech <- readtext(url) # one row per document; the speech is in speech$text
head(speech$text)
speech$text <- gsub("'", "", speech$text) # remove apostrophes
speech$text <- gsub("[[:punct:]]", " ", speech$text) # replace punctuation with space
speech$text <- gsub("[[:cntrl:]]", " ", speech$text) # replace control characters with space
speech$text <- gsub("^[[:space:]]+", "", speech$text) # remove whitespace at beginning of documents
speech$text <- gsub("[[:space:]]+$", "", speech$text) # remove whitespace at end of documents
speech$text <- gsub("[^a-zA-Z -]", " ", speech$text) # keep only letters, spaces and hyphens
speech$text <- tolower(speech$text) # force to lowercase
head(speech$text)

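library(quanteda)
speechcorpus <- corpus(speech$text)
# explore the corpus
docnames(speechcorpus) # document names
summary(speechcorpus) # summary of corpus
# tokenize, remove English stopwords and stem before building the dfm
# (the remove= and stem= arguments of dfm() are deprecated since quanteda 3)
toks.speech <- tokens(speechcorpus)
toks.speech <- tokens_remove(toks.speech, stopwords("english"))
toks.speech <- tokens_wordstem(toks.speech)
dfm.speech <- dfm(toks.speech, verbose = TRUE)
topfeatures(dfm.speech, n = 50)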

# create a custom dictionary of extra (stemmed) words to drop
# (named custom.stopwords to avoid masking base R's list())
custom.stopwords <- c("s", "t", "go", "now", "like", "ever", "just", "even", "someth", "next", "get", "got", "let",
                      "ve", "later", "never", "month", "don", "didn", "know", "put", "make", "thing", "made", "everthing",
                      "turn", "day", "first", "one", "today", "live", "best", "great", "decid", "start", "year", "can",
                      "everyth", "everi", "way", "clear")
# dfm.speech is already stemmed and free of English stopwords,
# so only the custom words still need to be removed
dfm_stem <- dfm_remove(dfm.speech, custom.stopwords)
topfeatures(dfm_stem, n = 50)


Top 10 features (stemmed): life, colleg, drop, love, appl, graduat, work, stori, comput, death

From the top features, we can guess that in his commencement speech Steve Jobs told stories from his life, such as dropping out of college and building computers at Apple. He also talked about love, death, and his family.
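topfeatures() ranks the features of a dfm by frequency. The counting idea behind it can be sketched in base R as below. This is a minimal sketch, not quanteda's implementation: it assumes a plain whitespace tokenizer, a tiny hand-made stopword list and no stemming (so it would count "college" and "colleges" separately, unlike dfm.speech). The helper top_features(), the toy text and the stopword list are hypothetical.

```r
# Hypothetical helper (not part of quanteda): the n most frequent non-stopword tokens
top_features <- function(text, stop_words, n = 10) {
  toks <- strsplit(trimws(tolower(text)), "[[:space:]]+")[[1]]  # whitespace tokenizer
  toks <- toks[!toks %in% stop_words]                            # drop stopwords
  tab <- table(toks)                                             # count each term
  tab <- tab[order(-as.integer(tab), names(tab))]                # by count, ties alphabetical
  head(setNames(as.integer(tab), names(tab)), n)
}

# Hypothetical toy input and stopword list (NOT the speech or stopwords("english"))
text <- "data beats opinion and data needs context so collect data and add context"
stop_words <- c("and", "so")
top_features(text, stop_words, n = 3)
# data = 3, context = 2, add = 1
```

Breaking ties alphabetically is a choice made here only to keep the sketch deterministic; quanteda may order tied features differently.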

2. Words in Context

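# exploration in context (kwic() expects a tokens object since quanteda 3)
toks.context <- tokens(speechcorpus)
kwic(toks.context, pattern = "dots", window = 2)
kwic(toks.context, pattern = "drop", window = 3)
kwic(toks.context, pattern = "fired", window = 3)
kwic(toks.context, pattern = "death", window = 3)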


Exploring the words in context confirmed our guess: Steve Jobs told the stories of dropping out of college and getting fired from Apple. It also revealed two new themes: connecting the dots and facing death.
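Keyword-in-context simply lists each occurrence of a keyword together with the few tokens before and after it. The base-R sketch below shows that mechanism under simple assumptions: whitespace tokenization and exact matching. The helper kwic_sketch() and the toy sentence are hypothetical, not quanteda's kwic(). Note that with exact matching "drop" would not find "dropped"; quanteda's kwic() treats patterns as globs by default, so a pattern such as "drop*" would also catch "dropped".

```r
# Hypothetical helper (not quanteda's kwic()): exact-match keyword in context
kwic_sketch <- function(text, keyword, window = 3) {
  toks <- strsplit(trimws(text), "[[:space:]]+")[[1]]
  n <- length(toks)
  hits <- which(toks == keyword)                       # positions of the keyword
  join <- function(idx) paste(toks[idx], collapse = " ")
  data.frame(
    position = hits,
    # up to `window` tokens before the hit (empty if the hit is the first token)
    pre = vapply(hits, function(i) if (i > 1) join(max(1, i - window):(i - 1)) else "", character(1)),
    keyword = toks[hits],
    # up to `window` tokens after the hit (empty if the hit is the last token)
    post = vapply(hits, function(i) if (i < n) join((i + 1):min(n, i + window)) else "", character(1)),
    stringsAsFactors = FALSE
  )
}

# Hypothetical toy sentence (NOT taken from the speech)
text <- "the model failed so we dropped the feature and later dropped the old data"
kwic_sketch(text, "dropped", window = 2)
```

The result has one row per hit: "so we | dropped | the feature" and "and later | dropped | the old".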

3. Words Correlations

# text correlation
library(tm)
speech.tm <- convert(dfm.speech, to = "tm") # tm DocumentTermMatrix
findAssocs(speech.tm, c("colleg", "work", "appl", "drop"), corlimit = 0)


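findAssocs() returned no associations for "colleg", "work", "appl", or "drop". This most likely does not mean these words are unrelated to the rest of the speech: the whole speech is a single document in the corpus, so each term has only one count and a correlation between terms cannot be computed. Measuring word correlations would require splitting the speech into several documents, such as paragraphs or fixed-size chunks.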
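The sketch below illustrates why a one-document matrix gives no correlations and how chunking fixes it. It is a minimal base-R sketch under stated assumptions: the toy text is hypothetical (not the speech), count_matrix() is a hypothetical helper, and the chunk size of 12 tokens is an arbitrary choice. With the real data, a similar effect could be obtained by chunking the tokens (for example with quanteda's tokens_chunk()) before building the dfm and calling convert() and findAssocs().

```r
# Hypothetical toy text (NOT the real speech): four "paragraphs" of 12 tokens each
text <- "college was costly so i dropped out and dropped in on classes
  apple grew fast and then apple hired more people to build computers
  college friends helped when i dropped a class and started a company
  the company later became apple and the computers sold around the world"
toks <- strsplit(trimws(text), "[[:space:]]+")[[1]]

# Hypothetical helper: document-by-term count matrix from a list of token vectors
count_matrix <- function(docs) {
  vocab <- sort(unique(unlist(docs)))
  counts <- vapply(docs,
                   function(d) as.numeric(table(factor(d, levels = vocab))),
                   numeric(length(vocab)))
  m <- t(counts)          # rows = documents, columns = terms
  colnames(m) <- vocab
  m
}

# One document: each term column holds a single value, so r is undefined (NA)
one_doc <- count_matrix(list(toks))
suppressWarnings(cor(one_doc[, "college"], one_doc[, "dropped"]))

# Four chunks of 12 tokens: now each term has a count per chunk
chunks <- split(toks, ceiling(seq_along(toks) / 12))
by_chunk <- count_matrix(chunks)
cor(by_chunk[, "college"], by_chunk[, "dropped"])  # about 0.90: the terms co-occur
cor(by_chunk[, "apple"], by_chunk[, "dropped"])    # about -0.82 in this toy text
```

Smaller chunks give more observations per term but fewer co-occurrences inside each chunk, so the chunk size is a trade-off worth trying at a few values.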

4. Word Cloud

install.packages("wordcloud") # run once
install.packages("RColorBrewer")
library(wordcloud)
library(RColorBrewer)
set.seed(142) # keeps the cloud's layout reproducible
pal <- brewer.pal(8, "Set1")
freq <- topfeatures(dfm_stem, n = 100)
wordcloud(names(freq),
          freq, max.words = 100,
          scale = c(3, .1),
          colors = pal)
