 DA-HW4-Xinyu Wu
Share
Explore ## Question 1: Winter Dummy Variable

### 1. Patterns of the Data

`df_soup = read.csv("https://raw.githubusercontent.com/jcbonilla/BusinessAnalytics/master/BAData/Progresso_Soup.csv")``str(df_soup)``df_soup\$Month = as.factor(df_soup\$Month)`
`#1.1 Data Patterns ``#Figure out the pattern from 5 aspects:``#region sale, month sale, cumulative region sales, cumulative month sales, and competitive price percentage.`
To get access to the patterns of data, 5 aspects could be analyzed based on region sales, month sales, cumulative region sales, cumulative month sales, and competitive price percentage.
a) Region Sale
`df_soup_region = aggregate(df_soup\$Sales.Progresso, by=list(df_soup\$Region), sum) ``summary(df_soup_region)``barplot(df_soup_region\$x, names.arg = df_soup_region\$Group.1, main='Sales by Region')``> summary(df_soup_region)`` Group.1 x `` Length:4 Min. :13193184 `` Class :character 1st Qu.:14280310 `` Mode :character Median :15510411 `` Mean :20738548 `` 3rd Qu.:21968649 `` Max. :38740183 `
Region Sale
0
FOUND:
East region has the highest sales and south has the lowest sales.
East region has much higher selling comparing to other three regions.
NEXT MOVE:
It is recommend to dig into the east and south data to figure out why it sells the most and the least.
Figure out why the difference between east and other region has such a large difference.

b) Month Sale
`df_soup_month = aggregate(df_soup\$Sales.Progresso , by=list(df_soup\$Month), sum) ``summary(df_soup_month)`
`> summary(df_soup_month)`` Group.1 x `` 1 :1 Min. : 2103939 `` 2 :1 1st Qu.: 3064259 `` 3 :1 Median : 7506196 `` 4 :1 Mean : 6912849 `` 5 :1 3rd Qu.:10343104 `` 6 :1 Max. :11921757`
Month Sale
0
FOUND:
January has the highest sales and June has the lowest sales.
Winter sales (Oct, Nov, Dec, Jan, and Feb) are higher than other seasonal sale.
Summer sales’ performance (May, June, and July) is not good.
NEXT MOVE:
Find out if the selling performance related to Weather condition since winter is cold and people are inclined to drink soup.
Find out other reasons like discount or activities responding to sale change.

c) Cumulative Region Sale
`df_soup_region_cul = aggregate(df_soup\$Category_Sales, by=list(df_soup\$Region), sum) ``summary(df_soup_region_cul)``barplot(df_soup_region_cul\$x, names.arg = df_soup_region_cul\$Group.1, main='Cumulative Sales by Region')``> summary(df_soup_region_cul)`` Group.1 x `` Length:4 Min. :69744710 `` Class :character 1st Qu.:73754300 `` Mode :character Median :82707495 `` Mean :83664188 `` 3rd Qu.:92617383 `` Max. :99497054 `
Cumulative Region Sale
0
FOUND:
East region has the highest sales and west has the lowest sales.
NEXT MOVE:
It is recommend to dig into the east and west data to figure out why it sells the most and the least.

d) Cumulative Month Sale
`df_soup_month_cul = aggregate(df_soup\$Category_Sales , by=list(df_soup\$Month), sum) ``summary(df_soup_month_cul)``barplot(df_soup_month_cul\$x, names.arg = df_soup_month_cul\$Group.1, main='Cumulative Sales by Month')``> summary(df_soup_month_cul)`` Group.1 x `` 1 :1 Min. :14785276 `` 2 :1 1st Qu.:18502298 `` 3 :1 Median :29149972 `` 4 :1 Mean :27888063 `` 5 :1 3rd Qu.:38456885 `` 6 :1 Max. :41466885 `
Cumulative Month Sale
0
FOUND:
December has the highest sales and June has the lowest sales.
Winter sales (Oct, Nov, Dec, Jan, and Feb) are still higher than other seasonal sale.
Summer sales’ performance (May, June, and July) is still not good.
NEXT MOVE:
Find out if the cumulative selling performance related to Weather condition since winter is cold and people are inclined to drink soup.
Find out other reasons like discount or activities responding to sale change.
e) Competitive Price Percentage
`#Subset data to see the percentage of competitive price(low) comapring with competitors.``df_soup_com_price = subset(df_soup, `` df_soup\$Price.Progresso < df_soup\$Price.Campbell & `` df_soup\$Price.Progresso < df_soup\$Price.PL)``df_soup_com_price_1 = subset(df_soup, `` df_soup\$Price.Progresso > df_soup\$Price.Campbell & `` df_soup\$Price.Progresso < df_soup\$Price.PL)``df_soup_com_price_2 = subset(df_soup, `` df_soup\$Price.Progresso < df_soup\$Price.Campbell & `` df_soup\$Price.Progresso > df_soup\$Price.PL)``competitive_price_rate = nrow(df_soup_com_price)/nrow(df_soup)``competitive_price_rate``competitive_price_rate_1 = nrow(df_soup_com_price_1)/nrow(df_soup)``competitive_price_rate_2 = nrow(df_soup_com_price_2)/nrow(df_soup)``competitive_price_rate_mid = competitive_price_rate_1 + competitive_price_rate_2``competitive_price_rate_mid`
FOUND:
The percentage of lowest price of Progresso among the two competitors is 6.68%.
The percentage of mid price of Progresso among the two competitors is 19.67%.
Comparing with other two competitors (Campbell and PL), Progresso usually holds a higher price.
NEXT MOVE:
Dig into higher and lower price comparing with other competitors based on different region and month.
List out the price change during different region and month for every competitors.

### 2. Patterns of Winter Data

`#Create dummy variables for "Winter"``df_soup\$winter_Oct = ifelse(df_soup\$Month == "10",1, 0)``df_soup\$winter_Nov = ifelse(df_soup\$Month == "11",1, 0)``df_soup\$winter_Dec = ifelse(df_soup\$Month == "12",1, 0)``df_soup\$winter_Jan = ifelse(df_soup\$Month == "1",1, 0)``df_soup\$winter_Feb = ifelse(df_soup\$Month == "2",1, 0)``df_soup\$winter = ifelse(df_soup\$Month == "10" | df_soup\$Month =="11" |`` df_soup\$Month == "12" | df_soup\$Month == "1" |`` df_soup\$Month == "2",1, 0)``df_soup\$non_winter = ifelse(df_soup\$Month != "10" & df_soup\$Month !="11" &`` df_soup\$Month != "12" & df_soup\$Month != "1" &`` df_soup\$Month != "2",1, 0)`
To get access to the patterns of winter data, 5 aspects could be analyzed based on region sales, month sales, cumulative region sales, cumulative month sales, and competitive price percentage.
a) Region Sale
`df_soup_region_win = aggregate(df_winter\$Sales.Progresso, by=list(df_winter\$Region), sum)``summary(df_soup_region_win)``> summary(df_soup_region_win)`` Group.1 x `` Length:4 Min. : 8631430 `` Class :character 1st Qu.: 9201755 `` Mode :character Median : 9987724 `` Mean :13559340 `` 3rd Qu.:14345310 `` Max. :25630483 `
Region Sale in Winter
0
FOUND:
East region has the highest sales and south has the lowest sales in winter.
East region has much higher selling comparing to other three regions in winter.
NEXT MOVE:
It is recommend to dig into the east and south data to figure out why it sells the most and the least in winter.
Figure out why the difference between east and other region has such a large difference for winter.
b) Month Sale
`df_soup_month_win = aggregate(df_winter\$Sales.Progresso , by=list(df_winter\$Month), sum) ``summary(df_soup_month_win)``> summary(df_soup_month_win)`` Group.1 x `` 1 :1 Min. : 9805258 `` 2 :1 1st Qu.:10316008 `` 10 :1 Median :10424391 `` 11 :1 Mean :10847472 `` 12 :1 3rd Qu.:11769947 `` 3 :0 Max. :11921757 `` (Other):0 `
Month Sale in Winter
0
FOUND:
January has the highest sales and February has the lowest sales.
NEXT MOVE:
Find out if the selling performance related to Weather condition among winter month.
c) Cumulative Region Sale
`df_soup_region_cul_win = aggregate(df_winter\$Category_Sales, by=list(df_winter\$Region), sum) ``summary(df_soup_region_cul_win)``> summary(df_soup_region_cul_win)`` Group.1 x `` Length:4 Min. :39069926 `` Class :character 1st Qu.:41758940 `` Mode :character Median :46555127 `` Mean :47717724 `` 3rd Qu.:52513911 `` Max. :58690714`
Cumulative Region Sale in Winter
0
FOUND:
East region has the highest sales and west has the lowest sales.
NEXT MOVE:
It is recommend to dig into the east and west data to figure out why it sells the most and the least.
d) Cumulative Month Sale
`df_soup_month_cul_win = aggregate(df_winter\$Category_Sales , by=list(df_winter\$Month), sum) ``summary(df_soup_month_cul_win)``> summary(df_soup_month_cul_win)`` Group.1 x `` 1 :1 Min. :32237737 `` 2 :1 1st Qu.:38143448 `` 10 :1 Median :39397196 `` 11 :1 Mean :38174179 `` 12 :1 3rd Qu.:39625629 `` 3 :0 Max. :41466885 `` (Other):0 `
Cumulative Month Sale in Winter
0
FOUND:
December has the highest sales and February has the lowest sales.
NEXT MOVE:
Find out if the cumulative selling performance related to Weather condition among winter month.
e) Competitive Price Percentage
`#Subset Winter data to see the percentage of competitive price(low) comapring with competitors.``df_soup_com_price_win = subset(df_winter, `` df_winter\$Price.Progresso < df_winter\$Price.Campbell & `` df_winter\$Price.Progresso < df_winter\$Price.PL)` 