Skip to content
DA-HW4-Xinyu Wu
Share
Explore

icon picker
Problem 1

Progresso Soup Sales

Question 1: Winter Dummy Variable

1. Patterns of the Data

df_soup = read.csv("https://raw.githubusercontent.com/jcbonilla/BusinessAnalytics/master/BAData/Progresso_Soup.csv")str(df_soup)df_soup$Month = as.factor(df_soup$Month)
#1.1 Data Patterns #Figure out the pattern from 5 aspects:#region sale, month sale, cumulative region sales, cumulative month sales, and competitive price percentage.
To get access to the patterns of data, 5 aspects could be analyzed based on region sales, month sales, cumulative region sales, cumulative month sales, and competitive price percentage.
a) Region Sale
df_soup_region = aggregate(df_soup$Sales.Progresso, by=list(df_soup$Region), sum) summary(df_soup_region)barplot(df_soup_region$x, names.arg = df_soup_region$Group.1, main='Sales by Region')> summary(df_soup_region) Group.1 x Length:4 Min. :13193184 Class :character 1st Qu.:14280310 Mode :character Median :15510411 Mean :20738548 3rd Qu.:21968649 Max. :38740183
Region Sale
0
FOUND:
East region has the highest sales and south has the lowest sales.
East region has much higher selling comparing to other three regions.
NEXT MOVE:
It is recommend to dig into the east and south data to figure out why it sells the most and the least.
Figure out why the difference between east and other region has such a large difference.

b) Month Sale
df_soup_month = aggregate(df_soup$Sales.Progresso , by=list(df_soup$Month), sum) summary(df_soup_month)
> summary(df_soup_month) Group.1 x 1 :1 Min. : 2103939 2 :1 1st Qu.: 3064259 3 :1 Median : 7506196 4 :1 Mean : 6912849 5 :1 3rd Qu.:10343104 6 :1 Max. :11921757
Month Sale
0
FOUND:
January has the highest sales and June has the lowest sales.
Winter sales (Oct, Nov, Dec, Jan, and Feb) are higher than other seasonal sale.
Summer sales’ performance (May, June, and July) is not good.
NEXT MOVE:
Find out if the selling performance related to Weather condition since winter is cold and people are inclined to drink soup.
Find out other reasons like discount or activities responding to sale change.

c) Cumulative Region Sale
df_soup_region_cul = aggregate(df_soup$Category_Sales, by=list(df_soup$Region), sum) summary(df_soup_region_cul)barplot(df_soup_region_cul$x, names.arg = df_soup_region_cul$Group.1, main='Cumulative Sales by Region')> summary(df_soup_region_cul) Group.1 x Length:4 Min. :69744710 Class :character 1st Qu.:73754300 Mode :character Median :82707495 Mean :83664188 3rd Qu.:92617383 Max. :99497054
Cumulative Region Sale
0
FOUND:
East region has the highest sales and west has the lowest sales.
NEXT MOVE:
It is recommend to dig into the east and west data to figure out why it sells the most and the least.

d) Cumulative Month Sale
df_soup_month_cul = aggregate(df_soup$Category_Sales , by=list(df_soup$Month), sum) summary(df_soup_month_cul)barplot(df_soup_month_cul$x, names.arg = df_soup_month_cul$Group.1, main='Cumulative Sales by Month')> summary(df_soup_month_cul) Group.1 x 1 :1 Min. :14785276 2 :1 1st Qu.:18502298 3 :1 Median :29149972 4 :1 Mean :27888063 5 :1 3rd Qu.:38456885 6 :1 Max. :41466885
Cumulative Month Sale
0
FOUND:
December has the highest sales and June has the lowest sales.
Winter sales (Oct, Nov, Dec, Jan, and Feb) are still higher than other seasonal sale.
Summer sales’ performance (May, June, and July) is still not good.
NEXT MOVE:
Find out if the cumulative selling performance related to Weather condition since winter is cold and people are inclined to drink soup.
Find out other reasons like discount or activities responding to sale change.
e) Competitive Price Percentage
#Subset data to see the percentage of competitive price(low) comapring with competitors.df_soup_com_price = subset(df_soup, df_soup$Price.Progresso < df_soup$Price.Campbell & df_soup$Price.Progresso < df_soup$Price.PL)df_soup_com_price_1 = subset(df_soup, df_soup$Price.Progresso > df_soup$Price.Campbell & df_soup$Price.Progresso < df_soup$Price.PL)df_soup_com_price_2 = subset(df_soup, df_soup$Price.Progresso < df_soup$Price.Campbell & df_soup$Price.Progresso > df_soup$Price.PL)competitive_price_rate = nrow(df_soup_com_price)/nrow(df_soup)competitive_price_ratecompetitive_price_rate_1 = nrow(df_soup_com_price_1)/nrow(df_soup)competitive_price_rate_2 = nrow(df_soup_com_price_2)/nrow(df_soup)competitive_price_rate_mid = competitive_price_rate_1 + competitive_price_rate_2competitive_price_rate_mid
FOUND:
The percentage of lowest price of Progresso among the two competitors is 6.68%.
The percentage of mid price of Progresso among the two competitors is 19.67%.
Comparing with other two competitors (Campbell and PL), Progresso usually holds a higher price.
NEXT MOVE:
Dig into higher and lower price comparing with other competitors based on different region and month.
List out the price change during different region and month for every competitors.

2. Patterns of Winter Data

#Create dummy variables for "Winter"df_soup$winter_Oct = ifelse(df_soup$Month == "10",1, 0)df_soup$winter_Nov = ifelse(df_soup$Month == "11",1, 0)df_soup$winter_Dec = ifelse(df_soup$Month == "12",1, 0)df_soup$winter_Jan = ifelse(df_soup$Month == "1",1, 0)df_soup$winter_Feb = ifelse(df_soup$Month == "2",1, 0)df_soup$winter = ifelse(df_soup$Month == "10" | df_soup$Month =="11" | df_soup$Month == "12" | df_soup$Month == "1" | df_soup$Month == "2",1, 0)df_soup$non_winter = ifelse(df_soup$Month != "10" & df_soup$Month !="11" & df_soup$Month != "12" & df_soup$Month != "1" & df_soup$Month != "2",1, 0)
To get access to the patterns of winter data, 5 aspects could be analyzed based on region sales, month sales, cumulative region sales, cumulative month sales, and competitive price percentage.
a) Region Sale
df_soup_region_win = aggregate(df_winter$Sales.Progresso, by=list(df_winter$Region), sum)summary(df_soup_region_win)> summary(df_soup_region_win) Group.1 x Length:4 Min. : 8631430 Class :character 1st Qu.: 9201755 Mode :character Median : 9987724 Mean :13559340 3rd Qu.:14345310 Max. :25630483
Region Sale in Winter
0
FOUND:
East region has the highest sales and south has the lowest sales in winter.
East region has much higher selling comparing to other three regions in winter.
NEXT MOVE:
It is recommend to dig into the east and south data to figure out why it sells the most and the least in winter.
Figure out why the difference between east and other region has such a large difference for winter.
b) Month Sale
df_soup_month_win = aggregate(df_winter$Sales.Progresso , by=list(df_winter$Month), sum) summary(df_soup_month_win)> summary(df_soup_month_win) Group.1 x 1 :1 Min. : 9805258 2 :1 1st Qu.:10316008 10 :1 Median :10424391 11 :1 Mean :10847472 12 :1 3rd Qu.:11769947 3 :0 Max. :11921757 (Other):0
Month Sale in Winter
0
FOUND:
January has the highest sales and February has the lowest sales.
NEXT MOVE:
Find out if the selling performance related to Weather condition among winter month.
c) Cumulative Region Sale
df_soup_region_cul_win = aggregate(df_winter$Category_Sales, by=list(df_winter$Region), sum) summary(df_soup_region_cul_win)> summary(df_soup_region_cul_win) Group.1 x Length:4 Min. :39069926 Class :character 1st Qu.:41758940 Mode :character Median :46555127 Mean :47717724 3rd Qu.:52513911 Max. :58690714
Cumulative Region Sale in Winter
0
FOUND:
East region has the highest sales and west has the lowest sales.
NEXT MOVE:
It is recommend to dig into the east and west data to figure out why it sells the most and the least.
d) Cumulative Month Sale
df_soup_month_cul_win = aggregate(df_winter$Category_Sales , by=list(df_winter$Month), sum) summary(df_soup_month_cul_win)> summary(df_soup_month_cul_win) Group.1 x 1 :1 Min. :32237737 2 :1 1st Qu.:38143448 10 :1 Median :39397196 11 :1 Mean :38174179 12 :1 3rd Qu.:39625629 3 :0 Max. :41466885 (Other):0
Cumulative Month Sale in Winter
0
FOUND:
December has the highest sales and February has the lowest sales.
NEXT MOVE:
Find out if the cumulative selling performance related to Weather condition among winter month.
e) Competitive Price Percentage
#Subset Winter data to see the percentage of competitive price(low) comapring with competitors.df_soup_com_price_win = subset(df_winter, df_winter$Price.Progresso < df_winter$Price.Campbell & df_winter$Price.Progresso < df_winter$Price.PL)
Want to print your doc?
This is not the way.
Try clicking the ⋯ next to your doc name or using a keyboard shortcut (
CtrlP
) instead.