# Week 2 Mentor Guide

Mentor Guideline
TEAM CHALLENGE #1 - Advanced Data Analytics

Mentor notes:
In this challenge, learners are asked to analyze data from Smöl, a chain of European drug stores. The data can be viewed
and is summarized here:
Search
1
Data Field
Explanation
2
Store
A unique ID for the 1,115 stores tracked in this data
3
CompetitionDistance
distance in meters to the nearest competitor store
4
CompetitionOpenSince[Month/Year]
gives the approximate year and month of the time the nearest competitor was opened
5
StoreType
we have four types of stores:
A - Our smallest (smölest) store - kiosks that are roughly 4-5 square meters, often found at train stations and other transportation hubs
B - Corner convenience stores ranging from 50-200 square meters. Similar to 7-11 and found most commonly in city centers.
C - Stand alone stores, approximately 500 - 1,000 square meters. Typically in city outskirts, smaller towns and suburbs.
D - Stores in malls, with sizes ranging from 200-1,000 square meters
6
Promo
This data deals with promotions the store was running. We will not be using it for this exercise.
7
Sales
The total revenue generated by that store per day. For the purposes of this assignment, assume all has been converted to USD (\$)
8
Customers
The total number of customers who purchased something at the store that day.
9
Open
an indicator for whether the store was open: 0 = closed, 1 = open
10
Promo
indicates whether a store is running a promo on that day
11
StateHoliday
indicates a state holiday. Normally all stores, with few exceptions, are closed on state holidays. Note that all schools are closed on public holidays and weekends.
a = public holiday, b = Easter holiday, c = Christmas, 0 = no holiday
12
SchoolHoliday
indicates if the store was affected by the closure of public schools on a specific date (1 = affected, 0 = not affected)
There are no rows in this table

Use the ‘V-Lookup’ function to pull the fields ‘CompetitionDistance’ and ‘CompetitionMonthsOpen’ from the Store_Info to the ‘Sales_Data’ table. (Hint - you can do this by matching ‘Store’).
After finishing the lookups, figure out what percentage of stores have a competitor within a kilometer (1000 meters) as well as what percentage of stores have a competitor that opened up since this data collection started (April 2019).
In the transactions tab, create a pivot table that shows the mean “customers” by “StateHoliday” and “SchoolHoliday” (Hint - choose “customers” as the row and “StateHoliday” and “SchoolHoliday” as the columns).
Your team is curious about the effect of having close competitors on revenue and number of customers. Create a visualization that helps illustrate this—note, this is an open-ended assignment, so be creative!
Finally, based on your visualizations above, summarize your insights about the effects of close competitors on revenue and number of customers.

Learners are first to find the top 10 highest grossing stores. There are many ways to do this—a basic way would be to sort stores so they are listed in order and sum revenue. However, this will be time consuming; a quicker way to do so is using a Pivot Table.
Note - this may look somewhat different if done in Excel:
.
Then, sort the Stores by SUM (descending) to find the top 10:
Next, learners are asked to create a summary statistics table in the Store_Info tab that lists the mean, median, standard deviation, max, and min for the fields “sales” and “customers” by store (they will have to add customer info to the pivot table to accomplish this). Sheets does not have a ‘summary statistics’ function, so each value (mean, median, etc) will have to be calculated on its own.
The next task is to create a new field called ‘CompetitionMonthsOpen’. This field should contain the number of months the competition has been open up to the start of this data (start of April 2019). For instance, if a store opened up in March 2018, the CompetitionMonthsOpen will be 13 (1 year * 12 months + 1 month). If a store opened up more recently than April 2019, simply enter 0.
One way to do this is to create a formula such as:
=((2019-E2)*12)+(4-D2)
In this case, we first figure out how many years have passed by subtracting 2019 by the year of opening and multiplying it by 12 months. Since the store opened in April (4), we subtract any additional months.
Learners can then use an IF statement to adjust newer stores (which will come out as negatives) to 0 or remove missing entries.
Then, learners have a VLOOKUP challenge:
Use the ‘V-Lookup’ function to pull the fields ‘CompetitionDistance’ and ‘CompetitionMonthsOpen’ from the Store_Info to the ‘Sales_Data’ table. (Hint - you can do this by matching ‘Store’).
Note that learners may struggle with this concept (they have in previous challenges). Here’s a brief tutorial/ one way of breaking it down:
We know we want Competition Distance on the Sales_Data table, so let’s start by creating a new column. It doesn’t matter where it goes; I put mine next to customers:

That’s where we’re going to enter our VLOOKUP formula. First we want to choose the search_key, or the ID that will be matched. In this case, that’s our Store column. So the search_key is A2.
Next, we want to choose the columns it will search in (aka Range). In this case, it will need to search the Store and CompetitionDistance columns. I just selected all of it, including the Store Type, rather than specifically isolating those columns.
NOTE - make sure this is an absolute reference (use \$)—since we will be copying and pasting this formula to multiple cells.
Finally, we need to choose which column the VLOOKUP will draw the value from—the index. Because CompetitionDistance is the third column, I will choose 3:

I then used that same formula to populate the entire spreadsheet:
And that’s it! Once they get it once, it will be a lot simpler! Here’s the formula for CompetitionMonthsOpen:

Finally, learners are asked to create visualizations that demonstrate the effect of having close competitors on revenue and number of customers. Personally, I would recommend the following basic example (note - learners can get much more creative if they’d like):
Define ‘close competitor’. In this case, I will say anything <1000m is a close competitor.
Calculate mean Revenue and # of Customers based on close vs distant competitors.
Create charts showing the differences (if any).

In order to get more advanced, learners could:
Create various ‘tiers’ of distance (e.g. close, medium, distant)
Difficult: Control for store type (a,b,c,d) and make those comparisons
Very Difficult: Control for both school and state holidays to ensure comparisons are accurate