Skip to content
Email Archive
dolphin.jpg
Yesterday was fun because we got to physically act out math when recreating Netflix's recommendation engine using clustering in a 3-dimensional graph. I hope morale stays high among the kids. Everybody is getting some answers right, but nobody is getting all of them.
Class Materials - Vocabulary
, Strategy - Make A List
, In class problems
.
Extra MO Problems -
. Problems, problems with hints, and then solutions.
Mailing List - Up to 61 subscribers, which feels like it might be the peak. Math loving friends can sign up
or email me at rbarrows@alum.mit.edu
Math In The Real World (pdf for printing
)
image.png
image.png
Netflix is great because they have millions of hours of movies and TV shows that you can watch. But you can't really use Google to find the right content like you do for a web page.
Have you used Netflix on a TV at home without a keyboard to type on?
How did you find something to watch without typing in the name of the show?
How do they know which shows to show you in those lists?
Are there shows in the list that you've never watched before?
How do they know what kind of shows you might like?
The solution to those problems comes from something called a Recommendation Engine. Netflix has a very good one, but you can find then all over once you know what they look like.
Can you think of other examples?
Amazon tells you what you might like to buy, Tik Tok tells you what dance moves are trending, and Facebook seems to know who you might want to be friends with even though you tried to keep it a secret...which often looks creepy, they probably know you are stalking them.
Let's stick with Netflix for now though, because they had to solve this problem before most others. People usually don't know what movies they want to watch until they're presented with a few options.
Anyone have any ideas for how to build a recommendation engine?
One thing Netflix had from the start was a rating system. That way they always know what you liked an what you didn't like. You could then use that info in a few ways.
One way might be to look at who your friends are and what they liked. If they liked it, you might like it too.
They could also look at your age or gender. Maybe most 10 year old girls like watching mermaid videos.
Maybe they can look at the videos that you liked and see that they are all PG-13 comedies and cartoons. Then they can recommend to you the most popular comedies and cartoons that you haven't watched yet.
These are all pretty simple ideas that produce OK results, but they're certainly not best. In fact, back when Netflix was starting to get popular they realized they had a problem finding the best recommendations, so they announced The Netflix Challenge. Anybody in the world was invited to design their own recommendation engine that Netflix could use. If the winner was able to improve on what Netflix already had, they would win one million dollars.

Clustering
We said that if you knew someone's friends, you could probably come up with good recommendations based on what they liked. But what if you didn't know who their friends were? Find them some fake friends based on people who have the same taste! This is done using a technique called clustering.
Let's take a look at the simplest cluster you can have. This is what we call one dimensional.

Do you like sports?

←- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - →
xxx zzzzzzz
Really dislike. Love all sports.

We have just clustered people into two groups based on how much you like sports. As simple as this is, it's probably pretty useful for figuring out, not just if you like sports, but if you like other things as well.
Now let's look at a two dimensional cluster. In addition to liking Sports, lets also graph how much you like Cartoons. How many groups will we have here?
image.png
Top Right - likes Sports, likes Cartoons
Bottom Right - likes Sports, but dislikes Cartoons
Bottom Left - doesn't like Sports, but likes Cartoons
Top Left - doesn't like Sports, doesn't like Cartoons
Now we have created four groups based on how you feel about cartoons and sports. If you like both, you might like the movie Cars. If you only like cartoons but don't want to watch sports, Toy Story 4 might be good. If you don't like cartoons but like sports, maybe Alex and Me is a good movie to check out. And if you don't like sports or cartoons, try watching Annedroids on Amazon Prime, so at least you can learn how to build robots.
Now let's talk about a three dimensional cluster. The next dimension we'll add is if you're in 5th grade, stand on your chair. How many groups will we have here?
image.png

There are eight ways to split these people up.
Like Sports, like Cartoons, in 5th grade.
Like Sports, like Cartoons, in 4th grade.
Like Sports, don't like Cartoons, in 5th grade.
Like Sports, don't like Cartoons, in 4th grade.
Etc.
So we now have eight distinct groups that people can be in. We can consider all the people in each group to be friends, and then recommend movies to them based on what their other friends like. Even though we didn't ask any questions about Action movies, we might find out that they are very popular among 5th graders that like sports but don't like cartoons.
How many groups of people would there be in a four dimensional cluster? 16. Can you visualize what a four dimensional cluster would look like? No? Neither can I! But you don't need to. Computers can do the math whether you can visualize it or not.
How many groups of people would there be in five dimensions? 32. 10 dimensions? 1024. 20 dimensions? Just over one million groups. Twenty questions about the types of movies you like isn't crazy, especially when you consider they don't even need you to rate the categories. They can just guess based on how many comedies you've watched, how many cartoons you've watched, etc.
But who can see the problem here as you start getting millions of groups?
As you get to millions of groups there's a good chance that a lot of people will be in their own group, all by themselves. Not very helpful then to figure out what other people like you have watched. The main way to avoid this is to just select your dimensions wisely. If you have 100 million subscribers, you might want to limit you graph to about 16 dimensions, which would give you 64,000 groups.
On average how many people would be in each group?
About 1,500. That seems like a good number of people, but you'd of course want to play around with more or less and see how things go.
Finally, we can actually do even better than just splitting the world into groups like this though. People tend to have similar interests, so we can take advantage of this to find better groupings of people. Here is an example of where the clusters end up in real life for 3 dimensions.
image.png

And as a final note, the $1,000,000 Netflix prize, which was announced in 2006, was won in 2009 by a team of researchers. Math in the real world paid off again!
Want to print your doc?
This is not the way.
Try clicking the ⋯ next to your doc name or using a keyboard shortcut (
CtrlP
) instead.