AI / Data science / machine learning / computer vision / signal processing
What is this?
These fields drive much of the innovation in our economy today.
There is a lot of overlap between them in terms of the tools, the real-world applications, and the math.
E.g. How does Amazon know what products you might like?
They use data science to look at the buying behaviors of people in your area, people who have bought things similar to what you’ve bought, etc. In other words, they use data science to create a model of who might be similar to you so that they can recommend products.
A model is just an equation - something like (the numbers and features here are made up for illustration):
likelihood you'll buy item X ≈ 0.7 × (fraction of shoppers like you who bought X) + 0.3 × (how similar X is to your past purchases)
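As a minimal sketch, that toy model could also be written as a Python function (the weights and feature names are invented for illustration, not Amazon's actual model):

def recommendation_score(fraction_similar_shoppers_bought, similarity_to_past_purchases):
    # Toy linear model: the weights 0.7 and 0.3 are made up for illustration
    return 0.7 * fraction_similar_shoppers_bought + 0.3 * similarity_to_past_purchases

# E.g. 40% of shoppers like you bought the item, and it is 90% similar to your past purchases
print(round(recommendation_score(0.4, 0.9), 2))  # 0.55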
How does Alexa understand what you are asking for? They use machine learning. Amazon engineers record hundreds of thousands of voice requests; someone manually writes down (transcribes) what each audio sample is saying (labels the data); then both the transcripts and the raw audio are fed into a neural network, a random forest, or some other statistical algorithm, which finds correlations and differences between the examples. During this process, the algorithm builds a model based on the data that was given to it.
From there, Amazon can use that model on new recordings of voice requests to guess what you’re looking for.
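We can't train a real speech model in a few lines, but the label → train → predict workflow can be sketched with scikit-learn, using made-up request transcripts in place of raw audio (every transcript and intent below is invented):

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline

# Made-up labeled examples: transcripts plus the intent each one expresses
transcripts = ["play some jazz", "order more paper towels",
               "play my workout playlist", "buy more batteries"]
intents = ["music", "shopping", "music", "shopping"]

# Turn the text into a matrix of word counts, then fit a random forest on the labels
model = make_pipeline(CountVectorizer(), RandomForestClassifier(random_state=0))
model.fit(transcripts, intents)

# Use the trained model on a brand-new request it has never seen
print(model.predict(["play something relaxing"]))  # hopefully ['music']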
There can be major sources of bias that cause a data science / machine learning model to behave in unexpected ways. For example, if all of the training data is recorded by people with a British accent, the model may not be able to recognize what Americans are asking for.
Computer vision is the process of editing or analyzing the contents of images and videos using code. An image is a matrix of pixels. Some algorithms are based on geometry and precalculus (e.g. combining several photos into a panorama); others are based on data science and machine learning (e.g. recognizing faces in a photo, restoring old movies).
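To make "an image is a matrix of pixels" concrete, here is a minimal NumPy sketch (the tiny 3x3 "image" is invented for illustration):

import numpy as np

# A tiny 3x3 grayscale "image": each number is a pixel brightness from 0 to 255
image = np.array([[  0, 100, 200],
                  [ 50, 150, 250],
                  [ 25, 125, 225]])

brighter = np.clip(image + 30, 0, 255)  # brightening is just adding to the matrix
mirrored = image[:, ::-1]               # mirroring is just reversing the columns
print(brighter)
print(mirrored)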
How can software detect whether you have diabetes just by looking at a picture of your eye? The software developers take several hundred photos of eyes from various people in various lighting conditions at various times of day, label which images depict a person with diabetes and which don't, and use a machine learning algorithm to build a model of whatever valid correlations exist.
As before, training a computer vision model on one sample population and using the model on another population can result in inaccurate predictions and unintended consequences. E.g. many police facial recognition systems are trained primarily on white faces (because most of the employees at the companies that make these systems are white, and they may use their own faces or the faces of people they know as input). As a result, the systems are very accurate when differentiating between different white people, but they are notoriously inaccurate when looking at Black, Latino, or Asian faces. This means that the wrong person could be put in jail for a crime they did not commit.
Work is considered “signal processing” if you take an input signal, apply some filter to the signal, and output a modified signal. Signal processing started out in music and electrical engineering, but now also applies to pictures, video, and many other signals.
E.g. when people talk while wearing a facemask, everything sounds a bit muffled. The input is the voice, the filter is the mask, and the output is the muffled voice. In signal processing, you can create a model of what the mask is doing to the audio (the model can be represented as a matrix) and use software to make it sound as if the person weren't wearing a mask at all!
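A minimal SciPy sketch of the filter idea (the tones, sample rate, and cutoff are invented; this simulates the muffling rather than undoing it):

import numpy as np
from scipy.signal import butter, lfilter

# Invented example: one second of "voice" made of a low tone plus a high tone
rate = 8000  # samples per second
t = np.linspace(0, 1, rate, endpoint=False)
voice = np.sin(2 * np.pi * 220 * t) + 0.5 * np.sin(2 * np.pi * 2000 * t)

# A low-pass filter plays the role of the mask: it muffles the high frequencies
b, a = butter(4, 1000, btype="low", fs=rate)
muffled = lfilter(b, a, voice)

print(voice[:5])
print(muffled[:5])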
Coding languages
Python (requires additional libraries)
R (very easy to learn - many political scientists and business analysts learn this language without knowing computer science or coding beforehand)
Matlab
Since we’re teaching Python for other things anyway, it may make sense to stick with Python here too.
Common Python libraries
NumPy
pandas
TensorFlow
PyTorch
SciPy
scikit-learn
Libra (this one is really new. It's not the most popular, but it is extremely powerful because it hides a lot of the behind-the-scenes math. Could be good for getting students exposed to analyzing complicated data sets. Should be careful about leaving students with misconceptions, though.)
Important techniques / algorithms / math
Principal Component Analysis
Convolutions
Neural Networks
Converting audio to frequencies (e.g. with a Fourier transform - see the sketch after this list)
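A minimal NumPy sketch of converting audio to frequencies (the signal below is invented: a 440 Hz tone plus a quieter 1000 Hz tone):

import numpy as np

# Invented example: one second of audio containing a 440 Hz tone and a 1000 Hz tone
rate = 8000  # samples per second
t = np.linspace(0, 1, rate, endpoint=False)
signal = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1000 * t)

# The FFT converts the audio from "amplitude over time" to "strength per frequency"
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(len(signal), d=1 / rate)

# The two strongest frequencies should come out near 440 Hz and 1000 Hz
strongest = freqs[np.argsort(spectrum)[-2:]]
print(sorted(strongest))  # expect roughly [440.0, 1000.0]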
Key concepts for high school students to learn
Thinking of datasets as spreadsheets and matrices to operate on all at once, rather than as values to loop over
Thinking of image editing operations as matrices (for example, blurring an image, rotating an image, detecting the outline of an object in an image - see the sketch after this list)
Understanding that most things in life are not driven by a single input variable - there are correlations between different variables
Thinking of algorithms as Input data → Filter/processing algorithm → Output data
Just like in a bread factory, the same machine (the filter/processing algorithm) could yield drastically different results (output data) if the dough given to it (input data) is different.
Transforming data from one perspective to another (this is technically linear algebra, but doesn’t require linear algebra to get the intuition. Can be taught in precalculus)
Understanding bias that can be baked into the input
Understanding that the features that computers pick out using machine learning algorithms may be different from the ones humans think of, and yet they lead to pretty similar results most of the time and surprisingly different results on rare occasions.
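A minimal SciPy sketch of image editing as matrix operations (the tiny image is invented): the same convolution machinery blurs or finds outlines depending on the small matrix (kernel) you choose:

import numpy as np
from scipy.signal import convolve2d

# Invented 5x5 grayscale image with a single bright pixel in the middle
image = np.zeros((5, 5))
image[2, 2] = 255.0

# Blurring = convolving the image matrix with a small averaging matrix (a kernel)
blur_kernel = np.ones((3, 3)) / 9
blurred = convolve2d(image, blur_kernel, mode="same")

# Outline detection uses a different kernel on exactly the same machinery
edge_kernel = np.array([[ 0, -1,  0],
                        [-1,  4, -1],
                        [ 0, -1,  0]])
edges = convolve2d(image, edge_kernel, mode="same")

print(np.round(blurred, 1))
print(edges)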