AI in Education Toolkit for Racial Equity

AI in Education Toolkit for Racial Equity: How to mitigate racial bias in the design and development of your products


Ideation

This section explains how to ideate on edtech products while keeping racial bias in mind. Ideation includes all the steps from your first idea "spark" to the decision that your idea is worth executing. You might think this happens on the business or product side, but for an AI engineer, ideation is one of the most critical steps in the software development process. If you don't think very carefully about whether AI is the right tool for your problem, you may later find that your AI solution wasn't the right fit, and you'll have wasted a lot of effort. At this stage, you need to clearly articulate what you hope to do with AI and why using AI is the right decision. When answering these questions, make sure you consider the educational context in which your product will be used. After this section, you'll explore your Logical Assumptions.

🎯 Goals

Articulate the value proposition of your idea before you look at the data

Evaluate whether AI or ML is appropriate for your scenario

Define the risks to racial equity if things go wrong

Appropriately scope the required resources to achieve your goal

🚧 Caution!

If you've already collected a lot of data and are now considering what AI can do for you, this step is especially important:

Talk to educators and Black and Brown students who will eventually use your product. Start your design and development process from a deep understanding of a specific educational context.

Understand student and teacher needs: What makes things difficult for them? What would it look like if the problem went away? The steps in this toolkit will help you identify dangerous assumptions you might otherwise make along the way.

✅ Activities for Ideation

Activity 1: Collect and Vet Ideas with Your Team AND Your Users

Step 1: In no more than 3 sentences, use the table below to explain your idea. You don't need to know how you'll make it work yet, but if you had a magic wand that brought your idea to life, what would happen? Before you begin development, it's critical that every product manager and engineer on your team is on the same page about your goal. Make sure that's the case by sharing your written idea with everyone on the team.

AI Idea Collection

What's the idea? | Submitted By | Feedback and Concerns
We will personalize in real time the order and type of lessons students see, based on their interests and level of understanding. | Madison Jacobs | I like this idea because...
We will assess each kid's understanding of math and literacy and highlight areas of weakness to the teacher. | Nidhi Hebbar | I have concerns about this idea because...
We will recommend the most engaging and effective content for each student on any given topic. | Matthew Volk | I like this idea because...
We will flag students who are at risk of dropping out or having a behavioral issue, before it's too late. | Liz Allendorf | I have concerns about this idea because...

Step 2: Make sure that teachers, students, and parents would value and welcome your idea. Talk to your users, and make sure students – Black and Brown students – are included. What questions or concerns do they have? Even if you feel your idea is still worth exploring, you should revisit their concerns later in the process. Involve your users at several steps in the design and development process to make sure you aren’t later surprised by their concerns.

User Interviews and Feedback


Interview date

User Type

User Feeling

User Concerns

March 3, 2020

Two teachers at a low-income public school

Concerned that flagging students who are at risk will make certain students appear to be a "lost cause".

March 6, 2020

Four students at a small private school

Excited to have more interesting content. Don't like that it reports to their teacher how many pages the students read.

March 11, 2020

Parents of 3 K-12 students in an urban public school

Not comfortable with discipline data being used for predictive purposes. Worried that app may not be able to understand students struggling to learn English.


Activity 2: Is AI or ML the Right Decision?

Make sure you understand what AI and ML are. You can learn more about the difference between them in the Appendix (AI vs. ML).

In brief, an artificial intelligence (AI) algorithm is any algorithm that simulates intelligent thought. Often, such algorithms simply take data as input and apply a set of rules to produce an output: a recommendation, a label, or a decision. These rules might be the complex math that allows Google search to identify cat pictures. But AI can also be more basic. For example, if you've ever played PacMan, you have seen AI in action. A programmer wrote code that tells the ghosts precisely what to do: change direction when they hit a wall, follow a set of rules (an algorithm) to reach PacMan the fastest, turn blue when PacMan eats a yellow pill, and so on.

AI algorithms are just sets of rules that produce different kinds of output based on the data you give them. By this definition, you most likely already use AI in your software, even if it's only simple if/else statements. Your team must understand the logic and assumptions behind these sets of rules, and this toolkit will help you uncover potential bias even if you never decide to use a more complex form of AI such as machine learning.
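To make this concrete, here is a minimal sketch of rule-based AI in an edtech setting. The function name, the score thresholds, and the lesson labels are hypothetical and chosen only for illustration; the point is that every rule, and therefore every assumption, is written out explicitly where your team can inspect it.

```python
def recommend_next_lesson(quiz_score: float, attempts: int) -> str:
    """Hypothetical rule-based recommender: explicit if/else rules, no learning.

    quiz_score is a fraction in [0, 1]; attempts is how many times the
    student has tried this quiz. Both thresholds are illustrative assumptions.
    """
    if not 0.0 <= quiz_score <= 1.0:
        raise ValueError("quiz_score must be between 0 and 1")
    if quiz_score >= 0.8:
        return "advance"           # high score: move on to the next lesson
    if quiz_score >= 0.5 or attempts < 2:
        return "practice"          # middling score or few attempts: more practice
    return "teacher_check_in"      # repeated low scores: route to a human


print(recommend_next_lesson(0.9, 1))  # advance
print(recommend_next_lesson(0.6, 3))  # practice
print(recommend_next_lesson(0.3, 3))  # teacher_check_in
```

Even rules this simple encode assumptions worth questioning, for example whether a low score after two attempts reflects a student's understanding of the material or a student who is still learning English, a concern one group of parents raised in the interviews above.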

Machine learning (ML) is a type of AI that learns over time how to do something. We recommend exploring supervised ML, in which you provide a "machine" with a task (something you want the algorithm to do) as well as a dataset (which contains examples of the task and their "correct" outcomes). If you're new to ML, we don't recommend unsupervised ML or deep learning algorithms in the education context. The opacity of deep learning algorithms makes it more difficult both to find and to address bias. That said, all algorithms, whether deep learning or traditional machine learning, should be checked for bias. The key is to test your algorithms, and ultimately your products, to ensure they serve Black and Brown students as well as they serve other groups of students. Below are some useful criteria to evaluate whether ML is a good idea for your product.
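One way to put that testing into practice is to report a model's performance separately for each student group rather than as a single overall number, since an average can hide a group the model serves poorly. The sketch below assumes you already have the true labels, your model's predictions, and a group attribute for each student in a held-out test set; the function names and records are made up, and accuracy is only one of several metrics you may want to break down by group (false-positive and false-negative rates often matter more when a label can trigger an intervention).

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Compute accuracy separately for each student group.

    records: iterable of (group, true_label, predicted_label) tuples.
    Returns {group: (accuracy, number_of_examples)}.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, y_true, y_pred in records:
        total[group] += 1
        correct[group] += int(y_true == y_pred)
    return {g: (correct[g] / total[g], total[g]) for g in total}


def largest_gap(results):
    """Difference in accuracy between the best- and worst-served groups."""
    accuracies = [acc for acc, _ in results.values()]
    return max(accuracies) - min(accuracies)


# Toy, made-up predictions for two hypothetical groups.
records = [
    ("group_a", 1, 1), ("group_a", 0, 0), ("group_a", 1, 1), ("group_a", 0, 0),
    ("group_b", 1, 0), ("group_b", 0, 0), ("group_b", 1, 1), ("group_b", 0, 1),
]
results = accuracy_by_group(records)
print(results)               # {'group_a': (1.0, 4), 'group_b': (0.5, 4)}
print(largest_gap(results))  # 0.5
```

Here the overall accuracy is 75% (6 of 8), which may look acceptable, yet one group is served at 50%. A gap like this is a prompt to investigate the data and labels, not a fix in itself.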

Activity 3: Evaluate Whether ML Makes Sense for You

Not every good idea will satisfy all of the criteria below, but every reasonable scenario should at least address each of them with a mitigation and disclosure plan.

You want to detect, predict, or infer something that people generally agree on when they see it, or that has a clear, well-defined answer.

Bad question: “Is this a good student?”

Better question: “Is this student going to drop out before s/he graduates?”

Even better question: “Is this student going to get a C or higher in all courses this year?”

You have a ton of data. A useful rule of thumb is at least 1,000 examples per “class” of data. If you are trying to determine whether something is a plant or a flower, you should have at least 1,000 pictures of plants and at least 1,000 pictures of flowers.

Your problem is difficult to scale. For instance, maybe a teacher can’t decipher 10,000 images of student-drawn plants and flowers in the next hour. An algorithm would help meet the deadline, even if the results are less than perfect.

If your algorithm guesses wrong, limited tangible harm is done to any human. If there is potential to harm a human, safeguards should be put in place to allow another human to step in and make corrections before real damage is done. Your algorithm should be a helper for a human in charge, not a standalone decision-maker. For example, mistaking a flower for a plant may not be a huge deal on its own. But if that misidentification determines which academic track a student is placed on, you should make sure a teacher can review and intervene. Similarly, if your algorithm could incorrectly flag students as having high behavioral risk, or fail to flag a student likely to harm another student, an algorithm may not be appropriate.
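The data criterion above, at least 1,000 examples per class, is easy to check before any modeling begins. A minimal sketch follows; the function name is hypothetical, and the optional per-group check is an assumption that extends the rule of thumb to each student group, in the spirit of the "diverse representation of students" the charts below ask for.

```python
from collections import Counter


def check_class_counts(labels, groups=None, min_per_class=1000):
    """Return the classes, and optionally (group, class) pairs, that fall
    below a minimum number of examples.

    labels: one class label per example (e.g. "plant" or "flower").
    groups: optional student-group identifiers, aligned with labels.
    min_per_class: rule-of-thumb threshold; 1,000 follows the guidance above.
    Labels and groups must be sortable (e.g. strings) so output order is stable.
    """
    shortfalls = {}
    # Overall count per class.
    for label, n in Counter(labels).items():
        if n < min_per_class:
            shortfalls[label] = n
    if groups is not None:
        if len(groups) != len(labels):
            raise ValueError("labels and groups must be the same length")
        cell_counts = Counter(zip(groups, labels))
        # Check every group/class combination, including ones with zero examples.
        for group in sorted(set(groups)):
            for label in sorted(set(labels)):
                n = cell_counts[(group, label)]
                if n < min_per_class:
                    shortfalls[(group, label)] = n
    return shortfalls


# Made-up counts: 1,200 plant images and 900 flower images from two groups.
labels = ["plant"] * 1200 + ["flower"] * 900
groups = ["A"] * 1100 + ["B"] * 1000
print(check_class_counts(labels))
# {'flower': 900}
print(check_class_counts(labels, groups))
# {'flower': 900, ('A', 'flower'): 0, ('B', 'flower'): 900, ('B', 'plant'): 100}
```

In this toy data, group A has no flower images at all, something the overall count of 900 flowers would not reveal.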

Use the charts below (pre-populated with one bad and one good example) to evaluate whether ML makes sense for you. This is not an exhaustive list of criteria for you to consider; the following stages of the toolkit will provide additional considerations.

The first chart shows a bad example: a company evaluating whether it should use usage behavior data (click patterns on a page) and machine learning to determine whether a student is engaged.

Bad Example

Question: Are you detecting, predicting, or inferring something that people generally agree on when they see it, or that has a clear, well-defined answer?
Validation Plan: Present example scenarios and ask teachers, students, and families whether they think the students are engaged or not. Clearly define that the options are either engaged or not engaged, with nothing in between. [If the answers vary, then the question may not be a good use case for ML, given the data we have.]
Answer: Probably not
Mitigation Plan: If NO, talk with schools, students, and parents to identify what data we would need, or how we might reframe the question to have a clearer, well-defined answer.

Question: Do you have enough data?
Validation Plan: Make sure you have at least 1,000 data points for each type of behavior, across a diverse representation of students, to ensure you've accounted for usage behavior from all engagement "types". [This likely requires several thousand students.]
Answer: Yes
Mitigation Plan: If NO, talk to schools or other communities to find out whether this is data they would be comfortable sharing. If so, request access to a larger dataset.

Question: Do you have a problem that is hard to scale without technology?
Validation Plan: Did schools share this issue as an example of something time-consuming and burdensome for their teachers or students? Are teachers struggling to assess, in a timely manner, whether all of their students are engaged? Do we internally need to distinguish between engaged and unengaged students in a way we cannot do manually or with basic if/then statements? Would it be too time-consuming for an employee to look at the data and determine whether individual students are engaged?
Answer: Yes
Mitigation Plan: If NO, explore what it would look like for a human to take on this task instead.

Question: If your algorithm is wrong, will it cause tangible harm to a human?
Validation Plan: Ask schools what would happen if a student were labeled as disengaged when they are engaged, or vice versa. Ask teachers and students how they feel about erroneous results to assess the severity of the situation. Never decide for yourself, without exploring with users, whether the harm is "serious enough." In this scenario, a student labeled "disengaged" might not progress as quickly as an "engaged" student and ultimately receive a less ambitious educational experience. Alternatively, the product might label a disengaged student as engaged, giving the student more challenging content and leading the student to fail and become discouraged.
Answer: Yes
Mitigation Plan: If YES, pause before proceeding. Consider ways to mitigate errors or to incorporate safeguards so that mistakes are less severe. You may be able to create opportunities for human review or intervention. For example, before the content is changed to a less complex pathway, recommend that a teacher check in with the student and make a human recommendation.
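The mitigation in the last row of the Bad Example, recommending that a teacher check in before the content changes, can be built into the product's logic rather than left to policy. The sketch below assumes a binary "disengaged" prediction and a simple in-memory review queue; the class names, pathway labels, and functions are all hypothetical. The key design choice is that the model's output only ever creates a suggestion: the student's pathway changes only when a teacher approves it.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Suggestion:
    student_id: str
    current_pathway: str
    suggested_pathway: str
    reason: str


@dataclass
class ReviewQueue:
    """Hypothetical holding area for model suggestions awaiting a teacher."""
    pending: List[Suggestion] = field(default_factory=list)


def route_prediction(student_id: str, current_pathway: str,
                     predicted_disengaged: bool, queue: ReviewQueue) -> str:
    """Turn a (possibly wrong) model prediction into a suggestion for a teacher.

    The student's pathway is never changed here; it is returned unchanged.
    """
    if predicted_disengaged:
        queue.pending.append(Suggestion(
            student_id=student_id,
            current_pathway=current_pathway,
            suggested_pathway="less_complex",  # hypothetical pathway label
            reason="Model predicted disengagement; please check in with the student.",
        ))
    return current_pathway


def resolve(queue: ReviewQueue, student_id: str, approved: bool) -> Optional[str]:
    """A teacher accepts or rejects the pending suggestion for one student.

    Returns the pathway to use from now on, or None if nothing was pending.
    """
    for i, suggestion in enumerate(queue.pending):
        if suggestion.student_id == student_id:
            queue.pending.pop(i)
            return suggestion.suggested_pathway if approved else suggestion.current_pathway
    return None


queue = ReviewQueue()
print(route_prediction("s1", "standard", True, queue))  # standard
print(len(queue.pending))                              # 1
print(resolve(queue, "s1", approved=False))            # standard
print(len(queue.pending))                              # 0
```

In a real product the queue would live in a database and surface in a teacher dashboard; the in-memory list here only illustrates the control flow.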

The second chart shows a good example: a company evaluating whether it should use usage behavior data (click patterns on a page) and machine learning to determine whether a student attempted to game the system, that is, to appear to have read the text without actually completing the module.

Good Example

Question: Are you detecting, predicting, or inferring something that people generally agree on when they see it, or that has a clear, well-defined answer?
Validation Plan: Present example scenarios to schools and ask teachers, students, and families whether they would classify the student behavior as gaming the system or actually reading. [If the answers vary, then the question may not be a good use case for ML, given the data we have.]
Answer: Yes

Question: Do you have enough data?
Validation Plan: Make sure you have at least 1,000 data points for each scenario (students actually reading and students gaming the system), across a diverse representation of students, to ensure you've accounted for usage behavior from all engagement "types". [This likely requires several thousand students.]
Answer: Yes

Question: Do you have a problem that is hard to scale without technology?
Validation Plan: Ask teachers or parents whether they have a hard time figuring out when students actually read or gamed the system. Teachers do struggle to keep track of all students at all times, so this may help them understand why students who appear to have read a given book still struggle to demonstrate reading comprehension. Do we internally need to distinguish between students who read and those who didn't, in a way we cannot do manually or with basic if/then statements? It would be difficult for an employee to do this for all students, and basic if/then statements may not capture the differences between individual students' behavior.
Answer: Yes

Question: If your algorithm is wrong, will it cause tangible harm to a human?
Validation Plan: Work with schools to understand what would happen if a student were labeled as having gamed the system when they haven't, or vice versa. Never decide for yourself, without exploring with users, whether the harm is "serious enough." In this case, a student mislabeled as gaming the system might be reprimanded, and a student who isn't identified may be treated as having a mastery level that doesn't match their needs (below or above, depending on luck). It's important to share this possibility with teachers so that they check in with students labeled as "gaming the system" rather than punishing them automatically.
Answer: Yes
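The Good Example notes that basic if/then statements may not capture differences between individual students' behavior. The sketch below illustrates why, using made-up numbers: a hypothetical fixed rule flags any page read in under 20 seconds, while a hypothetical variant compares each page to the student's own typical pace. Both functions, the 20-second cutoff, and the 0.3 ratio are assumptions for illustration only, not the toolkit's method; a per-student baseline is just one hand-written alternative to training an ML model on labeled examples.

```python
def flag_fixed(seconds_per_page: float, threshold: float = 20.0) -> bool:
    """Hypothetical if/then rule: flag any page read faster than a fixed cutoff."""
    return seconds_per_page < threshold


def flag_relative(seconds_per_page: float, student_typical: float,
                  ratio: float = 0.3) -> bool:
    """Hypothetical variant: flag only pages read far faster than this
    student's own typical pace (e.g. their median seconds per page so far)."""
    return seconds_per_page < ratio * student_typical


# Made-up readers: a fluent reader who is genuinely fast, and a student who
# normally reads slowly but skipped through this module.
fluent_reader = {"pace": 15.0, "typical": 18.0}
skimmer = {"pace": 25.0, "typical": 120.0}

for name, s in [("fluent_reader", fluent_reader), ("skimmer", skimmer)]:
    print(name, flag_fixed(s["pace"]), flag_relative(s["pace"], s["typical"]))
# fluent_reader True False
# skimmer False True
```

The fixed rule flags the genuinely fluent reader and misses the student who skimmed. Any rule like this, hand-written or learned, still needs the validation steps in the chart, since typical reading pace can itself differ across student groups, for example for students who are still learning English.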

Activity 4: What could go wrong?

Most likely, your company's use case will not pass the criteria above with flying colors as a perfect ML use case. Threat models like Microsoft's can help you assess what might go wrong. Edtech products can have a significant positive or negative impact on a child's life. You should continuously ask "what might go wrong?" throughout your design and development process.

Step 1: Realize it is not what you think.

You can't predict everything that might go wrong while sitting at your desk or in your office. Ask the people who will use your product AND those impacted by your product about the use of algorithms. What do (Black and Brown) students and teachers think of your plan to use machine learning in this scenario? What about learning researchers? Students, teachers, administrators, and other members of their community will each have their own concerns when they learn that AI or ML is being used in their schools. You should engage in these conversations often and incorporate their concerns as important feedback.

Step 2: Summarize and share all risks with a diverse group of schools, students, and families. Leave time in your development to make changes based on their reactions and concerns. If your users would be concerned with the worst-case scenario, machine learning is likely not the right solution. If users don't feel their concerns have been addressed by the end of your development process, then using these tools is likely not a good idea for this community. Your product cannot be successful without their buy-in.

Activity 5: What will it cost?

It may seem too early to assess cost, but this is precisely the time to determine whether you have the resources to create a meaningful and safe ML model. Do not proceed if you don't have the required resources! Consider the following (non-exhaustive) list of resources your team will need:

Tasks

Question | Responsible | Yes or No
Software engineer(s) with significant ML experience understand all the issues laid out in this guide and can address them over the long term. | Liz Allendorf | Yes
Data scientist(s) deeply understand the data going into the ML pipeline and have the resources needed to answer any questions they might have about it. | Nidhi Hebbar |
Social scientist(s) and education expert(s) understand ML well enough to monitor the ML process on an ongoing basis and can step in to prevent problematic design choices that may cause harm to students (see the UX Design section). | Matthew Volk | Unclear
We have access to a large, comprehensive dataset with labels created by a trusted party, or we can create the labels ourselves to acceptable quality (see the Datasets section for more details). Note: This is a huge barrier that many prospective ML projects cannot overcome. | Liz Allendorf | Unclear
We have the resources and access required to update the dataset regularly. | Liz Allendorf | Yes
We have adequate computing resources. If the resources are not in-house, then we have an engineer who is familiar with commercial computing solutions and how to use AI/ML on these platforms. | Liz Allendorf | Yes
We have the time to fully develop the solution before deploying it, and then to maintain it during deployment (including bug fixes, model retraining, dataset updates, model updates, update deployment, etc.). | Madison Jacobs | Yes
We have the funds to maintain and scale these resources as the company grows. | Nidhi Hebbar | Unclear
We have the funds and time to adequately test our product with students and teachers, to collect feedback, and to incorporate critical feedback into the product before launch. | Nidhi Hebbar | Unclear

🎯 Conclusion: Is this a good or a bad idea?

Continuously revisit this question before and after making the final decision to deploy. Be careful not to fall into the sunk cost fallacy. It is perfectly natural for a company to realize very late that maintaining an ML system in production is not ethically or financially reasonable for a given use case. Do not try to force a problem into an ML-shaped hole if it does not make sense.

After this section, you'll explore your Logical Assumptions.
