Responsible AI Product Design

Resources for private companies building Responsible AI outputs for each stage of the Machine Learning model lifecycle.
This guide is meant to provide resources for encouraging transparency and compliance from private sector companies submitting bids for government procurement contracts.
Private sector AI companies can strengthen trust in their products by finding new ways to communicate transparency in easy, comprehensible formats for their users. Sharing transparency artifacts that describe the high-level design process, metadata summaries, performance and fairness monitoring thresholds, and consumer-friendly prediction explanations results in stronger AI products overall.
Microsoft’s design principles for Responsible AI all represent potential areas for improving trust in the usage of AI products. For clarification on the differences between fairness, bias, and transparency, check out this resource.

Transparency-by-design System Architecture Map

The diagram below describes the Machine Learning Model lifecycle with key monitoring, auditing, and transparency considerations in each stage.
(Diagram: ML Lifecycle)
Machine Learning applications differ from traditional software engineering in a few key ways:
ML applications are directly exposed to the constantly changing real world through data, whereas traditional software operates in a simplified, static, abstract world which is directly constructed by the developer.
ML apps need to be developed through cycles of experimentation: due to the constant exposure to data, we don’t learn the behavior of ML apps through logical reasoning but through empirical observation.
The skillset and the background of people building the applications gets realigned: while it is still effective to express applications in code, the emphasis shifts to data and experimentation—more akin to empirical science—rather than traditional software engineering.
This approach is not novel. There is a decades-long tradition of data-centric programming. To make ML applications production-ready from the beginning, developers must adhere to the same set of standards as all other production-grade software. This introduces further requirements:
The scale of operations is often two orders of magnitude larger than in the earlier data-centric environments. Not only is data larger, but models—deep learning models in particular—are much larger than before.
Modern ML applications need to be carefully orchestrated: with the dramatic increase in the complexity of apps, which can require dozens of interconnected steps, developers need better software paradigms, such as first-class DAGs (see the sketch below).
We need robust versioning for data, models, code, and preferably even the internal state of applications—think Git on steroids to answer inevitable questions: What changed? Why did something break? Who did what and when? How do two iterations compare?
The applications must be integrated with the surrounding business systems so ideas can be tested and validated in the real world in a controlled manner.
Two important trends collide in these lists. On the one hand we have the long tradition of data-centric programming; on the other hand, we face the needs of modern, large-scale business applications. Either paradigm is insufficient by itself: it would be ill-advised to suggest building a modern ML application in Excel. Similarly, it would be pointless to pretend that a data-intensive application resembles a run-of-the-mill microservice which can be built with the usual software toolchain consisting of, say, GitHub, Docker, and Kubernetes.
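To make the first-class DAG idea from the list above concrete, here is a minimal sketch of a training workflow expressed as an explicit DAG of steps. It uses Metaflow purely as one illustrative framework (this guide does not prescribe a specific tool), and the step names and logic are hypothetical placeholders.

```python
# A minimal sketch of a training workflow expressed as a first-class DAG.
# Metaflow is used here only as one illustrative framework; the steps and
# their contents are hypothetical placeholders.
from metaflow import FlowSpec, step


class TrainingFlow(FlowSpec):

    @step
    def start(self):
        # Load or reference versioned input data here.
        self.raw_rows = [1, 2, 3]
        self.next(self.featurize)

    @step
    def featurize(self):
        # Derive features from the raw data.
        self.features = [r * 2 for r in self.raw_rows]
        self.next(self.train)

    @step
    def train(self):
        # Fit a model; artifacts assigned to self are recorded per run.
        self.model_summary = {"n_examples": len(self.features)}
        self.next(self.end)

    @step
    def end(self):
        print("Trained:", self.model_summary)


if __name__ == "__main__":
    TrainingFlow()
```

Expressing the pipeline as an explicit DAG makes dependencies between steps visible, and each run’s artifacts can be tracked and compared, which speaks directly to the versioning questions above (what changed, why did something break, how do two iterations compare).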
For a case study on building technical architectures that embed transparency, explainability, and technical fairness into their designs, check out LinkedIn’s blog posts on their responsible AI initiatives.

Data Collection, Metadata, & Quality Resources

When data is incorrect, downstream data sets will be wrong too. Features derived from that data will suffer, which in turn means the performance of machine learning models will deteriorate during training or tuning, decreasing the predictive power of models once deployed to production. Any downstream consumers of that model (e.g. backend services) will be negatively affected by it too.
Corrections need to cascade all the way down, which can be both costly and time-consuming. It is even worse with unknown consumers of data, features, or models: there is no way to notify those teams when issues occur or when they need to backfill or retrain. Such hidden dependencies are common and often go unnoticed, but they are not unavoidable. Negative side effects quickly compound in cascades with plenty of data producers and consumers. The problem is that if data quality is not addressed at the source, it pops up in every derived data set, every related query, and every dependent machine learning product.
The role of metadata is to help users understand the data. There are several dimensions to understanding a dataset:
What does the data represent logically? What is the meaning of the attributes? Is it the source of truth, or derived from another dataset?
What is the schema of data? Who manages it? How was it transformed?
When was it last updated? Is the data tiered? Where are the previous versions? Can I trust this data? How reliable is the data quality?
Who and/or which team is the owner? Who are the common users?
What query engines are used to access the data? Are the datasets versioned?
Where is the data located? Where is it replicated, and what is the format?
How is the data physically represented, and can it be accessed?
Are there similar datasets with partially or fully overlapping content, both overall and for individual columns?

Metadata details of a dataset can be divided into three categories: technical, operational, and team metadata/tribal knowledge.
Technical metadata
consists of logical and physical metadata details of the dataset. Physical metadata covers details related to physical layout and persistence, such as creation and modification timestamps, physical location and format, storage tiers, and retention details. Logical metadata includes dataset schema, data source details, the process of generating the dataset, and owners and users of the dataset. Technical metadata is typically extracted by crawling the individual data source without necessarily correlating across multiple sources.
Operational metadata
consists of two key buckets: Lineage and Data Profiling stats. Lineage involves tracking how the dataset was generated and its dependencies on other datasets. For a given dataset, lineage includes all the dependent input tables, derived tables, and output models and dashboards. It includes the jobs that implement the transformation logic to derive the final output. Data profiling stats involves tracking availability and quality stats. It captures column-level and set-level characteristics of the dataset. It also includes execution stats that capture the completion times, data processed, and errors associated with the pipelines.
Team metadata
As data scientists work with different datasets for their projects, they discover additional details about attribute meaning, business vocabulary, data quality, and so on. These learnings are referred to as tribal knowledge. The goal is to actively share team knowledge across data users by enriching the metadata details for the datasets.
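As a small illustration of the data profiling stats described under operational metadata above, here is a minimal sketch of column-level profiling with pandas; the dataframe and column names are hypothetical.

```python
# A minimal sketch of column-level data profiling stats (one slice of
# operational metadata), assuming a pandas DataFrame named df.
import pandas as pd


def profile(df: pd.DataFrame) -> pd.DataFrame:
    """Return per-column availability and quality stats."""
    stats = []
    for col in df.columns:
        s = df[col]
        stats.append({
            "column": col,
            "dtype": str(s.dtype),
            "null_rate": s.isna().mean(),
            "distinct_count": s.nunique(dropna=True),
            "min": s.min() if pd.api.types.is_numeric_dtype(s) else None,
            "max": s.max() if pd.api.types.is_numeric_dtype(s) else None,
        })
    return pd.DataFrame(stats)


# Example usage with a tiny in-memory dataset:
df = pd.DataFrame({"age": [34, 51, None], "plan": ["basic", "pro", "pro"]})
print(profile(df))
```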

Outputs
Transparency artifacts around data quality and metadata are powerful, portable resources for promoting trust and consistency around the datasets and pipelines that drive AI model creation. A few of the emerging interfaces for consumer-friendly metadata reporting include:
The Dataset Nutrition Label, which aims to create a standard label for summarizing dataset contents, similar to food nutrition labels.
A project for building report cards on datasets and machine learning models, oriented toward different types of data users.
Model Cards, a reporting interface for a machine learning model’s design, metadata, and limitations that has sparked industry adoption.
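As a rough illustration of what a lightweight, machine-generated artifact in this spirit could look like, here is a hypothetical sketch that renders a model-card-style summary as Markdown from a plain dictionary. The field names are illustrative and do not follow any official Model Cards schema.

```python
# A minimal, hypothetical sketch of emitting a model-card-style summary as
# Markdown from a plain dictionary; the fields shown are illustrative.
def render_model_card(card: dict) -> str:
    lines = [f"# Model Card: {card['name']}", ""]
    for section in ("intended_use", "training_data", "metrics", "limitations"):
        lines.append(f"## {section.replace('_', ' ').title()}")
        lines.append(str(card.get(section, "Not documented")))
        lines.append("")
    return "\n".join(lines)


card = {
    "name": "loan-default-classifier",
    "intended_use": "Pre-screening support only; not an automated decision.",
    "training_data": "2018-2022 application records, anonymized.",
    "metrics": {"f1": 0.81, "accuracy": 0.88},
    "limitations": "Not evaluated on applicants under 21.",
}
print(render_model_card(card))
```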

Links
Model/Data Cards:
Data version control:

ML Deployment Monitoring Resources


(Diagram: model drift overview; source: Fast Forward Labs)
Concept Drift: Changes in the underlying relationships between data and predictions are known as concept drift, and will cause the predictive performance of a model to degrade over time, eventually making it obsolete for the task it was initially intended to solve.
Feature drift (also referred to as covariate shift, feature change, or input drift): characterizes the scenario where the distribution of one or more input variables changes over time. A minimal detection sketch appears after these definitions.
Machine learning models can be used to automate the detection of health issues or diseases in patient data. A model could be trained to automatically screen patient data samples against known diseases or health issues. The input data could be image files such as x-rays, or any number of health-related measurements. A model could also be used to flag patients at risk from certain diseases based on recorded lifestyle choices.
However, if the model is trained on data from a type of patient that isn’t representative of the real-world use case, covariate drift can occur. For example, a model trained on available training data made up of patients in their 20s won’t be as accurate at screening data from patients in their 50s.
Real concept drift: changes in features and signals that render a previously learned relationship between features and targets no longer true
Verification latency: The period between the availability of an unlabeled test instance and the availability of its true label
Infinite Verification Latency: scenarios where it is impossible to get the true label for a prediction because the positive case lacks an intervention and a corresponding signal. For example, if an algorithm classifies a patient as likely to be a no-show for an appointment and passes over them for a different patient, there will never be confirmation that the first patient was a true no-show.
Aside from drift and latency concerns, annotating true labels for predictions can be expensive! One tip for controlling costs is to run only a randomly sampled percentage of predictions through label annotation.
Proxy metrics are another consideration for obtaining feedback on your predictions when verification latency takes too long or when you have a short window of opportunity to act on the prediction.
Proxies can be an imperfect representation of the true label that the model should be predicting, however; ideally, any proxy should eventually be verified against the true label for confirmation.
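As a minimal sketch of the feature-drift monitoring described above, the example below compares a feature’s training (reference) distribution against a recent production window with a two-sample Kolmogorov-Smirnov test; the synthetic data and the alerting threshold are assumptions to tune.

```python
# A minimal sketch of feature-drift detection: compare a feature's reference
# distribution (from training time) to a recent production window using a
# two-sample Kolmogorov-Smirnov test. The threshold is a hypothetical value.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)
reference = rng.normal(loc=35.0, scale=5.0, size=5_000)   # e.g. patient age at training time
production = rng.normal(loc=48.0, scale=6.0, size=1_000)  # a drifted live window

stat, p_value = ks_2samp(reference, production)
if p_value < 0.01:  # hypothetical alerting threshold
    print(f"Possible feature drift detected (KS={stat:.3f}, p={p_value:.2e})")
else:
    print("No significant drift detected for this feature")
```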

Common monitoring metrics include:
Performance monitoring around F1-score, accuracy, and confusion matrix fields.
Confidence monitoring around the model’s predictive confidence over time.
Selecting just one monitoring metric opens the possibility of overfitting to or gaming that metric. For performance and fairness monitoring, a slate of metrics should be used that provides a holistic, multi-dimensional view of model performance.
Concept drift rate over time — how fast does your model performance degrade? Knowing your performance degradation rate is important for anticipating re-training schedules.
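As a minimal sketch of logging such a slate rather than a single number, the example below computes several metrics per monitoring window with scikit-learn; the label and prediction arrays are made-up stand-ins.

```python
# A minimal sketch of computing a slate of monitoring metrics per window.
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0])
y_prob = np.array([0.9, 0.2, 0.8, 0.4, 0.3, 0.6, 0.7, 0.1])  # predicted P(class=1)

# Confidence in the class the model actually chose.
confidence = np.where(y_pred == 1, y_prob, 1 - y_prob)

snapshot = {
    "f1": f1_score(y_true, y_pred),
    "accuracy": accuracy_score(y_true, y_pred),
    "confusion_matrix": confusion_matrix(y_true, y_pred).tolist(),
    "mean_confidence": float(confidence.mean()),
}
print(snapshot)  # in production, ship this snapshot to a monitoring store per window
```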

Five steps for dealing with concept drift include:
Setting up a process for concept drift detection.
The first step is to set up processes for monitoring and detecting concept drift. Measuring the ongoing accuracy of a model is key to achieving long term performance. Organisations can achieve this by maintaining labelled testing datasets or samples which have been curated by a team member. Any drop in performance over time that isn’t related to the quality of data may flag concept drift.
Maintaining a static model as a baseline for comparison.
It can be difficult to detect concept drift and understand if a model has become less accurate over time. A static model can be used as a baseline to understand any changes in model accuracy. It’s valuable to have a baseline model to measure the success of any changes you make to combat concept drift. A baseline static model can be used to measure the ongoing accuracy of amended models after each intervention.
Regularly retraining and updating the model.
A static machine learning algorithm is much more likely to experience concept drift. Generally trained in an offline or local environment, a static model won’t adapt to changing environments or scenarios. For models that deal with forecasting or predictions, a static algorithm developed on historic data can become inaccurate over time. Models deemed at risk from concept drift should be regularly retrained and updated to keep in line with evolving datasets and live environments.
Where possible, the static model can regularly be updated and trained with samples of new training data. This fine-tunes the model and lowers the risk of it becoming obsolete over time. Retraining should occur regularly to reflect new and emerging trends between input and output data. The frequency of the required update can be set by regularly assessing the accuracy of the machine learning model. For example, retraining might be required monthly, quarterly or every six months to maintain accuracy.
Weighting the importance of new data.
When developing some models, data scientists can set the relative importance of different input data. New data can be treated as more important than older data by weighting input data by relative age. This emphasises the importance of new data within the algorithm, giving less weight to historic data which may be out of date. If concept drift is occurring, focusing on newer data should mean the algorithm can adapt and stay accurate. However, this is not without risk, as overweighting new data can severely impact model performance. A minimal weighting sketch appears after these steps.
Creating new models to solve sudden or recurring concept drift.
In some circumstances, sudden concept drift may occur from global events or changes. In these cases, models trained on historic data will become less reliable as behaviour changes. An example could be the changes in customer behaviour during the COVID-19 pandemic and the lockdowns experienced across the globe. New models can be adapted from existing models to deal with new trends within these periods of change.
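Picking up the recency-weighting idea from step four, here is a minimal sketch using sample weights in scikit-learn; the half-life value and the synthetic data are assumptions to tune against a baseline model.

```python
# A minimal sketch of weighting newer data more heavily during retraining by
# passing recency-based sample weights to the estimator.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = (X[:, 0] + rng.normal(scale=0.5, size=500) > 0).astype(int)
age_days = rng.integers(0, 365, size=500)  # how old each training example is

half_life = 90.0  # hypothetical: an example's weight halves every 90 days
sample_weight = 0.5 ** (age_days / half_life)

model = LogisticRegression()
model.fit(X, y, sample_weight=sample_weight)
```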

Class Imbalance
The common issue of class imbalance is exacerbated when it comes to drift detection. Class imbalance occurs when the proportion of data instances belonging to each class varies, causing certain classes to be underrepresented. It is usually the underrepresented classes in such situations that end up having higher misclassifications. Detecting drift between populations with imbalanced classes is complicated, and becomes more challenging when the data between windows cannot be stored due to memory issues. As such, approaches that cater to both concept drift and class imbalance in data streams are relatively less studied.[16]
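One common, minimal mitigation when labels are imbalanced is to reweight classes during training; the sketch below uses scikit-learn’s balanced class weights on made-up labels. This does not solve drift detection under imbalance, but it keeps the minority class from being drowned out.

```python
# A minimal sketch of compensating for class imbalance with balanced class weights.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils.class_weight import compute_class_weight

y = np.array([0] * 950 + [1] * 50)  # a 95/5 imbalanced label set
weights = compute_class_weight(class_weight="balanced", classes=np.array([0, 1]), y=y)
print(dict(zip([0, 1], weights)))   # the minority class receives a much larger weight

# Equivalent shortcut when fitting directly:
model = LogisticRegression(class_weight="balanced")
```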
Links
GitHub repo on MLOps:
ML Framework Tool Selection:

ML Model Security Resources

Machine learning models have unique vulnerabilities, which include:
Data extraction: exposure of sensitive information and personally identifiable details in the data used for training the models.
Unintended Memorization
Deep learning models, and in particular generative models, suffer from unintended memorization: they can memorize rare details from the training data. Using only inferences from the model, it is possible to extract sensitive information from the training dataset.
This method was used to extract credit card details and social security numbers from Gmail’s Smart Compose feature that helps you to write emails faster.
Membership Inference
Using this technique an attacker can infer whether a new data record is included in the training dataset or not.
Model extraction: copying a model by stealing coefficient weights and hyperparameters.
Model Stealing
By querying a model’s prediction API, it is possible to extract an equivalent ML model. These attacks are successful against a wide variety of model types: logistic regressions, decision trees, SVMs, and deep neural networks.
A set of model extraction attacks were used to steal model types and parameters from public machine learning services such as BigML and Amazon Machine Learning in 2016.
Model tricking: deceiving the model by manipulating predictive inputs, leading to inconsistent or incorrect predictions.
Adversarial Attacks
Adversarial attacks are perhaps the best-known attacks on deep neural networks, since they have been widely publicized. They work by providing deceptive input at inference time. This is a powerful method because the noise added to the input can be imperceptible and it does not depend on the architecture of the attacked model.
Model corruption: techniques for perturbing the model’s inner functionality in ways that corrupt its outputs.
Data Poisoning
Training data often includes information harvested by automated scrapers or from unverified sources on the web; this is a vulnerability that can be exploited to tamper with models.
Backdoor attacks
These methods work by injecting a trojan into the training data or into a pre-trained model; the trojan is activated by a trigger at inference time to make the model output a specific result. These attacks are model-agnostic.
Attacks on Transfer Learning
If a pre-trained model is corrupted with data poisoning or a backdoor attack, it is likely that a new model that is trained using the corrupted model as a starting point will be corrupted as well.
Federated Learning
can be even more exposed to model corruption through training-data manipulation given its distributed nature, since we can assume that an attacker has full control over part of the training samples and labels. Federated learning can also be subject to data extraction attacks through the leakage of gradients shared between the workers and/or the server.

(Diagram: ML model security threats; source: Sahbi Chaieb)
ML Security Solution Approaches
Differential Privacy
Differential privacy can be used against data extraction attacks. The aim of this technique, applied during training, is for the model to produce similar outputs whether or not any individual sensitive record is included in the training data. It tries to ensure that the model does not rely too heavily on individual samples. In practice, that comes at the expense of some accuracy.
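A simplified, illustrative sketch of this idea in the style of DP-SGD appears below: clip each example’s gradient, then add Gaussian noise before averaging. A real implementation also needs a privacy accountant to track the privacy budget, which is omitted here; libraries such as Opacus or TensorFlow Privacy provide the full mechanism.

```python
# A simplified sketch of the noisy-gradient step used in DP-SGD:
# clip per-example gradients, add Gaussian noise, then average.
# Privacy accounting (tracking epsilon/delta) is intentionally omitted.
import numpy as np


def private_gradient(per_example_grads, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    rng = rng or np.random.default_rng(0)
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))  # per-example clipping
    total = np.sum(clipped, axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=total.shape)
    return (total + noise) / len(per_example_grads)  # noisy average used for the update


grads = [np.array([0.4, -1.2]), np.array([3.0, 0.1]), np.array([-0.2, 0.5])]
print(private_gradient(grads))
```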
Homomorphic encryption
Homomorphic encryption is an encryption method that enables computations on encrypted data directly. It ensures that the decrypted result is the same as if the computation had been done on unencrypted inputs.
It is one of the defense mechanisms used in Federated learning.
Randomization methods
The goal of these techniques is to protect deep neural network models against perturbations by adding randomness to test points, exploiting the fact that deep neural networks are robust to random perturbations. A random input transformation can help mitigate adversarial effects, and randomized smoothing can be used to perturb backdoor attack triggers.
Input Preprocessing
Input preprocessing can be used against backdoor attacks or adversarial attacks, by modifying the model input during training or testing. Autoencoders can be used to clean the input. An imperfect reconstruction can make the model unable to recognize triggers.
Adversarial training
In this method, we train the model by adding adversarial samples to the training data to improve its robustness against adversarial attacks.
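Below is a minimal sketch of one adversarial-training step using FGSM-style perturbations in PyTorch; the tiny model, loss, and epsilon value are illustrative placeholders rather than a hardened recipe.

```python
# A minimal sketch of adversarial training with FGSM-style perturbations.
import torch
import torch.nn as nn


def fgsm_examples(model, loss_fn, x, y, eps=0.03):
    """Craft adversarial inputs by stepping along the sign of the input gradient."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x_adv), y)
    loss.backward()
    return (x_adv + eps * x_adv.grad.sign()).detach()


model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(16, 4)
y = torch.randint(0, 2, (16,))

# One adversarial training step: mix clean and adversarial samples.
x_adv = fgsm_examples(model, loss_fn, x, y)
optimizer.zero_grad()
loss = loss_fn(model(x), y) + loss_fn(model(x_adv), y)
loss.backward()
optimizer.step()
```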
Links

Explainable AI (XAI) Resources

Explainable AI is a subset of algorithm design that promotes the ability to explain the performance and decisions of algorithms to a human in accessible and easy-to-understand language. The ability to explain why an algorithm is making a recommendation, tracing the data that inspired the recommendation, and emphasizing the learned relationships between variables that informed a recommendation are all crucial parts of building trust and accountability with users.
Model explainability is important because it:
Allows models to be understood by users with less technical knowledge.
Provides accountability for regulatory or legal purposes.
Helps identify emerging bias or quality issues in the model.
Improves trust in the model’s decisions.

Explainable AI is composed of three main sub-areas:
Local model explainability
Local or individual model explainability is an approach taken to answer specific questions about individual model decisions. A client or stakeholder may have a query about why a given decision was made by the model, or a decision may be flagged during an audit. In the case of a model used in the financial sector, a customer may challenge why their mortgage application was rejected by the model. Local explainability is often used in very specific circumstances, and will usually be used on models after deployment.
Local model explainability as an approach is useful for understanding which specific features impacted a specific decision. Local model explainability is important for models that are deployed in regulated organizations. Organizations could be audited or must justify why a business decision was made by the model.
Machine learning models should be constantly monitored to detect issues such as concept drift. Models can become inaccurate over time for a range of reasons. A local model explainability tool can be used to deep dive into specific features if a specific error or anomaly is flagged. This approach will help organizations understand the features that most contributed to the issue, to allow ongoing improvement and optimization.
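As a minimal sketch of a local explanation, the example below uses SHAP, one commonly used library, to attribute a single prediction to its input features; the model and features are synthetic stand-ins for, say, a mortgage-approval classifier.

```python
# A minimal sketch of local explainability with SHAP on a synthetic classifier.
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
X = pd.DataFrame({
    "income": rng.normal(60_000, 15_000, 200),
    "debt_ratio": rng.uniform(0, 1, 200),
    "years_employed": rng.integers(0, 30, 200),
})
y = (X["income"] / 100_000 - X["debt_ratio"] + rng.normal(0, 0.1, 200) > 0).astype(int)

model = RandomForestClassifier(random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X.iloc[[0]])  # per-feature contributions to one decision
print(shap_values)
```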
Cohort model explainability
Whereas individual or local model explainability approaches are used on specific model decisions, cohort model explainability is applied to subsets of data. This approach is mainly used during the model production phase, specifically in the step before deployment. This crucial step measures the model’s generalization, gauging its accuracy with new or unseen data before deployment.
Cohort model explainability is used to answer questions on potential model bias highlighted with a subset of input data. Cross validation may show that the model is less accurate with a specific subset of input data. This might be because of innate bias in the training dataset, or overfitting on a specific subset of data. Cohort model explainability will help organizations understand the specific features that may be causing this drop in accuracy with this subset. The approach can be used to measure and compare model accuracy between different cohorts or subsets of the data.
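A minimal sketch of a cohort-level check: compare model accuracy across subsets of the evaluation data to surface groups where the model underperforms. The column names and values below are illustrative.

```python
# A minimal sketch of comparing accuracy across cohorts of the evaluation data.
import pandas as pd
from sklearn.metrics import accuracy_score

eval_df = pd.DataFrame({
    "age_band": ["20s", "20s", "50s", "50s", "50s", "20s"],
    "y_true":   [1, 0, 1, 1, 0, 1],
    "y_pred":   [1, 0, 0, 0, 1, 1],
})

per_cohort = {
    band: accuracy_score(group["y_true"], group["y_pred"])
    for band, group in eval_df.groupby("age_band")
}
print(per_cohort)  # a large gap between cohorts is a signal worth investigating
```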
Global model explainability
Global model explainability techniques will focus on the features that have the most impact on all of the model’s outcomes or decisions. Whereas local model explainability will focus on individual decisions, global model explainability takes a holistic approach. This approach can be used to answer top line questions on how the model performs overall after deployment. For example, a question about how the model makes an average prediction could be answered using a global model explainability approach. For this reason it’s a common approach to answering questions posed by stakeholders with no prior data science experience. It provides a global or top-level understanding for how the model functions and makes decisions.
Global model explainability is also an approach used during the training phase of the machine learning life cycle. Data scientists managing the training process can use this approach to understand the model in more detail, highlighting the features which have the biggest impact on the average model decision. There may also be instances when a data scientist has created multiple iterations of the same model. Each one could rely on a different selection of features to complete the given task. Global model explainability can identify the major features of each of the models, and can identify and resolve over reliance on specific features which may cause bias.
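One minimal, model-agnostic sketch of global explainability is permutation importance in scikit-learn: shuffle each feature and measure how much overall performance drops. The data and model below are synthetic placeholders.

```python
# A minimal sketch of global explainability via permutation importance.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
X = rng.normal(size=(400, 4))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # only the first two features matter

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)

result = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)
for i, score in enumerate(result.importances_mean):
    print(f"feature_{i}: mean importance drop = {score:.3f}")
```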

Different techniques for allowing model explainability are necessary depending on the type of algorithm. Neural networks in deep learning are examples of black box techniques that build complex multi-layer rulesets from the data and require more sophisticated methods for enabling explainability. Other algorithms like decision trees identify simpler patterns in the data that are easier to interpret, such as a threshold point for age that might sway the model prediction towards one outcome by a certain percentage.
There’s a wide array of ML libraries that seek to build explainability into AI model deployment, each with its own technical architecture, along with additional resources, libraries, and tutorials on AI explainability.
Links

AI Bias Resources


There are two ways that the term bias can be used:
Statistical bias (bias in the context of calculating how accurately an ML model fits the training dataset)
Methodology bias (bias in the context of fairness)

Below are some of the common methodology biases that impact algorithmic products.
Historical Bias
While gathering data for training a machine learning algorithm, grabbing historical data is almost always the easiest place to start. If we’re not careful, however, it’s very easy to include bias that was present in the historical data.
Take Amazon, for example. In 2014 they set out to build a system for automatically screening job applicants. The idea was to feed the system hundreds of CVs and have the top candidates picked out automatically. The system was trained on 10 years’ worth of job applications and their outcomes. The problem? Most employees at Amazon were male (particularly in technical roles). The algorithm learned that, because there were more men than women at Amazon, men were more suitable candidates, and it actively discriminated against non-male applicants. By 2015 the whole project had to be scrapped.
Sample Bias
Sample bias happens when your training data does not accurately reflect the makeup of the real world usage of your model. Usually one population is either heavily overrepresented or underrepresented.
When training a speech-to-text system, you need lots of audio clips together with their corresponding transcriptions. Where better to get lots of this data than audiobooks? What could be wrong with that approach?
Well, it turns out that the vast majority of audiobooks are narrated by well-educated, middle-aged, white men. Unsurprisingly, speech recognition software trained using this approach underperforms when the user is from a different socio-economic or ethnic background.
Label Bias
A lot of the data required to train ML algorithms needs to be labelled before it is useful. You actually do this yourself quite a lot when you log in to websites. Been asked to identify the squares that contain traffic lights? You’re actually confirming a set of labels for that image to help train visual recognition models. The way in which we label data, however, varies a lot and inconsistencies in labelling can introduce bias into the system.
Aggregation Bias
Sometimes we aggregate data to simplify it, or present it in a particular fashion. This can lead to bias regardless of whether it happens before or after creating our model. Take a look at this chart, for example:
(Chart: salary plotted against years worked, aggregated across all professions)
It shows how salary increases based on the number of years worked in a job. There’s a pretty strong correlation here that the longer you work, the more you get paid. Let’s now look at the data that was used to create this aggregate though:
(Chart: the same data broken out by profession)
We see that for athletes the complete opposite is true. They are able to earn high salaries early on in their careers while they are still at their physical peak but it then drops off as they stop competing. By aggregating them with other professions we’re making our algorithm biased against them.
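A minimal sketch of this effect with made-up numbers: for this data the pooled correlation between years worked and salary is positive, while the athlete subgroup trends the opposite way, so the aggregate hides their pattern.

```python
# A minimal sketch of how aggregation can hide or reverse a subgroup trend.
import pandas as pd

df = pd.DataFrame({
    "profession":   ["office"] * 8 + ["athlete"] * 3,
    "years_worked": [1, 3, 5, 8, 12, 15, 20, 25, 2, 6, 10],
    "salary_k":     [40, 48, 55, 65, 78, 88, 100, 115, 200, 120, 70],
})

print("Pooled correlation:", round(df["years_worked"].corr(df["salary_k"]), 2))
for name, group in df.groupby("profession"):
    print(name, "correlation:", round(group["years_worked"].corr(group["salary_k"]), 2))
```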