This resource provides a framework for procurement officers to evaluate and monitor AI vendor tools at each stage in the product design pipeline, after the vendor is selected. It can be used by procurement officers to solicit answers to critical questions from AI vendors at each stage of the model design lifecycle.
For AI applications, this infographic first breaks down the procurement process into an “AI Model Lifecycle,” which aligns procurement officials and AI vendors under a common design process. Next, the framework provides key questions for procurement officials to ask at each design stage for evaluating and monitoring AI products. Finally, it recommends sharable artifacts and documentation that vendors should produce at each stage of the machine learning lifecycle. These artifacts can be used during the procurement process to promote transparency between the procuring organization and the vendor.
Each section below follows a similar format for each stage of the AI Model Lifecycle:
First, there is a summary of the model lifecycle stage.
Next is a list of stage-specific questions for procurement officials to ask vendors.
Then comes a list of the procurement clauses in the Responsible Language Generator tool that are relevant to that stage.
Finally, there is a list of additional resources for each stage where procurement officials can learn more about advanced concepts or industry standards.
An all-in-one infographic for questions and tools that procurement officers can use to guide contract vendors in responsible development of AI tools.
Project Scoping
Summary
The project scoping stage is critical for defining product requirements; mapping out anticipated users, impacted communities, and key stakeholders; and establishing a framework for defining what is in scope, what is beyond the capabilities of the system, and how the tool improves upon the current status quo.
There should be a clear mission statement covering the purpose of the tool, how it will be used, and what metrics are needed to evaluate its performance. A risk assessment is also useful at this point to flag potential design risks to patient health and to vulnerable populations that might be negatively impacted by the tool.
Potential Questions for Procurement Officers to Ask Vendors
What problem is this algorithm solving?
What communities will be impacted by this algorithm and how should risk be measured?
What alternatives were considered for solving this problem? What would be the result of doing nothing?
This phase of the project lifecycle focuses on establishing the purpose and goals of the project to anchor the technical development work. Investing time and resources into the scoping phase provides clarity to model developers when making decisions around risk tolerance, privacy and security considerations, potential sources of bias, and requirements for availability, performance speed, and precision levels for predictions. These questions ask AI vendors to offer documentation and plans that verify that they understand the problem the tool is solving, that the tool will offer improvements over the status quo, and that they are anticipating how and where the tool may run into performance issues.
Relevant Procurement Template Clauses
Provision 5 (Transparency): “Bids incorporating frameworks for soliciting user design feedback from healthcare delivery professionals like doctors and nurses, as well as patients, will be favored.”
Helpful Tools

Impact assessments have been used in other industries to assess privacy, environmental, and human rights risks. Algorithmic impact assessments are high-level frameworks for establishing project purpose, success metrics, and potential risks at the beginning of the project.
Product requirements documents are traditionally used by stakeholders to communicate the product requirements to software developers, establishing the full set of features needed for the project release to be considered complete. This can be an effective resource for procurement officers as they establish a contract with AI vendors on the overall product deliverable.
A stakeholder map can be an effective way to map out communities that are involved or impacted by the development and usage of the AI tool. These can be helpful to procurement officers when assessing risks and sources of bias in the project scoping stage.
Data Collection

Summary

When data used in an algorithmic model is incorrect, downstream data that relies on that source will also be incorrect. The quality of features derived from that data also suffers, which means the performance of machine learning models will deteriorate during training, decreasing their predictive power. Any downstream consumers of that model will be negatively impacted as well.
Metadata is meant to help contextualize and model the data, and seeks to answer the “who, what, when, why, and how” questions for the dataset. Key attributes include the following (a machine-readable sketch of such a record follows the list):
The intended purpose of the dataset and what it measures;
The population described by the dataset;
The range of column variables used to structure information in the dataset (known as “schema”);
The schedule on which the dataset is refreshed with new data;
The lineage of the dataset, describing where it was sourced from (and whether it is the best primary source describing a person or event);
Who owns the creation, maintenance, quality, and issue resolution responsibilities for the dataset;
The preferred ways of accessing the dataset (e.g., API, database query, CSV file request, streaming data feed, etc.); and
The location of the dataset within the larger data ecosystem (including whether higher-level permissions and restrictions from the system apply to the dataset).
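To make these attributes concrete, such a record can be captured in machine-readable form. The Python sketch below is purely illustrative; the field names and example values are hypothetical assumptions, not an industry-standard schema.

```python
from dataclasses import dataclass, field

@dataclass
class DatasetMetadata:
    """Illustrative metadata record covering the attributes listed above.

    All field names are hypothetical, not a standard schema.
    """
    name: str                  # dataset identifier
    intended_purpose: str      # what the dataset measures and why it exists
    population: str            # who is described by the data
    schema: dict               # column names mapped to types/descriptions
    refresh_schedule: str      # e.g., "nightly", "monthly"
    lineage: str               # upstream sources; is this the primary source?
    owner: str                 # team accountable for quality and issue resolution
    access_methods: list = field(default_factory=list)  # e.g., ["API", "CSV"]
    permissions: str = "inherits system-level restrictions"

# Example record a vendor might share during a procurement review
example = DatasetMetadata(
    name="patient_vitals_v3",
    intended_purpose="Blood-pressure readings for hypertension-risk modeling",
    population="Adult patients, 2018-2023, across partner clinics",
    schema={"patient_id": "string", "systolic": "int", "diastolic": "int"},
    refresh_schedule="nightly",
    lineage="Exported from clinical EHR system; primary source",
    owner="Vendor data engineering team",
    access_methods=["API", "database query"],
)
```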
Sharing metadata details can help establish valuable high-level knowledge about the intended purpose of the dataset, key assumptions used in generating the data that impact its usage, and technical support processes for reporting product quality issues or requesting recovery efforts.
Knowing the metadata necessary to answer these questions is crucial for the operation of consistent, timely, and resilient AI products.
Potential Questions for Procurement Officers to Ask
What datasets were used during training, and what kind of details are available about them?
How often is the dataset refreshed?
Is there a data retention or expiration policy?
Does the tool have the ability to delete an individual’s data?
What privacy and data protection regulations must be followed?
This phase of the model lifecycle focuses on how the input data for the model is collected, and how vendors prioritize privacy and security within their data management practices. Many regulatory compliance standards have requirements regarding the storage of data, including regular processes for deleting data, encrypting the data as it gets transported between databases, and “right to be forgotten” capabilities for deleting data about individuals. These questions for procurement officers aim to ensure that AI vendors are in compliance with the correct regulatory standards for the intended user group and have adequate documentation for describing the lineage of datasets involved in creating a model.
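As a rough illustration of the “right to be forgotten” capability mentioned above, a vendor’s tool might expose a deletion routine that removes an individual’s records from every table that holds personal data. The sketch below uses SQLite with hypothetical table and column names; a production system would also need to purge backups, caches, logs, and derived datasets.

```python
import sqlite3

# Hypothetical tables keyed by patient_id; a real deployment would also
# need to cover backups, caches, logs, and datasets derived from this data.
TABLES_WITH_PERSONAL_DATA = ["patients", "vitals", "predictions"]

def delete_individual(conn: sqlite3.Connection, patient_id: str) -> None:
    """Remove every record tied to one individual, in a single transaction."""
    with conn:  # commits on success, rolls back on error
        for table in TABLES_WITH_PERSONAL_DATA:
            conn.execute(f"DELETE FROM {table} WHERE patient_id = ?", (patient_id,))

# Minimal usage example with an in-memory database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (patient_id TEXT, name TEXT)")
conn.execute("CREATE TABLE vitals (patient_id TEXT, systolic INTEGER)")
conn.execute("CREATE TABLE predictions (patient_id TEXT, risk_score REAL)")
conn.execute("INSERT INTO patients VALUES ('p1', 'Jane Doe')")
delete_individual(conn, "p1")
```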
Relevant Procurement Template Clauses
Provision 4 (Data Quality): “Bidding contractors should provide a set of standards that they will use to define data quality, algorithmic system performance, and evaluation metrics. These standards should be used throughout the duration of the contract to guarantee a minimum level of quality and performance from the system.”
Helpful Tools for This Stage
Materials that describe data quality and metadata can be powerful, portable resources for promoting trust and transparency around the datasets and pipelines that drive AI model creation. A few of the emerging formats for consumer-friendly metadata reporting include:
An AI vendor might maintain their own data catalog that records the physical, operational, and team metadata attributes of their data assets. These catalogs can be valuable reports to share with procurement officers for risk assessment, regulatory compliance, and operational maintenance purposes.
Data Preparation

Summary

As data is prepared for training the AI model, software developers make decisions around how to interpret or clean the dataset. This can range from deciding on a default value to fill in missing gaps in the data, to removing unusual rows of data that may have resulted from human error. These decisions can introduce implicit bias through the developer’s judgments on the data and their understanding of the problem space.
As a recent example of how data transformation can introduce bias, an algorithm used to diagnose kidney disease was found to apply a race-based correction factor to normalize differing levels of creatinine between racial groups. A study found that “removing [the race-based correction factor] would shift the status of 29% of Black patients from having early-stage to advanced disease.” Before this error was corrected, a significant proportion of Black patients experienced delayed care for kidney disease, believing their conditions were less severe.
The risk of introducing bias in this way is dependent on the specific context and is best assessed by a multidisciplinary team that can consider perspectives from engineering, social sciences, and legal precedents.
Potential Questions for Procurement Officers to Ask
What kinds of data imputation and transformation logic are being applied to the data?
Are changes in the transformation logic saved between versions for auditing purposes?
How often is the model re-trained?
How do you check for data quality issues like missing, incomplete, or duplicate data?
Are there demographic groups in the intended user population that are missing from the dataset?
This phase of the model lifecycle focuses on transforming the input data into a cohesive, clean form that is ready for model training and prediction usage. The cleaning and transformation process can bake the model developers’ assumptions and biases into the dataset. By deciding what kinds of default values are used to fill in missing data, or how to handle duplicate data, developers inject their own ideas of “good data” into the model inputs. These questions ask AI vendors to have data-cleaning protocols that anticipate sources of bias that may affect the resulting tool. When vendors document their transformation logic and assumptions, auditors can flag potential sources of bias before they cause larger downstream problems.
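To illustrate the kind of data-cleaning protocol these questions probe for, the pandas sketch below reports missing and duplicate data, then logs each cleaning decision alongside its rationale so the assumptions can be audited later. The dataset and the choices shown are hypothetical.

```python
import pandas as pd

# Hypothetical input data with one duplicate record and missing values
df = pd.DataFrame({
    "patient_id": ["p1", "p2", "p2", "p3"],
    "age": [54.0, 61.0, 61.0, None],
    "systolic": [128.0, 141.0, 141.0, None],
})

# Basic data-quality report: missing values and exact duplicate rows
quality_report = {
    "missing_per_column": df.isna().sum().to_dict(),
    "duplicate_rows": int(df.duplicated().sum()),
}
df = df.drop_duplicates()

# Every cleaning decision is logged with its rationale so auditors can
# trace the assumptions baked into the training data.
cleaning_log = []

median_age = df["age"].median()
df["age"] = df["age"].fillna(median_age)
cleaning_log.append({
    "column": "age",
    "method": "median imputation",
    "value": median_age,
    "rationale": "Median resists outliers in a roughly symmetric distribution.",
})

df = df.dropna(subset=["systolic"])
cleaning_log.append({
    "column": "systolic",
    "method": "drop rows",
    "rationale": "A clinical measurement should not be filled with a default.",
})

print(quality_report)
print(cleaning_log)
```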
Relevant Procurement Template Clauses
Provision 4 (Data Quality): “Bidding contractors should provide a report to [insert organization] on any bias mitigation techniques, ethics checklists, or product requirements documents that have been created during the development of the algorithmic system.
Bids that can demonstrate multi-disciplinary approaches and teams for algorithmic system design, using expertise from engineering, social sciences, and legal professions, will be given a higher rating.”
Helpful Tools
A summary report that includes a data quality assessment, a list of imputation techniques used to fill in missing data, and plain-text explanations regarding the design process.
AI vendors can provide a high-level report that details the scope of missing data and overall data quality in the input datasets. The list of techniques used for filling in missing data (both at a row-level and a column-level), along with assumptions used in setting default values for the missing data, can also be provided. This report should also include plain-text explanations of why the vendor elected to use these transformation and imputation techniques.
Extract, transform, and load (ETL) source code responsible for applying data transformations.
Obtaining access to the technical source code for the tool may not be possible due to intellectual property concerns, but having this level of transparency can enable fine-grained audits of any assumptions and biases embedded during the data transformation stage.
Model Training

Summary

As the AI model is trained on the input datasets, it develops its own internal logical rules for building predictions. The model takes inputs and calculates the probabilities of the different potential classification options to arrive at the highest-likelihood result. These internal rules can be examined for logical validity in the relevant problem areas (e.g., does a model predicting hypertension risk in patients learn clinically significant rules around blood pressure, height and weight, and age?).
This kind of inspection in models is known as explainable AI, which is a subset of algorithm design that seeks to explain the performance and decisions of algorithms to a human in accessible and easy-to-understand language. The ability to explain why an algorithm is making a recommendation, trace the data that inspired the recommendation, and emphasize the learned relationships between variables that informed a recommendation are all crucial parts of building trust and accountability with users.
Model explainability is important because it:
Allows models to be understood by users with less technical knowledge;
Provides accountability for regulatory or legal purposes;
Helps identify emerging bias or quality issues in the model.

Model explainability can operate at several levels:
Local model explainability is the ability to isolate the factors that have the greatest influence on an individual model decision. This allows a model to expose the pieces of source data that influenced the model’s internal logic to arrive at a prediction when explaining an algorithmic decision. For example, local explainability would be used to explain to a customer why exactly their insurance appeal was rejected by an algorithm.
Cohort model explainability relates to subsets of data that can point to generalizability issues in the model. Using cohort model explainability can help with fairness metric testing, where testing on different cohort groups can verify a model’s ability to generalize across protected demographic attributes without impacting performance.
Global model explainability is the ability to isolate the factors that have the greatest influence across all of the model’s predictions. Global explainability is a powerful step in checking for innate bias in a model. For example, global explainability measures can help isolate the most important features in the model. This can be used to flag features that depend on demographic attributes like race, or proxy variables for demographic attributes, like zip code.
Certain types of algorithm models are more explainable than others because their internal logical rules are easier for humans to understand in plain language. An algorithm built using decision trees learns internal rules like “if the patient is older than 45 years and male, recommend a colonoscopy.” Other algorithms, such as the neural networks used in deep learning, may have complex internal logical rules that are not easily interpretable by humans. AI developers can make different decisions on the types of models they use based on the explainability requirements needed.
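As a rough sketch of these ideas, the scikit-learn example below trains a small decision tree on synthetic data, prints its learned rules in plain text (the kind of interpretable logic described above), and lists its global feature importances. The feature names and decision logic are hypothetical. For local explanations of individual predictions, libraries such as SHAP or LIME are commonly used.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
features = ["age", "systolic_bp", "weight_kg"]

# Synthetic training data: risk label depends on age and blood pressure
X = rng.uniform([20, 90, 45], [90, 200, 140], size=(500, 3))
y = ((X[:, 0] > 45) & (X[:, 1] > 140)).astype(int)

model = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Interpretable internal rules, e.g. "if systolic_bp > 140 and age > 45 ..."
print(export_text(model, feature_names=features))

# Global explainability: which features most influence predictions overall
for name, importance in zip(features, model.feature_importances_):
    print(f"{name}: {importance:.2f}")
```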
Potential Questions for Procurement Officers to Ask
Does the vendor conduct vulnerability analysis for its underlying open-source software packages?
Is the AI model using algorithms that allow for explainable predictions?
This phase of the model lifecycle examines how the selection of algorithm types can impact the transparency and security of the end model. Model training is often an automated process in which a wide variety of training configurations are explored to find the most optimal performance. Managing and storing these training configurations safely is crucial for being able to inspect the internal logic of the models and ensure consistency of the configurations over time. The questions above aim to ensure that AI vendors are considering the transparency requirements needed to audit, trust, and utilize the model. They also ask vendors to follow safe software practices by performing regular vulnerability analysis of the open-source software packages used in building their tools, and by having patching processes for safely upgrading to new versions for security and performance improvements.
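One lightweight way to keep training configurations auditable, as this stage calls for, is to store each configuration with a content hash so a deployed model can be traced back to the exact settings that produced it. A minimal sketch, with hypothetical parameter names:

```python
import hashlib
import json

# Hypothetical training configuration for one model run
config = {
    "model_type": "gradient_boosted_trees",
    "max_depth": 6,
    "learning_rate": 0.1,
    "training_data_version": "patient_vitals_v3",
    "random_seed": 42,
}

# Serialize deterministically, then hash: the digest uniquely identifies
# this configuration, so auditors can tie a model back to its settings.
serialized = json.dumps(config, sort_keys=True)
config_id = hashlib.sha256(serialized.encode()).hexdigest()[:12]

with open(f"train_config_{config_id}.json", "w") as f:
    f.write(serialized)

print(f"Stored training configuration {config_id}")
```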
Relevant Procurement Template Clauses
Provision 5 (Transparency): “[insert organization] should be able to provide a public-facing explanation of the algorithmic system's purpose and operations in plain language. The Contractor should enable the algorithmic system to be able to accommodate explainability of the entire system, as well as individual predictions.
This may include providing the highest-impact variables influencing a decision. The ability to view this explanation should be provided as an option along with the corresponding prediction.”
Helpful Tools

A list of the most influential variables used in training the model (global explainability) can help verify that the model is making predictions using sensible logic. The ability to give an accounting of the most influential variables tied to a prediction (local explainability) will help users understand and trust the model recommendations.
A Software Bill of Materials report lists the open-source and commercial software components used by the AI tool. Identifying the underlying packages will provide procuring organizations with the ability to perform vulnerability analysis on their purchased tools and monitor for potential cybersecurity issues.
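Full SBOMs are usually produced in standard formats such as SPDX or CycloneDX, which add license, hash, and supplier details. As a rough sketch of the underlying idea, a Python-based vendor could enumerate the packages installed in its tool’s environment like this:

```python
from importlib.metadata import distributions

# Enumerate every installed package and version in the current environment.
# A real SBOM (e.g., SPDX or CycloneDX) adds licenses, hashes, and suppliers.
inventory = sorted(
    (dist.metadata["Name"], dist.version) for dist in distributions()
)

for name, version in inventory:
    print(f"{name}=={version}")
```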