AI Co-Pilot Evaluation Framework
Product-Market Fit, UX, Data, Systems

A repeatable framework to assess the real business value and product effectiveness of AI co-pilots
Product Assessment - Part 1
Core Focus Area
Sub-Focus Area
Question
Example Answer
Level of Capability
Product-Market-Fit, Synergies & Risks
10
Singular Focus
Does the co-pilot focus on a specific industry (eg. FSI, healthcare), function (eg. HR, F&A), persona (eg. developer, contact center associate) or a system (eg. System of Engagement)?
Eg. A co-pilot built for contact centers that focuses on helping contact center associates navigate a support request.
Cross-Focus Area Synergies
If the above isn’t true, do the multiple focus areas share synergies, such as the systems they interface with or the data types they support?
Eg. Co-pilots can commonly share channels of engagement such as phone, chat & voice as systems of engagement across use cases and leverage CRMs as systems of record.
Process Re-Engineering
Does the underlying process have redundant steps that may not be needed once the process is transformed? If yes, is the co-pilot re-engineering the process to address the redundancies?
Eg. Yes - Co-pilot brings collaboration between redundant steps to ensure that steps aren’t repeated and eliminates scope for human error in some areas, thus removing those review steps.
Depth of Process
When applicable, can the co-pilot operate cross-functionally across multiple functions or layers of offices to finish the end-to-end use case? If it does cover an end-to-end process, is it architected to serve individual tasks for individual micro-processes, while creating strong connections across each stage in the lifecycle?
Eg. A securities trade settlement co-pilot should be able to operate across the front office (traders), middle office (responsible for filing settlements) and back office (responsible for regulatory reporting), integrating with relevant systems and relaying various information between the desks.
Channels of Engagement
Does the co-pilot cover the various channels of engagement that the use case demands? If yes, are product development efforts prioritized efficiently based on the volume of engagement by channel?
Eg. A support co-pilot for a telecommunication company focusses largely on seamless interactions for voice-based conversations with customers, but also supports user inputs via text message & WhatsApp.
Simple Processes done at High Volumes
Is the co-pilot able to solve for a repeatable, linear business process that several FTEs perform consistently, at a high cadence, multiple times a day and with little variance?
Eg. A co-pilot automating a driver onboarding process, which involves reading license information, entering it in a system of record and cross-checking it against tax document information, and which is done consistently across 1000s of reps at a business process outsourcing company.
Complex Processes split across a large FTE pool
Is the co-pilot solving for a wide variety of business processes, each done by one of multiple FTEs, thus leading to a higher sum total of overall transaction volume?
Eg. A co-pilot automating a sales commission calculation process should be able to allocate deal assignments to finance analysts, triage to leadership for review and trigger a submission to payroll systems
Financial Risk Controls
Does the co-pilot address, or provide mechanisms to address, any financial risks associated with incorrectly inferring inputs or generating inaccurate outputs?
Eg. Yes - the contract renewal co-pilot generates a human review step wherein it juxtaposes the contract snippet with the renewal value and the extracted value side by side for human review
Financial Accuracy
Does the co-pilot achieve an error rate low enough to avoid substantial financial losses? Is the output relatively consistent, with little variance?
Eg. Yes - the accounts receivables co-pilot calculates the monthly recurring revenue within 0.1% of the actual value, based on historical trends
Regulatory Risk Controls
Does the co-pilot have adequate controls & experience design to address regulatory risks, if there are any?
Eg. The corporate actions co-pilot triages the final step of a corporate actions release process to an analyst for final review prior to submission
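The Financial Accuracy example above (monthly recurring revenue within 0.1% of the actual value) can be expressed as a simple relative-error check. The sketch below is illustrative only: the function names, the sample figures and the choice of `Decimal` are assumptions and are not part of the framework.

```python
from decimal import Decimal


def relative_error(predicted: Decimal, actual: Decimal) -> Decimal:
    """Absolute error expressed as a fraction of the actual value."""
    if actual == 0:
        raise ValueError("actual value must be non-zero")
    return abs(predicted - actual) / abs(actual)


def within_tolerance(predicted: Decimal, actual: Decimal,
                     tolerance: Decimal = Decimal("0.001")) -> bool:
    """True when the prediction is within the tolerance (0.1% by default)."""
    return relative_error(predicted, actual) <= tolerance


# Hypothetical figures, for illustration only.
actual_mrr = Decimal("250000")
print(within_tolerance(Decimal("250200"), actual_mrr))  # error is 200 / 250000 = 0.0008
print(within_tolerance(Decimal("250500"), actual_mrr))  # error is 500 / 250000 = 0.002
```

An assessor would likely run a check of this kind over many periods, since the question also asks whether the output stays consistent rather than accurate once.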
User Experience
7
Interactivity
Does the co-pilot effectively break down the information it needs into various steps? Does it properly use those steps to cross-reference information with source systems, automatically pulling up reference information while still optimizing for user input?
Eg. The customer support co-pilot takes a methodical approach towards answering a customer request, asking for their basic information while cross-referencing and confirming their account information, and surfacing relevant knowledge articles to ensure that they’ve conducted basic troubleshooting.
Re-Engineering
Are the steps designed to be in-sync with the current workflow, while optimizing for the process re-engineering efforts?
Eg. The customer support co-pilot surfaces the knowledge article’s key steps in-line with the chat and is able to surface it in a consumable format rather than sharing an obscure link.
Consistency
Is the intended agent workflow consistent across several team members, while at the same time accounting for the differences in the nature of the micro-processes that they do?
Eg. The rideshare driver onboarding co-pilot always surfaces the driver license information followed by the populated tax forms for all processing agents to ensure consistency and process compliance.
Collaboration
Is the workflow collaborative enough to split the work item transaction volume across multiple team members? If so, how does it enable auto-assignment, check-in / check-out, triaging and supervisor review?
Eg. The credit card fraud processing co-pilot triages any transactions above $1000 to the team lead for review to double check the predicted fraud outcome and avoid any false positives.
Statefulness
Has the co-pilot been built to be stateful? Does it have the capability to augment information from key source systems? Does it remember past transactions and can a user import inputs & outputs from such past transactions?
Eg. The accounts payable co-pilot has the capability to pull up past transactions for the managed services vendor to flag any time a transaction is more than 10% higher than the standard monthly payment.
Suggestiveness
Is the co-pilot suggestive? Does it provide users with suggestions via augmentation of information from existing knowledge bases and memory and does that help in creating added layers of controls?
Eg. The accounts payable co-pilot suggests to the analyst that the transaction can be made after 15 days to fully utilize the NET 30 payment term stipulated by the vendor, thus conserving the company’s cash position.
Change Management
Is the co-pilot workflow editable to accommodate business changes? If yes, is making such a change less resource-intensive and time-consuming?
Eg. FP&A Co-pilot has a creator workbench that allows a user to switch input formats, integration with source systems, order of steps, positioning of review stages and triaging rules.
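The Collaboration and Statefulness examples above both come down to threshold rules: send fraud predictions on transactions above $1000 to a team lead, and flag a payment that is more than 10% higher than the vendor's standard monthly payment. A minimal sketch of such rules follows. The class, function names and sample amounts are hypothetical; only the two thresholds come from the examples.

```python
from dataclasses import dataclass

FRAUD_REVIEW_THRESHOLD = 1000.00  # from the Collaboration example ($1000)
PAYMENT_SPIKE_RATIO = 0.10        # from the Statefulness example (10%)


@dataclass
class Transaction:
    transaction_id: str  # hypothetical identifier
    amount: float


def needs_team_lead_review(txn: Transaction) -> bool:
    """Send the transaction to the team lead when it is above the threshold."""
    return txn.amount > FRAUD_REVIEW_THRESHOLD


def exceeds_standard_payment(txn: Transaction,
                             standard_monthly_payment: float) -> bool:
    """Flag a payment more than 10% above the vendor's standard payment."""
    limit = standard_monthly_payment * (1 + PAYMENT_SPIKE_RATIO)
    return txn.amount > limit


# Hypothetical transactions, for illustration only.
print(needs_team_lead_review(Transaction("t-1", 1250.00)))
print(exceeds_standard_payment(Transaction("t-2", 5600.00), 5000.00))
```

Keeping the thresholds as named constants rather than burying them in the logic also speaks to the Change Management question, since a business user could adjust them without touching the rules themselves.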
Data Types & Quality
2
Data types
If the use case requires unstructured data extraction, is the co-pilot capable enough to extract data such as free-form text, complex tables, highly variable key value pairs from varied input sources such as emails, scanned documents, chats, calls?
Eg. The Accounts Receivables co-pilot is able to read through the SoW to extract the payment schedule for the software services and export them to a tabular format for input to CRMs
Data Quality
For use cases with low quality or missing data, can the co-pilot identify gaps in data quality, extrapolate them, find references in source systems or triage them to source users for achieving completeness?
Eg. Sales credit calculation co-pilot faces several instances where the contract value isn’t itemized by software and services and is able to triage the work item to a revenue operations analyst in such cases.
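The Data Quality example above, where a contract value is not itemized by software and services, can be sketched as a completeness check that decides whether a work item is processed automatically or triaged to a person. The record fields and queue names below are assumptions made for illustration, not part of the framework.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ContractRecord:
    contract_id: str                        # hypothetical identifier
    total_value: float
    software_value: Optional[float] = None  # None means "not itemized"
    services_value: Optional[float] = None


def missing_fields(record: ContractRecord) -> List[str]:
    """Names of the itemized values that are absent from the record."""
    required = ("software_value", "services_value")
    return [name for name in required if getattr(record, name) is None]


def route(record: ContractRecord) -> str:
    """Triage incomplete records to an analyst; process the rest automatically."""
    if missing_fields(record):
        return "revenue_operations_analyst"
    return "auto_process"


# Hypothetical records, for illustration only.
print(route(ContractRecord("c-1", 120000.0, 90000.0, 30000.0)))
print(route(ContractRecord("c-2", 120000.0)))
```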
Systems Integration
6
Breadth by Function
Does it have integrations with the major systems specific to that vertical or function such as CRMs, ERPs, HR / Payroll?
Eg. A sales AI co-pilot should come with an OOTB integration with SFDC, ZoomInfo, Gong
Breadth by System Type
Does it integrate with the major types of systems such as record, action, engagement, reference and insight to create a stateful, multi-modal and connected experience?
Eg. An HR co-pilot integrates with Workday as well as ADP to understand both aspects of an employee including their work department, manager as well as payroll information.
Depth of Integration - Modern Applications
Can it interface with both modern and legacy applications? Does it have OOTB connections & tools to quickly integrate custom APIs?
Eg. A sales credit co-pilot for trading desks integrates with API driven systems such as Vestmark for trade tracking, as well as legacy systems such as Keylink used for regulatory purposes.
Depth of Integration - Legacy Applications
For non-API based systems, is there any screen scraping? If yes, how sensitive is the screen scraping to changes in the underlying system?
Eg. The sales credit co-pilot is able to leverage an intelligent scraping mechanism for emulating manual interaction with terminals that contain information regarding the traded security.
Augmenting Insights to Generate Intelligence
Does the co-pilot integrate with systems of insight & store historical transactional information to generate unique insights over the course of its usage?
Eg. A sales co-pilot summarizing meeting notes for all customer calls can potentially augment common customer pain points and product features requested across customer calls to generate unique insights.
Fragmentation across market segments
Does the co-pilot have the capability to integrate across a vastly fragmented set of tools based on its use case and target market segment?
Eg. The payroll co-pilot has the capability to integrate across several HR systems including ADP, Workday, Gusto, Paychex, Rippling & Zenefits thus being able to target both enterprise and mid-market customers.
