How to set up Model Governance as you operationalize Machine Learning?

Krishna Gade
Jan 1, 2022

Artificial intelligence (AI) has become powerful enough to solve previously intractable problems and is increasingly being adopted by companies across all industries. Businesses are estimated to double their spending on AI systems to a projected $79.2 billion [1] as they operationalize AI products in 2022.

But the economic opportunities AI presents don’t come without risk. As frequent news stories indicate, companies employing AI face business, ethical, and compliance challenges. When not addressed, these issues can lead to financial loss, lack of trust, negative publicity, and regulatory action. Industries differ widely in the scope and approach taken to address these risks, in large part due to the varying regulations governing each.

In 2021, ModelOps [5, 6, 7] (AI/ML model operationalization) emerged as a framework focused primarily on the governance and lifecycle management of a wide range of operationalized artificial intelligence (AI) and decision models, including machine learning, knowledge graphs, rules, optimization, linguistic, and agent-based models. Core capabilities of ModelOps include continuous integration/continuous delivery (CI/CD) integration, model development environments, champion-challenger testing, model versioning, a model store, and rollback.

Depiction of the various parts of the ModelOps stack by Gartner in 2021

One of the reasons for the emergence of ModelOps is a unique set of characteristics of AI models that makes them difficult to manage and govern like traditional software. Fundamentally, models differ from conventional software artifacts in two ways:

a) Model quality decays over time. Because data is not stationary, the relationships that models learn also shift with time, making the models unreliable. Data scientists use the term data drift to describe how a process or behavior can change, or drift, as time passes. There are three kinds of data drift to be aware of: concept drift, label drift, and feature drift.

Types of data drift: concept drift, label drift, and feature drift
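
To make feature drift concrete, here is a minimal sketch of one common way to detect it: comparing a live feature distribution against its training distribution using the Population Stability Index (PSI). The synthetic data, bin count, and 0.2 alert threshold are illustrative assumptions, not prescriptions from this article.

```python
# A minimal feature-drift check using the Population Stability Index (PSI).
# The synthetic data, bin count, and 0.2 threshold are illustrative only.
import numpy as np

def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    """PSI between a training (expected) and a live (actual) feature sample."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    edges[0], edges[-1] = -np.inf, np.inf  # catch out-of-range live values
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    e_pct = np.clip(e_pct, 1e-6, None)  # avoid log(0) in empty bins
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, 10_000)  # distribution the model was trained on
live = rng.normal(0.5, 1.0, 10_000)   # shifted distribution seen in production
print(psi(train, live))  # PSI above ~0.2 is a common rule-of-thumb drift alert
```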

b) Unlike conventional code, models are a black box. Even an expert data scientist will find it difficult to understand exactly how a model arrives at a particular prediction.
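
Post-hoc explanation tools can partially open the black box by attributing each prediction to its input features. Below is a minimal sketch using the open-source SHAP library; the synthetic dataset and random-forest model are illustrative stand-ins, not tooling prescribed by this article.

```python
# A minimal sketch of post-hoc explainability with SHAP; the synthetic data
# and random-forest model are illustrative stand-ins.
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=5, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

# Attribute each prediction to per-feature contributions.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X[:5])
print(shap_values)  # per-feature attributions for the first five predictions
```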

AI needs a new DevOps a.k.a ModelOps stack [4]

These two characteristics necessitate a new kind of governance and operational framework for AI models.

Though some sectors, finance in particular, have implemented policies and systems designed to safeguard against the potential adverse effects of models, there is not yet a canonical approach to AI Model Governance that protects against business, ethical, and regulatory risks.

The current state of Model Governance in Banking

Model Governance in banks is mandated by SR 11–7 [3] and its OCC attachment on model risk management (MRM) within banking organizations. Over the years, banks have implemented policies and systems designed to safeguard against the potential adverse effects of models. After the 2008 financial crisis, banks had to comply with SR 11–7, whose intent was to ensure that banking organizations were aware of the adverse consequences (including financial loss) of decisions based on models. A typical bank may be running hundreds or thousands of quantitative or statistical models, and that number grows by 10–15% every year. A single model failure can cost a bank billions of dollars. The advent of AI and ML models adds further challenges to Model Governance.

Model Governance in banks goes far beyond compliance; it is integral to running a financial services business successfully. Today, access to trusted, high-quality AI models is essential to effectively using enterprise data, now considered a strategic asset, to drive better decision-making and business results. Banks depend heavily on these models to make the best decisions and navigate an increasingly competitive landscape. Banking executives are expected to rely on models, not just gut instinct and experience, when making decisions about deploying capital in support of lending and customer management strategies. Moreover, stakeholders, including shareholders, board members, and regulators, want to know how the models make business decisions, how robust they are, and the degree to which banking executives understand them.

Financial institutions in the United States are regulated by a number of regulatory entities at local (yellow), state (yellow), federal (blue), and international levels (green) [2]

The MRM and governance practices in place within the financial sector originate with SR 11–7, whose intent is to ensure banking organizations are “aware of the adverse consequences (including financial loss) of decisions based on models that are incorrect or misused” and advises “active model risk management” as the mechanism through which to do so. While these guidelines have been implemented throughout the financial sector, they do not cover every risk associated with the accelerating adoption of AI, nor do they fit naturally within the current standard AI development lifecycle.

Rise of Industry-Agnostic Model Governance

Whereas SR 11–7 and the Model Governance practices it spawned were intended to ensure stability within the financial system, the growing number of privacy laws (GDPR, CCPA, AAA) and bias laws is intended to ensure the ethical and transparent use of data. In the past year, we’ve seen progress on AI regulation, from the European Commission’s proposal to NIST’s published principles on Explainable AI. It is heartening to see the White House Office of Science and Technology Policy creating a bill of rights for an AI-powered world. Local governments are waking up as well: a New York City law now requires bias audits of AI hiring tools, with enforcement starting in January 2023. With respect to AI specifically, GDPR (currently the most comprehensive regulation) mandates the following:

  • Fair and transparent processing of personal data (5.1.a)
  • A comprehensive record of all processing of personal data that includes the purposes of the processing and a description of the categories of data subjects and personal data processed (30.1)
  • Right to deletion (17) and data portability (20) for individuals, and the right not to be subject to automated decision-making, including profiling (22)
  • Right to “meaningful information about the logic involved” in automated decisions (13.2.f)
  • The nomination of a “data protection officer” to monitor and advise on the fair and lawful collection and use of personal data (37.1)

These risks span applications of AI across industries and use cases, from recommender systems for personalized shopping experiences to optimization systems for insurance policy premiums or hospital resource allocation.

Challenges in setting up Model Governance

The goal of Model Governance is to identify and minimize the risks associated with the models deployed. The process has many steps covering the development, implementation, testing, and deployment stages. Following are some of the model governance challenges listed in [2] by a top US bank.

A simplified model governance lifecycle [2] in financial services showing model development, production deployment, regulatory oversight, and various feedback loops.
  • An increasing number of models under management puts pressure on ML teams, many of which still track their model inventory in basic spreadsheets.
  • As the complexity of AI models increases, so does the time it takes to validate them; a sufficiently complex AI model can take 6–12 months to validate. Model sizes have also been growing 10x every year for the last few years, which is starting to look like another Moore’s Law [9].
Model complexity continues to grow at an exponential rate.
  • A lack of tools to explain complex models leads governance teams to restrict modelers to conservative approaches. Restricting the use of deep neural networks in many areas, despite their promising results, hurts the business ROI of these institutions.
  • OCC audits are uncovering models that should not be running in production. Current governance practices translate to high compliance costs: the US financial industry spends about $80 billion a year on model compliance, and between 2008 and 2016, US financial institutions paid close to $320 billion in regulatory fines [8].
  • Most teams monitor their models only intermittently, in an ad-hoc manner. Intermittent monitoring fails to identify critical changes in the environment, data drift, or data quality issues.
  • Because of this lack of runtime monitoring and mitigation, issues are not detected or rectified promptly. They are only caught during model retraining or regulatory inquiries, by which time the institution is already at risk of business loss, reputational damage, and regulatory fines.
  • Regulatory complexity and uncertainty make governance increasingly difficult. For US credit models alone, one has to ensure adherence to regulations like the Fair Housing Act, the Consumer Credit Protection Act, the Fair Credit Reporting Act, the Equal Credit Opportunity Act, and the Fair and Accurate Credit Transactions Act. An AI model may also be deployed across multiple territories, where one jurisdiction has more conservative guidelines than another.
  • A growing number of metrics are being proposed to quantify model bias, such as demographic parity, equalized odds, and other group fairness metrics [10]. Adding more metrics to an already non-scalable, manual model monitoring process does not help.
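
As a concrete reference point, here is a minimal sketch of two of the group fairness metrics named in the last item above: demographic parity difference and the disparate impact ratio. The toy predictions, group labels, and the four-fifths (0.8) threshold are illustrative assumptions.

```python
# Minimal group-fairness checks: demographic parity difference and
# disparate impact ratio. Data and the 0.8 threshold are illustrative.
import numpy as np

def demographic_parity_diff(y_pred: np.ndarray, group: np.ndarray) -> float:
    """Largest gap in positive-prediction rate between any two groups."""
    rates = [y_pred[group == g].mean() for g in np.unique(group)]
    return max(rates) - min(rates)

def disparate_impact_ratio(y_pred: np.ndarray, group: np.ndarray) -> float:
    """Ratio of the lowest to the highest positive-prediction rate."""
    rates = [y_pred[group == g].mean() for g in np.unique(group)]
    return min(rates) / max(rates)

y_pred = np.array([1, 0, 1, 1, 0, 1, 0, 0])  # e.g., loan approvals
group = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])  # protected attribute
print(demographic_parity_diff(y_pred, group))  # 0.5
print(disparate_impact_ratio(y_pred, group))   # ~0.33, below the 0.8 rule of thumb
```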

A new approach to Model Governance

These risks, and the variety of AI applications and development processes, call for a new Model Governance framework that is simple, flexible, and actionable. A streamlined Model Governance solution is a five-step workflow:

Model Governance workflow for the Modern Enterprise
  1. Map: Record all AI models & training data to map and manage the model inventory in one place.
  2. Validate: Conduct an automated assessment of feature quality, bias, and fairness checks to ensure compliance.
  3. Approve: Ensure human approval of models prior to launch to production and capture the model documentation and reports in one place.
  4. Monitor: Continuously stress test and performance test the models and set up alerts upon identifying outliers, data drift, and bias.
  5. Improve: Derive actionable insights and iterate on your model as your customers, business, and regulations change.
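
To illustrate how the Map, Validate, and Approve steps might hang together, here is a minimal sketch of a model inventory record with an enforced lifecycle; all field names and states are hypothetical, not a prescribed schema.

```python
# A hypothetical model-inventory record enforcing the Map -> Validate ->
# Approve -> Monitor lifecycle; field names and states are illustrative.
from dataclasses import dataclass, field
from enum import Enum

class Stage(Enum):
    MAPPED = "mapped"        # recorded in the model inventory
    VALIDATED = "validated"  # automated quality/bias checks passed
    APPROVED = "approved"    # human sign-off captured
    MONITORED = "monitored"  # live, with alerting enabled

@dataclass
class ModelRecord:
    model_id: str
    owner: str
    training_data: str                 # pointer to the training dataset
    stage: Stage = Stage.MAPPED
    approvers: list[str] = field(default_factory=list)

    def approve(self, approver: str) -> None:
        # Human approval is only valid after automated validation.
        if self.stage is not Stage.VALIDATED:
            raise ValueError("model must be validated before approval")
        self.approvers.append(approver)
        self.stage = Stage.APPROVED

record = ModelRecord("credit-risk-v3", "ml-team", "s3://datasets/credit/2021")
record.stage = Stage.VALIDATED  # set by the automated validation step
record.approve("model-risk-officer")
```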

Automated Model Governance with ModelOps

The long list of Model Governance challenges motivates the need for new solution approaches. Below is a blueprint of a ModelOps solution that can enable Model Governance and move AI towards a Self-Regulated AI [2].

A blueprint of a ModelOps based solution to enable Model Governance leading to regulated AI Models [2]

Let us look at the must-have capabilities of a ModelOps solution that enables Model Governance.

  1. Continuous monitoring and reporting of models and datasets, both pre- and post-deployment (see the sketch after this list). Monitoring must include essentials like: a) monitoring of input streams for data drift, population stability, and feature quality metrics; b) data quality monitoring for missing values, range violations, unexpected inputs, etc.
  2. Integration of key self-regulatory analysis modules like: a) explainability analysis for troubleshooting models as well as answering regulatory and customer inquiries; b) fairness analysis that can examine intersections of protected classes across metrics like disparate impact, demographic parity, etc.
  3. Reusable templates to generate automatic reports and documentation, including: a) the ability to integrate custom libraries for explaining models and/or fairness metrics; b) customization and configuration of reports, specifying which inputs and outputs are presented.
  4. Runtime mitigation with human-in-the-loop alerting. The goal is to maintain model behavior more effectively at runtime. System capabilities include: a) scenario-based mitigation for well-defined control paths, discovered during pre-deployment testing of the model on historical data or in known, established scenarios like holiday peak payment activity; b) system-level remediation through alternative models, such as using shadow AI models for population segments if the primary AI model shows detectable bias during monitoring.
  5. Robustness tests for AI models. Continuous monitoring provides the opportunity to collect vast amounts of runtime behavioral data. A ModelOps system can then slice and dice data [11] to identify the weaknesses, failure patterns, and risky scenarios in the data. Robustness tests can help reassure AI Governance teams by showing a variety of scenarios being covered through this analysis.
  6. Configurable risk policies and regulatory guidelines. The ability to configure a risk policy [8] for each model type, track it through the lifecycle, and set up approval criteria will help governance teams ensure regulatory oversight of every model being deployed and maintained.
  7. Autonomous Governance. Taking this further, an intelligent system could govern models automatically, resulting in a self-regulated state of AI. This requires a Model Governance Controller that watches all monitored models, absorbs model validation reports, and looks for anomalous behavior, warning a human to examine problematic models.
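
Tying capabilities 1, 4, and 6 together, below is a minimal sketch of a monitoring check that compares runtime metrics against a configurable risk policy and raises human-in-the-loop alerts; the thresholds, metric names, and notification hook are illustrative assumptions.

```python
# A hypothetical policy-driven monitoring check with human-in-the-loop
# alerting; thresholds and metric names are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class RiskPolicy:
    max_psi: float = 0.2            # per-feature drift limit
    max_missing_rate: float = 0.05  # per-feature data-quality limit

def check_and_alert(metrics: dict[str, dict[str, float]],
                    policy: RiskPolicy, notify=print) -> list[str]:
    """Compare runtime metrics to policy limits and notify a human on breach."""
    violations = []
    for feature, value in metrics.get("psi", {}).items():
        if value > policy.max_psi:
            violations.append(f"drift on {feature}: PSI={value:.2f}")
    for feature, rate in metrics.get("missing_rate", {}).items():
        if rate > policy.max_missing_rate:
            violations.append(f"missing values on {feature}: {rate:.1%}")
    for v in violations:
        notify(f"ALERT (needs human review): {v}")
    return violations

check_and_alert(
    {"psi": {"income": 0.31}, "missing_rate": {"zip_code": 0.12}},
    RiskPolicy(),
)
```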

Conclusion

In this article, we illustrated the challenges of Model Governance in both regulated and unregulated industries and made the case for an industry-agnostic Model Governance framework for AI that can help us create self-regulated AI systems in the future. We argued that ModelOps, a new enterprise solution blueprint [2] being developed to operationalize AI, is the best way to implement Model Governance capabilities in an organization. We then listed the capabilities of a ModelOps system that enable effective, automated Model Governance in the enterprise.

References

  1. Wall Street Journal, April 1, 2019.
  2. Towards Self-Regulating AI: Challenges and Opportunities of AI Model Governance in Financial Services
  3. SR-11–7 Guidance on Model Risk Management and Governance
  4. AI needs a new developer stack
  5. Gartner’s definition of ModelOps
  6. What is ModelOps and how is it different from MLOps?
  7. The state of ModelOps in 2021
  8. Compliance by Design: Banking’s unmissable opportunity. BCG Whitepaper
  9. Large Language Models: A New Moore’s Law?
  10. The Measure and Mismeasure of Fairness: A Critical Review of Fair Machine Learning
  11. Introducing Slice and Explain™ — Automated Insights for your AI Models


Krishna Gade

Founder, CEO Fiddler.AI — Building Trust into AI. Prior: @facebook , @pinterest , @twitter , @microsoft