Rajesh Koppula

An Approach to Policy for AI Model Risk Management in Financial Services

Updated: Jul 29

In the realm of AI Model Risk Management, financial institutions are tasked with the formidable responsibility of identifying, assessing, and mitigating risks associated with AI models. This encompasses the evaluation of model accuracy, fairness, transparency, and explainability to ascertain that their deployment aligns with regulatory guidelines and organizational objectives. The proactive management of AI model risks not only fosters trust among stakeholders but also safeguards against potential financial, operational, and reputational ramifications.


As the financial services industry moves AI/LLM proofs of concept (POCs) into operational use cases, there is an urgent need to overhaul and update the Model Governance (MG) and Model Risk Management (MRM) frameworks to accommodate AI innovations. These two functions typically report directly to the board of directors at financial institutions (FIs) and also serve as the conduit for passing regulatory audits. It is therefore important to understand how AI/ML/GenAI/LLMs are impacting the existing frameworks, revise them rapidly, and keep the board of directors apprised of the changes to help calm their nerves. Independent Model Validations/Model Audits (IMV/MA) of these frameworks on a regular basis will generate comprehensive reports that continue to distill recommendations for evolving new governance frameworks for LLM use in financial services.


Given the complexity involved in model explainability and hallucinations, especially with LLMs, developing a modern framework for IMV and MA will check the boxes for regulatory oversight and accelerate the compliant adoption of GenAI use cases. Adopting these new frameworks will speed approvals from boards of directors and regulatory agencies, helping realize the ROI on AI investments and rapidly move AI/LLM use cases from POCs to mainstream delivery and adoption.

Before we propose changes to the existing frameworks, let's first understand the MRM and MG functions in more detail.


| | Model Risk Management (MRM) | Model Governance (MG) |
|---|---|---|
| Focus | Identifying, assessing, monitoring, and controlling the risks associated with model use. | Establishing the overall framework, policies, and procedures for managing models. |
| Scope | More specific, centered on the potential negative impacts of model errors or failures. | Broader, encompassing the entire model lifecycle from development to retirement. |
| Responsibilities | Includes risk assessment, mitigation strategies, and contingency plans. | Defines roles and responsibilities for model development, validation, implementation, monitoring, and review. |
| Outcome | Protects the organization from financial or reputational losses due to model issues. | Ensures a robust model management infrastructure. |

In the pre-GenAI era, before LLMs took the world by storm, the framework above was robustly developed, well defined, and already running seamlessly in financial services. Because most models used in the pre-GenAI era were quantitative models (typically regression models), there were, thankfully, no hallucinations, and the processes in the model lifecycle (development, validation, implementation, monitoring, and review) were well established.


Now, with LLM/GenAI use cases, five things have changed significantly that impact MRM/MG.


First, FIs are developing POCs with their own custom LLMs built from foundation models (Llama, Gemini, OpenAI, Cohere, Mistral, etc.) by recalibrating the weights of the foundation models. This recalibration is achieved by re-vectorization with custom/proprietary FI data. The model development process therefore now has two distinct stages (foundation model development, and custom LLM development, i.e., recalibration of the foundation model). So when IMV is conducted, one needs to study both models and opine on the choice of foundation model used, the makeup/documentation of how the foundation model was developed in the first place, and how the re-vectorization was achieved using custom data and RAG* (Retrieval-Augmented Generation). Without a detailed analysis of the choices and methods used to develop the custom LLM, one cannot pass the sniff test for model development. This is a big deviation from traditional model development. It also exposes the question of ownership between the foundation model(s) and the custom LLM, when in fact the output the customer consumes depends on both models.
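
To make this two-stage lineage auditable, an IMV team could capture both the foundation model and the custom LLM in a single validation record. The sketch below is a minimal illustration in Python; the class and field names (e.g. `ModelLineageRecord`) are hypothetical, not part of any standard or vendor API.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List

@dataclass
class FoundationModelInfo:
    """Provenance of the upstream foundation model under review."""
    name: str                      # e.g. "Llama", "Gemini", "Mistral"
    version: str
    training_data_cutoff: date     # data snapshot the vendor reports
    published_metrics: dict        # vendor-published benchmark scores

@dataclass
class CustomLLMInfo:
    """How the FI recalibrated the foundation model with proprietary data."""
    base_model: FoundationModelInfo
    fine_tuning_method: str        # e.g. "LoRA", "full fine-tune", "RAG only"
    proprietary_data_snapshot: date
    rag_corpus_description: str    # sources indexed for retrieval augmentation

@dataclass
class ModelLineageRecord:
    """Single artifact an independent validator can review end to end."""
    custom_llm: CustomLLMInfo
    validation_findings: List[str] = field(default_factory=list)

    def time_lag_days(self) -> int:
        """Gap between the foundation-model data cutoff and the FI data snapshot."""
        cutoff = self.custom_llm.base_model.training_data_cutoff
        return (self.custom_llm.proprietary_data_snapshot - cutoff).days
```

A record like this also makes the ownership question explicit: the foundation model fields document what the FI inherited, while the custom LLM fields document what the FI itself is accountable for.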


Second, there is a time lag (we believe at least 6+ months) between the foundation models (there is a lag even between the release schedule of foundation models and the data snapshots they were trained on) and the custom LLM development, leading to out-of-context time mismatches in the data. We believe this mismatching of time snapshots in the development of custom LLMs aggravates hallucinations and drift in model performance over time. It also means that the custom LLMs and the foundation models need to be refreshed as frequently as possible to reduce the time-lag differences and hallucinations. This necessitates a CI/CD process for AI/LLM model usage. However, implementing CI/CD will make life very difficult for model validations, as both the data and the models will change dynamically, making it impossible for validations to anchor on a snapshot. A good approach is to initiate CI/CD for LLMs only after the first iteration, process setup, and approval of the model validation report. In addition, we believe CI/CD should go hand in hand with the development of Continuous Model Validation (CMV), a net-new process that needs to be developed and automated.
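
One way to keep CI/CD and CMV in step is to pin every refresh to an immutable evaluation snapshot and gate promotion on validation thresholds. The sketch below is a minimal illustration, assuming a generic `evaluate` callable and threshold values chosen by the MRM function; none of these names come from an existing tool.

```python
import hashlib
import json
from typing import Callable, Dict, List

def snapshot_fingerprint(records: List[dict]) -> str:
    """Hash the evaluation dataset so each validation run is anchored to an
    immutable snapshot, even though data and models refresh continuously."""
    payload = json.dumps(records, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def continuous_model_validation(
    eval_records: List[dict],
    evaluate: Callable[[List[dict]], Dict[str, float]],
    thresholds: Dict[str, float],
) -> Dict[str, object]:
    """Run the automated validation gate for one CI/CD refresh cycle.

    `evaluate` is whatever metric suite the MRM function approves
    (e.g. groundedness, answer relevance, hallucination rate).
    Promotion is blocked if any metric falls below its threshold.
    """
    metrics = evaluate(eval_records)
    failures = {k: v for k, v in metrics.items() if v < thresholds.get(k, 0.0)}
    return {
        "snapshot": snapshot_fingerprint(eval_records),
        "metrics": metrics,
        "promote": not failures,
        "failures": failures,
    }
```

The output of each gate (snapshot hash, metrics, pass/fail) becomes the evidence trail that a later independent validation can anchor on, even after the model has moved on.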


Third, model validations need to be conducted across two different models (foundation, custom LLM) in four distinct time frames: the foundation model development snapshot T0, for benchmarking foundation model performance; custom LLM development and validation at snapshot T1; implementation performance and monitoring metrics at snapshot T2, for testing the custom LLM and measuring drift in the foundation model; and an out-of-time performance snapshot T3, to measure drift for both the custom LLM and the foundation model. This process is also a big change from traditional model validations. Furthermore, LLMs come with an entirely new set of model performance metrics that are new to regulators and the financial services industry. While foundation model providers publish their performance metrics, thorough documentation of how these metrics are derived independently at different points in time is needed. When technology firms do not publish foundation model performance metrics on an ongoing basis, metrics go stale, hallucinations are aggravated, and the timing of when these models need to be refreshed is misjudged. We also believe the test use cases for foundation model validations need to be refreshed with new context as time elapses, to measure the actual drift in foundation model performance.
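
As a concrete picture, a validator could track the same metric suite across the four snapshots and flag any degradation beyond an agreed tolerance. The snapshot labels, metric names, values, and tolerance below are illustrative assumptions, not regulatory thresholds or published figures.

```python
from typing import Dict

# Illustrative metric readings at the four snapshots described above;
# the numbers are made up purely for the example.
snapshots: Dict[str, Dict[str, float]] = {
    "T0_foundation_benchmark":     {"factual_consistency": 0.91, "answer_relevance": 0.88},
    "T1_custom_llm_validation":    {"factual_consistency": 0.89, "answer_relevance": 0.90},
    "T2_implementation_monitoring": {"factual_consistency": 0.85, "answer_relevance": 0.87},
    "T3_out_of_time":              {"factual_consistency": 0.80, "answer_relevance": 0.84},
}

def drift_report(baseline: str, tolerance: float = 0.05) -> Dict[str, Dict[str, float]]:
    """Flag any metric that has drifted more than `tolerance` below the baseline snapshot."""
    base = snapshots[baseline]
    report: Dict[str, Dict[str, float]] = {}
    for label, metrics in snapshots.items():
        if label == baseline:
            continue
        drift = {m: base[m] - v for m, v in metrics.items() if base[m] - v > tolerance}
        if drift:
            report[label] = drift
    return report

if __name__ == "__main__":
    # Measure drift of later snapshots against the T1 validation baseline.
    print(drift_report("T1_custom_llm_validation"))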


Fourth, implementation is a big change compared to traditional models, and it is also the most important and most complex step. As the modern technology landscape becomes increasingly complex (on-prem, hybrid cloud, multi-cloud), the implementation of LLMs needs to be anchored on redundancy across cloud environments. Model validation and MG require studying the model processes and overall infrastructure and evaluating the model metrics for regular monitoring and review. Identifying vulnerabilities and measuring risks in the model infrastructure, redundancy for system/LLM disruptions, the security posture implied by the choice of foundation models and LLMs, and augmentation with human/manual interventions and processes make this critically important for reducing operational risks.
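
As a simplified picture of the redundancy requirement, inference traffic can be routed to a primary LLM deployment with an automatic fallback when that deployment is unhealthy. The `LLMEndpoint` shape, the health check, and the invoke call below are hypothetical placeholders, not a specific vendor's API.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class LLMEndpoint:
    """One deployment of the custom LLM (on-prem, or a specific cloud region)."""
    name: str
    invoke: Callable[[str], str]      # sends a prompt, returns a completion
    health_check: Callable[[], bool]  # True if the deployment is currently serving

def route_with_failover(prompt: str, endpoints: List[LLMEndpoint]) -> str:
    """Try deployments in priority order; fall back on unhealthy or failing ones.

    Every failover event should also be logged for MRM monitoring, since
    repeated failovers are themselves an operational-risk signal.
    """
    for endpoint in endpoints:
        if not endpoint.health_check():
            continue
        try:
            return endpoint.invoke(prompt)
        except Exception:
            continue  # try the next deployment
    raise RuntimeError("All LLM deployments unavailable; escalate to the manual process")
```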


Fifth, the validation of models has to coexist with the business-user feedback sent to the custom LLMs enriched with enterprise context in typical RAG applications. This feedback, typically an upvote/downvote or some kind of ranking of the output shown to the user, can be sent to the LLMs installed within the enterprise. This allows the LLM models to be fine-tuned with RLHF (Reinforcement Learning from Human Feedback). It also raises the important question of whether to send all the feedback through the LLM APIs as it arrives or to batch it at whatever periodicity the MRM&V framework would allow. This adds another layer of complexity to the overall model governance needed with the advent of LLMs.
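
A governance-friendly pattern is to accumulate upvote/downvote feedback and release it to the fine-tuning pipeline only in batches, at the cadence the MRM&V framework permits. The batch size, the `FeedbackEvent` shape, and the `submit_for_finetuning` hook below are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Callable, List

@dataclass
class FeedbackEvent:
    """One piece of business-user feedback on an LLM response."""
    prompt: str
    response: str
    vote: int                 # +1 upvote, -1 downvote, or a ranking score
    timestamp: datetime = field(default_factory=datetime.utcnow)

class FeedbackBatcher:
    """Holds feedback and releases it only at the cadence MRM&V permits."""

    def __init__(self, submit_for_finetuning: Callable[[List[FeedbackEvent]], None],
                 batch_size: int = 500):
        self.submit_for_finetuning = submit_for_finetuning
        self.batch_size = batch_size
        self._buffer: List[FeedbackEvent] = []

    def record(self, event: FeedbackEvent) -> None:
        """Capture feedback locally without immediately touching the LLM pipeline."""
        self._buffer.append(event)

    def flush_if_due(self) -> bool:
        """Send the batch to the RLHF/fine-tuning pipeline once the threshold is met."""
        if len(self._buffer) < self.batch_size:
            return False
        self.submit_for_finetuning(self._buffer)
        self._buffer = []
        return True
```

Batching in this way also gives the validation function a stable artifact (the batch itself) to review before the feedback changes the model.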


With all the above context, we propose Katalyst Street LLM FI-Audit, a new framework for independent AI/LLM model validations for the financial services industry. We believe that independent bodies/firms should conduct these validations and generate comprehensive model validation/audit reports and recommendations for boards of directors and regulators to improve the MRM/MG functions of financial institutions.



As we navigate the complexities of AI & LLM policy in the financial services domain, a robust approach to Model Governance becomes imperative. By establishing clear accountability structures, rigorous documentation processes, and continuous monitoring mechanisms, institutions can fortify their model management frameworks while fostering a culture of compliance and innovation.


Navigating the intricate landscape of Model Governance requires a strategic blend of technological prowess, regulatory acumen, and operational agility. Leveraging advanced tools such as explainable AI, model validation frameworks, and risk quantification methodologies equips organizations with the necessary arsenal to address the evolving challenges of AI integration effectively.


*RAG stands for Retrieval-Augmented Generation. It is a technique in natural language processing (NLP) that combines the strengths of retrieval-based and generative artificial intelligence (AI) models.


How RAG works:

  1. Retrieval: The AI model first searches through a vast amount of custom data and information sources (like documents, databases, or the internet) to find relevant data related to the user's query.

  2. Augmentation: The retrieved information is then combined with the original query.

  3. Generation: A generative AI model uses this combined information to create a comprehensive and informative response.
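
These three steps map directly onto a simple pipeline. The sketch below is a toy illustration: the keyword-overlap retriever and the `generate` placeholder stand in for the vector store and LLM call a production RAG application would use.

```python
from typing import Callable, List

def retrieve(query: str, corpus: List[str], top_k: int = 3) -> List[str]:
    """Step 1 (Retrieval): rank documents by naive keyword overlap with the query.
    A production system would use embeddings and a vector store instead."""
    query_terms = set(query.lower().split())
    scored = sorted(corpus, key=lambda d: len(query_terms & set(d.lower().split())),
                    reverse=True)
    return scored[:top_k]

def augment(query: str, passages: List[str]) -> str:
    """Step 2 (Augmentation): combine the retrieved passages with the original query."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only the context below.\nContext:\n{context}\n\nQuestion: {query}"

def rag_answer(query: str, corpus: List[str], generate: Callable[[str], str]) -> str:
    """Step 3 (Generation): the generative model produces the response from the augmented prompt."""
    return generate(augment(query, retrieve(query, corpus)))
```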


To learn more about Katalyst Street LLM FI-Audit framework methodology and engaging our services for a comprehensive Independent LLM audit report and recommendations, reach us at contact@katalyststreet.com


Acknowledgement: We would like to acknowledge, express gratitude to, and thank our guest writer, Shankar Ramanathan, Global Head - AI & Advanced Analytics Portfolio Solutions, Banking & Capital Markets, Capgemini US, for thoroughly reviewing this thought piece and providing additional color and contributions around RLHF complexity with LLMs.
