Understanding strategies that go beyond traditional Model Risk Management

“Aviation laws were written in blood. Let’s not reproduce that methodology with AI” — Siméon Campos

In November 2021, Bloomberg's story "Zillow's Algorithm-Fueled Buying Spree Doomed Its Home-Flipping Experiment" made headlines. It outlined Zillow's daring entry into the iBuying world in 2018, a bet that its ML-powered Zestimate algorithm could revolutionize home flipping for profit. Zillow started carefully, with local real estate experts validating the algorithm's pricing, but later shifted to a fully algorithmic approach in the quest for faster offers. The move did not pay off.

The Zestimate struggled to adapt to the swift inflation in the 2021 real estate market, prompting Zillow to take action to enhance the appeal of its offers. The company embarked on an ambitious buying spree, reportedly acquiring as many as 10,000 homes per quarter. However, the human workforce struggled to keep up with the sheer scale and speed of these acquisitions, a challenge exacerbated by the concurrent outbreak of the pandemic. In the face of mounting difficulties, including a backlog of unsold properties, Zillow decided to halt its offers in October 2021. Subsequent months witnessed homes being resold at a loss, leading to a substantial inventory write-down exceeding $500 million.

We open with this unfortunate incident because the fall of Zillow's iBuying venture has a complex set of causes. Although it cannot be separated from the pandemic that disrupted the housing market from 2020 onward, it still offers material for a rich analysis. In this article, we use it as an example to show how the principles of governance and risk management discussed in our series could help avert such debacles in the future.

Going beyond Model Risk Management

In the previous article, we discussed in detail how Model Risk Management (MRM) provides a comprehensive framework, along with a series of procedures, for identifying, assessing, mitigating, and monitoring the risks associated with developing, deploying, and operating machine learning systems. In this part, we explore strategies and practices beyond traditional MRM that are especially beneficial for ML safety. We begin with AI incident response.

AI Incident Response plan

The Zillow fiasco is an example of an AI incident caused by failure: a well-crafted algorithm could not keep pace with a fast-changing real estate market, leading to significant financial and reputational damage. Even with the best model training and validation, model risk cannot be eliminated, as the SR 11-7 guidance itself notes. That is why a solid incident response plan is needed.

An AI incident response plan is a preplanned strategy for addressing AI issues quickly and effectively. It helps organizations identify, contain, and eradicate AI incidents before they become costly or hazardous, which is especially important for smaller or newer organizations. Incident response is a well-regarded practice in computer security, and organizations like NIST and SANS underscore its importance; the same discipline helps in managing the complexities of ML and AI. Like computer incident response, AI incident response proceeds in six clear phases, each vital for reducing AI risk.

Phase 1: Preparation

To effectively prepare for AI incidents, organizations should define the incident parameters, allocate response budgets, develop communication plans, and implement technical safeguards. Practicing scenarios through tabletop exercises with key personnel enhances readiness.

Starter questions for an AI incident preparation phase. Image by author.

Phase 2: Identification

Identification involves detecting system failures, attacks, or abuses. It combines general security methods with specialized AI monitoring, like detecting concept drift or algorithmic discrimination. Once an issue is identified, relevant stakeholders, including management, are alerted.
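One way to picture the specialized monitoring described above is a rolling check on prediction error once true outcomes arrive (for a pricing model, the eventual sale price). The sketch below is a minimal illustration, not a production drift detector: the class name `ErrorDriftMonitor`, the window size, and the 1.5x tolerance are all assumptions chosen for this example.

```python
from collections import deque
from statistics import mean


class ErrorDriftMonitor:
    """Hypothetical monitor: flags when recent error exceeds a validation-time baseline."""

    def __init__(self, baseline_error: float, window: int = 50, tolerance: float = 1.5):
        self.baseline_error = baseline_error  # mean absolute % error seen during validation
        self.tolerance = tolerance            # assumed multiplier that triggers an alert
        self.errors = deque(maxlen=window)    # keeps only the most recent `window` errors

    def update(self, predicted: float, actual: float) -> bool:
        """Record one realized outcome (actual must be nonzero); return True to alert."""
        self.errors.append(abs(predicted - actual) / abs(actual))
        if len(self.errors) < self.errors.maxlen:
            return False  # not enough observations yet
        return mean(self.errors) > self.tolerance * self.baseline_error


monitor = ErrorDriftMonitor(baseline_error=0.02, window=5)
calm = [monitor.update(102.0, 100.0) for _ in range(5)]     # errors match the baseline
shifted = [monitor.update(110.0, 100.0) for _ in range(5)]  # the market moved, errors grow
```

A check like this only works when ground truth arrives reasonably quickly. When outcomes are delayed, teams often monitor shifts in input or score distributions instead, which is a weaker proxy for concept drift.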

Phase 3: Containment

Containment means mitigating the immediate harm caused by an incident, with the goal of limiting the initial damage. Incidents tend to spread beyond their point of origin, affecting other parts of a business and its customers. The right approach depends on the root cause, whether that is an external attack, an internal error, or misuse of the AI system. When necessary, it is advisable to begin communicating with the public during the containment phase.

Phase 4: Eradication

Eradication means fixing the affected systems to stop the problem. This could be by blocking attacked systems to prevent further damage or shutting down a faulty AI system and temporarily using a trusted, simpler system instead. After eradication, the incident should not cause any more harm.
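The idea of shutting down a faulty AI system and temporarily serving a trusted, simpler one can be sketched as a wrapper with a manual kill switch. This is an illustrative pattern under stated assumptions: `GuardedModel`, `disable_primary`, and the feature name `last_sale_price` are hypothetical names, and a real system would also need logging, access control, and a tested fallback.

```python
from typing import Callable, Optional

Features = dict[str, float]
Predictor = Callable[[Features], float]


class GuardedModel:
    """Hypothetical wrapper that can route traffic to a simpler fallback predictor."""

    def __init__(self, primary: Predictor, fallback: Predictor):
        self.primary = primary
        self.fallback = fallback
        self.primary_enabled = True
        self.disabled_reason: Optional[str] = None

    def disable_primary(self, reason: str) -> None:
        """Kill switch used during eradication; the reason is kept for the post-mortem."""
        self.primary_enabled = False
        self.disabled_reason = reason

    def predict(self, features: Features) -> float:
        if self.primary_enabled:
            return self.primary(features)
        return self.fallback(features)


# Toy predictors: a "complex" model and a simple rule that reuses the last sale price.
model = GuardedModel(
    primary=lambda f: 1.25 * f["last_sale_price"],
    fallback=lambda f: f["last_sale_price"],
)
before = model.predict({"last_sale_price": 100.0})
model.disable_primary("prediction error exceeded tolerance")
after = model.predict({"last_sale_price": 100.0})
```

Keeping the switch separate from the model code means the response team can act without waiting for a retrain or a redeploy.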

Phase 5: Recovery

Recovery involves fixing affected systems, preventing future issues, and possibly reviewing or improving technical procedures, especially if a mistake or malicious act caused the problem.

Phase 6: Lessons Learned

The lessons learned phase means changing or improving how we respond to AI incidents, based on what worked well and what didn't during the current incident. These improvements can relate to either the process or the technology used.

Learnings from the Zillow iBuying Incident: Insights for AI Incident Response

After reviewing the AI incident response plan, let's return to the Zillow iBuying saga. What insights can we draw from it? Public reports point to potential red flags: a lack of human oversight, insufficient assessment of financial risks, and the absence of appropriate governance structures. While the specifics of what happened inside Zillow remain uncertain, the case offers valuable lessons for improving our readiness and response to AI-related challenges within our own organizations, including:

Lesson 1: Validate with domain experts.
Lesson 2: Anticipate failure modes.
Lesson 3: Governance is crucial.
Lesson 4: AI incidents can scale rapidly.
Lesson 5: Emerging technologies always entail risks.

Additional Practices for Enhanced Risk Management

Besides the AI incident response discussed above, practices adopted from financial audits, data privacy, software development best practices, and IT security bring significant value to the field.

Model audits and assessments

A model audit is a formal evaluation of an ML system that checks compliance with specific policies, regulations, or laws. Audits are usually conducted by independent third parties and prioritize transparency and exhaustive testing. A model assessment is a similar but less formal check, carried out by internal or external groups, that looks for issues such as bias, safety problems, data privacy harms, and security vulnerabilities.

For a deeper dive into model audits and assessments, two papers, Algorithmic Bias and Risk Assessments: Lessons from Practice and Closing the AI Accountability Gap: Defining an End-to-End Framework for Internal Algorithmic Auditing provide excellent insights and frameworks for conducting these audits and assessments.

Impact assessments

Impact assessments are gaining traction in ML policies and proposed laws as a way to anticipate and document potential problems with a system. They make it easier for AI designers and operators to understand, and be accountable for, the problems their systems could cause. However, they are only a first step. They should be carried out regularly and considered alongside other factors to get a full picture of the risks. It is also important that they be done by people outside the ML teams being assessed, to avoid bias and ensure a thorough check.

While impact assessments play a pivotal role in risk management and governance strategies, their execution by independent professionals and integration with other risk factors is essential for overall efficacy.

Appeal, override, and opt-out

Have you seen the "Report inappropriate predictions" function in Google's search bar? It is a basic way for users to point out problems, letting them challenge or correct an ML system's decisions. This idea, also known as actionable recourse or redress, can vary in complexity. Another approach is the opt-out option, which lets users skip automated processing. Both options are recognized by many data privacy and US consumer finance laws and are crucial for defending consumer rights against automated ML errors. However, many ML systems still lack these features because of the planning and resources needed to build them in from the outset.

Reporting inappropriate predictions through Google. Image by author.
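To make the opt-out and appeal ideas concrete, here is a minimal sketch of how a decision service might route requests. Everything in it is hypothetical: the `Decision` record, the `decide` and `appeal` functions, the 0.5 threshold, and the outcome labels are assumptions for illustration, not a description of any real system or legal requirement.

```python
from dataclasses import dataclass, field


@dataclass
class Decision:
    user_id: str
    outcome: str     # "approved", "denied", or "pending_human_review"
    automated: bool  # False once a human is responsible for the outcome
    appeal_notes: list[str] = field(default_factory=list)


def decide(user_id: str, score: float, opted_out: bool, threshold: float = 0.5) -> Decision:
    """Opt-out: users who decline automated processing go straight to a human."""
    if opted_out:
        return Decision(user_id, "pending_human_review", automated=False)
    outcome = "approved" if score >= threshold else "denied"
    return Decision(user_id, outcome, automated=True)


def appeal(decision: Decision, note: str) -> Decision:
    """Appeal: record the user's objection and hand the case to a human reviewer."""
    decision.appeal_notes.append(note)
    decision.outcome = "pending_human_review"
    decision.automated = False
    return decision


auto = decide("user-1", score=0.3, opted_out=False)  # denied by the model
manual = decide("user-2", score=0.9, opted_out=True)  # never scored automatically
appealed = appeal(auto, "Income data on file is out of date.")
```

Even a sketch this small shows why these features need early planning: both paths require a staffed human review queue behind them.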

Pair and double programming

Machine learning algorithms can be complex and unpredictable, making it hard to ensure they work correctly. Some top ML organizations double-check their work using two main methods:

Pair Programming
— Two experts code the same algorithm separately.
— They then collaborate to sort out any differences and ensure both versions work the same way.

Interestingly, Large Language Models (LLMs) are now being incorporated into pair programming. A recent course titled Pair Programming with a Large Language Model delves into the nuances of collaborating with LLMs in real-time coding scenarios.

Double Programming
— One person writes the same algorithm twice, each time using a different programming language.
— They then compare and reconcile any differences between the two versions.

Both methods help in finding and fixing bugs early, making sure the algorithm is solid before being used in real-world applications.
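The reconcile step in both methods can be partly automated by running the two versions on the same randomized inputs and checking that they agree. The sketch below is an assumption-laden illustration: both versions are written in Python for brevity (double programming as described above would use two different languages), and the function names and the choice of mean absolute percentage error as the algorithm are hypothetical.

```python
import math
import random


def mape_loop(actual: list[float], predicted: list[float]) -> float:
    """First version: explicit accumulation loop."""
    total = 0.0
    for a, p in zip(actual, predicted):
        total += abs(a - p) / abs(a)
    return total / len(actual)


def mape_functional(actual: list[float], predicted: list[float]) -> float:
    """Second version, written independently with a generator and math.fsum."""
    return math.fsum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def implementations_agree(n_trials: int = 100, seed: int = 0, rel_tol: float = 1e-9) -> bool:
    """Compare both versions on random inputs; any disagreement needs reconciling."""
    rng = random.Random(seed)
    for _ in range(n_trials):
        size = rng.randint(1, 20)
        actual = [rng.uniform(1.0, 100.0) for _ in range(size)]
        predicted = [rng.uniform(1.0, 100.0) for _ in range(size)]
        if not math.isclose(
            mape_loop(actual, predicted),
            mape_functional(actual, predicted),
            rel_tol=rel_tol,
        ):
            return False
    return True


agree = implementations_agree()
```

Agreement on random inputs does not prove either version correct, since both authors can share the same misunderstanding, but a disagreement reliably points to a bug in at least one of them.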

Security permissions for model deployment

IT security has a concept called the principle of least privilege, which holds that no system user should have more permissions than they need. Despite its significance, it is often neglected in ML systems, which can cause safety and performance issues. It is also recognized practice for roles outside the development team, such as product managers or executives, to make the final call on a software release, to avoid bias and ensure a thorough assessment.

The principle of least privilege demonstrated by privilege rings for the Intel x86. Image by Hertzsprung at English Wikipedia, CC BY-SA 3.0

During development sprints, it is essential for data scientists and engineers to have full control over their environments. However, as significant releases or reviews approach, the IT permissions to make changes to user-facing products should shift to other roles within the organization. This transition of control serves as a checkpoint, ensuring that unapproved or faulty code does not get deployed, thereby enhancing the security and reliability of the system.
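The handover of release permissions can be expressed as a small role-based check in a deployment script. This is a sketch under assumptions: the role and stage names, the `ALLOWED_ROLES` table, and the `deploy` function are hypothetical, and in practice such rules usually live in the access-control settings of the CI/CD or cloud platform rather than in application code.

```python
from enum import Enum


class Role(Enum):
    DATA_SCIENTIST = "data_scientist"
    RELEASE_MANAGER = "release_manager"


class Stage(Enum):
    DEVELOPMENT = "development"
    PRODUCTION = "production"


# Least privilege: each stage lists only the roles that need to deploy there.
ALLOWED_ROLES = {
    Stage.DEVELOPMENT: {Role.DATA_SCIENTIST, Role.RELEASE_MANAGER},
    Stage.PRODUCTION: {Role.RELEASE_MANAGER},
}


def can_deploy(role: Role, stage: Stage) -> bool:
    return role in ALLOWED_ROLES[stage]


def deploy(model_name: str, role: Role, stage: Stage) -> str:
    """Refuse the deployment unless the caller's role is permitted for the stage."""
    if not can_deploy(role, stage):
        raise PermissionError(f"{role.value} may not deploy to {stage.value}")
    return f"{model_name} deployed to {stage.value}"


dev_result = deploy("pricing-model", Role.DATA_SCIENTIST, Stage.DEVELOPMENT)
```

Here a data scientist can iterate freely in development, while a production release raises `PermissionError` unless someone in the release role performs it.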

Bug Bounties

Bug bounties are rewards given by organizations to people who find problems in their software, including machine learning systems. They are not just for finding security issues but also for problems related to things like safety, privacy, and reliability.

By offering money or other rewards, organizations encourage people to give feedback and find issues in their ML systems, making them more reliable and secure. If organizations are worried about making their bug bounties public, they can hold internal events where different teams search for problems in their ML systems. The key is to provide good incentives to get the best results.

Through bug bounties, we use monetary rewards to incentivize community feedback in a standardized process.

Many companies have launched Bug Bounty programs to detect and rectify vulnerabilities in their systems. Here are a few examples:

1. In 2021, Twitter (now known as X) announced its inaugural algorithmic bias bounty challenge to explore potential biases in its image-cropping algorithm. The algorithm used an XAI technique known as a saliency map to determine which part of a user-uploaded image was most engaging.

Image-cropping algorithm used by Twitter (now known as X) for displaying images in the feed. The algorithm was later dropped. Image by author.

2. On April 11, 2023, OpenAI announced a bug bounty program, inviting the security research community to participate. Rewards for findings range from $200 for low-severity issues to up to $20,000 for exceptional discoveries.

3. Meta has a history of running bug bounty programs for its platforms. When it introduced LLaMA 2, its open-source large language model, in July 2023, it also released a Responsible Use Guide that included options for reporting bugs and security issues.

Conclusion

This article emphasizes the importance of governance, incident response, and expert validation in responsible AI development. As we explore strategies that go beyond the usual Model Risk Management, including AI incident response plans and borrowing practices from financial audits, data privacy, software development, and IT security, it becomes clear that a multifaceted approach is crucial to address the ever-changing challenges of AI in a responsible and secure way. The lessons learned from Zillow’s experience remind us of the need for strong risk management in AI, leading us toward the creation of more reliable and ethical AI systems in the future.

This article was originally published in Towards Data Science.

Bridging Domains: Infusing Financial, Privacy, and Software Best Practices into ML Risk Management