The rise of AI incidents: A framework for AI Governance


According to an MIT report, as many as 95% of corporate GenAI projects fail to produce the expected results. For an AI Governance lead, this is a warning of operational chaos, rising costs, and a loss of control. The compliance department, meanwhile, faces an audit nightmare, struggling with the fundamental question of how to prove it is in command of the technology. This article combines these two perspectives to show the source of the problems and outline a path from firefighting to collaborative risk management.
Key takeaways:
- A deep gap exists between expectations and reality, as the vast majority of business AI projects fail to deliver a return on investment.
- Failures in AI systems, from bias to privacy breaches, are an increasingly common and unavoidable trend that leads to significant financial and legal losses.
- Reacting to incidents after they happen is an inefficient and dangerous strategy, making proactive risk management essential at every stage.
- To effectively manage risk, AI Governance policies must be deeply integrated with existing software development lifecycle (SDLC) practices.
- Automated platforms for oversight offer measurable savings, reducing testing and audit costs by 57% and compliance reporting costs by up to 81%.
The problem of AI incidents: scale and costs
As artificial intelligence finds its way into a growing number of critical systems, such as transport, healthcare, finance, and energy, the potential impact of its failures grows at an alarming rate. Errors that were once theoretical now lead to real-world damage. An analysis of the available data shows these are not isolated cases, but part of a problem appearing with increasing frequency.
The scale of AI failures
The scale of the problem has become so large that, following the example of the aviation and cybersecurity sectors, systematic efforts have begun to document these events. Publicly available databases, such as the AI Incident Database and the AIAAIC Repository, have collectively recorded thousands of incidents. Others, like the MIT AI Risk Database, have catalogued over 1,600 unique risks. The data clearly shows an accelerating trend, proving that this is a growing, not a marginal, phenomenon.
Costs and consequences of AI incidents
Despite enormous enthusiasm and capital investment, the reality of corporate AI implementation is far from optimistic. The MIT report “The GenAI Divide” paints a bleak picture, finding that 95% of business projects based on generative AI do not produce tangible results. This failure rate stands in stark contrast to the scale of investment. In the first half of 2025 alone, investors poured over $44 billion into AI startups and tools, and Goldman Sachs forecasts that this figure could reach nearly $200 billion by the end of the year.
An additional hidden cost is the so-called “verification tax.” The problem is that AI models often generate incorrect answers but present them with great confidence. As a result, employees must spend time carefully checking every piece of information obtained from the AI, which negates the promised time savings.
Four main risk vectors in AI
Analysis of thousands of documented incidents reveals four fundamental types of risk that companies must address: a lack of transparency, algorithmic bias, a crisis of faithfulness, and threats to data privacy.
| Problem | Definition | Key statistic | Impact |
| --- | --- | --- | --- |
| Lack of Transparency | The inability to understand how an AI model makes decisions (the “black box” problem). | 75% of companies believe a lack of transparency will increase customer churn (Zendesk). | Regulatory fines (e.g., under the EU AI Act), lawsuits, erosion of trust. |
| Algorithmic Bias | Systematic errors leading to unfair or discriminatory outcomes. | The COMPAS algorithm was twice as likely to incorrectly flag black defendants as repeat offenders. | Legal liability, reputational damage, operational failures. |
| Crisis of Faithfulness | Generating information that is factually incorrect or entirely fabricated (hallucinations). | AI hallucinations caused an estimated $67.4 billion in global damages in 2024 alone. | Direct financial losses, disinformation, brand damage, the “verification tax”. |
| Data Privacy | Unauthorised use or disclosure of sensitive personal information. | The global average cost of a data breach reached $4.88 million in 2024 (IBM). | Regulatory penalties (GDPR), loss of trust, financial losses (Equifax: $1.38 billion). |
A lack of transparency
As many as 75% of companies fear that a lack of transparency in AI will lead to customer churn, as shown in a Zendesk CX Trends report. This “black box” problem occurs when a model’s internal mechanisms are too complex for a human to understand, posing a direct business and regulatory threat. It is worth distinguishing between three concepts: transparency (openness about how a model is built), explainability (the ability to justify a specific result), and interpretability (a general understanding of how a model makes decisions).
The concept of Explainable AI (XAI) has emerged in this context. Achieving full and reliable explainability is, however, practically unfeasible with current technology, which is why even regulations like the EU AI Act do not explicitly require it in any risk category. Even so, in sectors like finance and healthcare, companies must justify every significant decision, such as granting a loan. An unexplainable algorithm that rejects an application without reason can violate consumer rights and expose a company to heavy fines.
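To make the distinction concrete, the sketch below shows one widely used post-hoc explainability technique, permutation feature importance, applied to a hypothetical scikit-learn loan-approval classifier. The data, feature names, and model choice are illustrative assumptions, not a prescribed implementation or a requirement of any regulation.

```python
# Minimal, illustrative sketch: post-hoc explainability for a hypothetical
# loan-approval classifier using permutation feature importance.
# All data, feature names, and the model choice are assumptions for this example.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Synthetic stand-in for applicant data: income, debt ratio, credit history length.
X = rng.normal(size=(1_000, 3))
y = (X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.5, size=1_000) > 0).astype(int)
feature_names = ["income", "debt_ratio", "credit_history_years"]

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# Permutation importance: how much does shuffling each feature degrade accuracy?
# Large drops indicate features the model actually relies on, which helps
# justify individual decisions to regulators and customers.
result = permutation_importance(model, X_test, y_test, n_repeats=20, random_state=0)
for name, mean_drop in zip(feature_names, result.importances_mean):
    print(f"{name}: accuracy drop when permuted = {mean_drop:.3f}")
```

A technique like this does not open the black box completely, but it does produce a documented, repeatable justification that can be attached to an audit trail.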
Algorithmic bias
The COMPAS algorithm used in American courts was twice as likely to incorrectly label black defendants as future repeat offenders as it was to mislabel white defendants. This is a stark example of algorithmic bias, a phenomenon in which AI systems systematically discriminate against certain groups, often unintentionally, by reproducing historical prejudices.
Bias can also be more mundane, such as an e-commerce model that suggests certain products based only on a customer’s gender or race, thereby reinforcing stereotypes. Similar errors have occurred on a massive scale, as with Amazon’s recruitment tool that learned to discriminate against women. Public mistrust is so high that, according to the American Staffing Association, 49% of job seekers believe AI tools are more biased than people.
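As an illustration of how such bias can be surfaced before deployment, the sketch below computes per-group selection rates and a disparate impact ratio for a hypothetical hiring model’s decisions. The data, group labels, and the 0.8 threshold are assumptions for this example, not a definition imposed by any specific regulation.

```python
# Minimal sketch: measuring disparate impact in a hypothetical hiring model's
# decisions. The data and the 0.8 ("four-fifths") threshold are illustrative.
import pandas as pd

# predicted_hire is the model's binary decision; group is a protected attribute.
df = pd.DataFrame({
    "group":          ["A", "A", "A", "A", "B", "B", "B", "B"],
    "predicted_hire": [1,    1,    0,    1,    0,    1,    0,    0],
})

# Selection rate per group: fraction of positive decisions.
selection_rates = df.groupby("group")["predicted_hire"].mean()
print(selection_rates)

# Disparate impact ratio: lowest selection rate divided by the highest.
# Values well below ~0.8 are a common warning sign that warrants investigation.
di_ratio = selection_rates.min() / selection_rates.max()
print(f"Disparate impact ratio: {di_ratio:.2f}")
```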
A crisis of faithfulness (hallucinations)
Analysts estimate that global losses from AI hallucinations reached $67.4 billion in 2024 alone. This crisis of faithfulness occurs when an AI system generates information that sounds plausible but is false or completely fabricated. The consequences can be severe, as shown by cases like Stefanina’s Pizzeria, which had to deal with a wave of customers demanding non-existent promotions.
Data privacy
Since the public release of ChatGPT, the number of phishing emails has increased by over 4000%. This shows how AI can amplify threats to data privacy. Consumer concerns are widespread: 81% are convinced that AI companies will use their data in ways they are not comfortable with. The financial consequences of breaches are huge. According to IBM data, the average cost of an incident in 2024 was $4.88 million—a figure that can be devastating for any budget. In extreme cases, like the data breach at Equifax, the total cost can reach $1.38 billion.
Proactive AI lifecycle management
Problems with transparency and bias show that AI systems are not inherently reliable. Building their reliability requires careful engineering, with rigorous checks throughout the lifecycle. Teams must shift from a reactive to a proactive model in which they systematically reduce risk at every stage. Such a model rests on three pillars: rigorous testing, continuous monitoring, and strategic fine-tuning.
Rigorous testing and validation
Testing AI systems is far more complex than testing traditional software. In the pre-production phase, when preparing a model for the real world, teams must drastically expand the scope of testing to include integration, performance, and security tests that verify resilience against AI-specific threats, such as adversarial attacks. The central goal of validation is to check whether the model can generalise: it must perform correctly on data it has not seen during training, which guards against the phenomenon of overfitting.
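A simple way to operationalise that check is to compare performance on training data with performance on held-out data; a large gap is a classic overfitting signal. The sketch below assumes a scikit-learn classifier, synthetic data, and an arbitrary gap threshold purely for illustration.

```python
# Minimal sketch: checking whether a model generalises by comparing training
# accuracy with held-out accuracy. Data, model, and the 0.10 gap threshold
# are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2_000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

train_acc = accuracy_score(y_train, model.predict(X_train))
test_acc = accuracy_score(y_test, model.predict(X_test))
gap = train_acc - test_acc

print(f"train accuracy={train_acc:.3f}  test accuracy={test_acc:.3f}  gap={gap:.3f}")
if gap > 0.10:  # threshold chosen for illustration only
    print("Warning: large train/test gap -- possible overfitting, do not promote.")
```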
Continuous real-time monitoring
Deploying a model is not the end but the beginning of a decisive stage in its life. The performance of AI models naturally degrades over time in a process known as “model drift,” which occurs when real-world data begins to differ from the data on which the model was trained.
Studies show that models stop performing their intended function within an average of 18 months of deployment when teams do not actively monitor them. An essential continuous monitoring strategy therefore involves tracking business metrics (KPIs), analysing input and output data, and using statistical tests for automated drift detection.
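One common way to automate that drift detection is a two-sample statistical test comparing the training distribution of each feature with a recent window of production data. The sketch below uses the Kolmogorov–Smirnov test from SciPy; the feature names, window sizes, and significance level are assumptions for illustration.

```python
# Minimal sketch: automated input-drift detection with a per-feature
# Kolmogorov-Smirnov test. Feature names, data, and threshold are illustrative.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(7)

# Reference data captured at training time vs. a recent production window.
training_data = {
    "transaction_amount": rng.normal(loc=100, scale=20, size=5_000),
    "customer_age":       rng.normal(loc=40, scale=10, size=5_000),
}
production_window = {
    "transaction_amount": rng.normal(loc=130, scale=25, size=1_000),  # drifted
    "customer_age":       rng.normal(loc=40, scale=10, size=1_000),   # stable
}

ALPHA = 0.01  # significance level, chosen for illustration

for feature, reference in training_data.items():
    result = ks_2samp(reference, production_window[feature])
    drifted = result.pvalue < ALPHA
    print(f"{feature}: KS statistic={result.statistic:.3f}, "
          f"p={result.pvalue:.4f}, drift={drifted}")
```

In practice, a drift alert like this would feed the business-metric tracking described above rather than replace it.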
Strategic model fine-tuning
When monitoring reveals a decline in performance, teams must take corrective action. One of the most effective techniques is supervised fine-tuning, which involves further training a model on a smaller, specialised dataset to eliminate unwanted behaviours. Case studies show that this technique can effectively eliminate bias, for example, in loan approval systems.
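To make the idea concrete, the sketch below shows the general shape of supervised fine-tuning in PyTorch: a pretrained network is loaded, most of its layers are frozen, and only the final layer is retrained on a small curated dataset. The architecture, checkpoint path, and data are hypothetical placeholders, not a reference to any specific production system.

```python
# Minimal sketch of supervised fine-tuning in PyTorch: freeze the pretrained
# backbone and retrain only the classification head on a small curated dataset.
# The architecture, checkpoint path, and dataset below are hypothetical.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical pretrained classifier: a backbone plus a small head.
model = nn.Sequential(
    nn.Linear(128, 64), nn.ReLU(),   # "backbone" layers (stand-in for a real one)
    nn.Linear(64, 2),                # classification head to be fine-tuned
)
# model.load_state_dict(torch.load("pretrained_model.pt"))  # hypothetical checkpoint

# Freeze everything except the final layer.
for param in model.parameters():
    param.requires_grad = False
for param in model[-1].parameters():
    param.requires_grad = True

# Small, curated dataset targeting the unwanted behaviour (synthetic here).
features = torch.randn(256, 128)
labels = torch.randint(0, 2, (256,))
loader = DataLoader(TensorDataset(features, labels), batch_size=32, shuffle=True)

optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
loss_fn = nn.CrossEntropyLoss()

model.train()
for epoch in range(3):
    for batch_features, batch_labels in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(batch_features), batch_labels)
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch}: last batch loss = {loss.item():.4f}")
```

Freezing the backbone keeps the cost of the corrective update low and reduces the risk of degrading behaviour that was already acceptable.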
Auditor: a platform for AI Governance and compliance
Auditor is a platform that addresses the challenges of document analysis, document management, and the simplification of manual audit processes for AI systems. It brings benefits to both AI Governance and compliance teams.
Simplifying audits and document management
For compliance teams focused on auditing and document compliance, Auditor offers a set of tools that automate the most tedious tasks. A specialised Doc. Auditor module allows for contextual cross-analysis of documents, and a key gap analysis feature detects omissions and inconsistencies in real time. The system automatically generates compliance reports, creating a consistent and reliable audit trail that significantly speeds up the process of gathering evidence for auditors.
Through integration with SharePoint and a built-in Policy Manager that monitors changes in regulations such as DORA, GDPR, the AI Act, and NIS2, Auditor centralises the management of multiple policies.
Automating the AI lifecycle and monitoring
For those responsible for AI Governance, who care about the quality and maintenance of systems, Auditor automates the entire lifecycle. The platform allows for rapid testing of prototypes and fully automates the testing and auditing of production systems in real time.
The AI model, in a sense, tests itself, and this process generates evidence of compliance. Oversight teams receive real-time data to assess risk, impact, bias, and confidentiality. This automation, which keeps a human in the loop for oversight, can reduce costs related to testing, auditing, and supervision by 57%, and costs of generating reports and analyses by as much as 81%.
Control over key risks
The platform is designed to control the four fundamental types of risk. It confronts the transparency problem by operating on measurable metrics, not as a ‘black box’ itself. To counter bias, the system actively helps with its detection and provides data for fairness analysis. It also manages the risk of hallucinations by evaluating response quality and supporting the fine-tuning process. Finally, it protects data privacy through on-premise integration, ensuring no data leaves the client’s environment.
Conclusion
The high 95% failure rate of corporate GenAI projects is strong evidence that AI system failures are a costly and largely unavoidable consequence of missing governance frameworks.
So where does the disconnect lie?
This analysis shows that the only effective solution is a change in approach: instead of reacting to problems, teams must proactively manage them by incorporating AI Governance into every stage of the software development lifecycle (SDLC). Implementing these principles manually at scale is not feasible, and reports from Artificial Intelligence News show that only 5% of companies have mature AI management systems.
Success, therefore, depends on investing in automated platforms like Auditor that provide the necessary control and monitoring. Ultimately, organisations face a choice: either continue with costly and risky experiments, or adopt a thoughtful, engineering-led approach. It is the only one that allows for safe development in the age of AI.
Frequently Asked Questions (FAQ)
Why do so many corporate artificial intelligence projects fail?
According to a prominent MIT report, an estimated 95% of corporate projects using generative AI do not produce the expected results. This high failure rate for AI systems is often a direct consequence of lacking effective governance frameworks to manage the new risks involved. The result is a significant gap between expectations for artificial intelligence and the reality of its implementation, leading to costly failed AI projects.
What do reported incidents from an AI incident database reveal about the main risks?
Analysis of thousands of reported incidents from sources like the AI Incident Database reveals four fundamental types of risk. Companies must address these risks to prevent future harms from their AI systems. The four risks are:
- A lack of transparency in how the AI makes decisions.
- Algorithmic bias leading to unfair or discriminatory harms.
- A crisis of faithfulness, where the AI generates incorrect data, causing damaging incidents.
- Threats to data privacy from the misuse of the tool.
What is algorithmic bias in AI systems?
Algorithmic bias is a serious risk where an AI system systematically discriminates against certain groups. These incidents often happen unintentionally as the AI reproduces historical prejudices found in its training data. A stark example is the COMPAS algorithm, which caused discriminatory harms by being twice as likely to incorrectly flag black defendants as repeat offenders in its reports.
What are AI hallucinations and what are their harms?
AI hallucinations are incidents in which an AI system generates information that sounds plausible but is false or entirely fabricated. The harms from these errors can be severe: analysts estimate that global financial losses from AI hallucinations reached $67.4 billion in 2024 alone, making this a critical problem for any business deploying AI.
How does a lack of AI transparency create new risks?
A lack of AI transparency, known as the “black box” problem, is a major business and regulatory risk. It occurs when an AI system’s decision-making process is too complex for a human to understand. This erodes trust and can result in regulatory fines, representing one of the most significant risks for any company deploying AI.
How can companies proactively manage AI risks and incidents?
To proactively manage AI risks, teams must shift from reacting to incidents to a model where they systematically reduce risk at every stage of the AI development lifecycle. This process relies on three pillars: rigorous testing to find errors, continuous tracking of performance, and strategic fine-tuning to prevent future AI incidents.
Why is continuous monitoring of AI models crucial for safety?
Continuous monitoring is crucial for safety because the performance of AI models naturally degrades over time in a process known as “model drift.” This drift in AI performance can lead to serious incidents if not actively monitored. Maintaining oversight with this practice is the only way to ensure the tool functions as intended.
What is the Auditor platform for AI Governance?
Auditor is a platform that addresses the challenges of AI Governance and compliance. It automates tedious tasks, such as generating compliance reports and maintaining a reliable audit trail. This allows teams to assess AI risks, impact, and bias in real time, providing the support needed to manage complex AI systems.
How do AI incidents, beyond business losses, affect society, and what is the role of policymakers?
AI incidents can inflict significant harms on society, extending far beyond financial costs. When AI systems fail, they can lead to severe real-world consequences, such as wrongful arrests based on flawed facial recognition data, perpetuating systemic discrimination. In the most extreme cases, documented incidents have even led to deaths. These events erode public trust and affect every community. In response, policymakers are playing a crucial role in shaping the future of AI development worldwide: they are tasked with creating regulations that protect citizens, often based on reports and evidence from researchers.
From a technical standpoint, what does a responsible AI development process look like?
A responsible AI project begins with researchers carefully selecting a high-quality dataset for training, which could include anything from text to video files. Throughout the development of the software, detailed metadata is tracked to ensure transparency. This process often involves multiple partners, including ethics and security teams. Before deployment, there is typically a code freeze, a critical point where no new changes are made, to ensure stability. This structured approach is fundamental to safety, preventing the misuse of the technology and ensuring the final tool functions as intended, providing real support to its human users.
This blog post was created by our team of experts specialising in AI Governance, Web Development, Mobile Development, Technical Consultancy, and Digital Product Design. Our goal is to provide educational value and insights without marketing intent.