My last article introduced what I called the maturity illusion. Most organizations are scaling AI activity without a calibrated view of where they stand, and the costs of that sequencing error accumulate across four dimensions before they become visible. That argument was about maturity. This one is about the layer underneath it. The delivery environment determines whether what your organization has built is fit for the business to rely on.
Adoption failure has a standard diagnosis in most organizations. The business isn't ready. Culture is resistant. Leaders haven't championed it. Change management is missing. Those challenges dominate our advisory work, and the next article in this series will address them directly. But they are not the whole story, and in some cases they are not even the primary story. A substantial portion of adoption failure has supply-side roots that analytics leaders may be misreading as demand-side resistance.
The business doesn't adopt what it doesn't trust. And trust breaks down before the business ever evaluates a model. It breaks down in the infrastructure that was supposed to deliver the model reliably, in the data quality that determines whether the model's outputs hold under scrutiny, and in the documentation that tells a business stakeholder or an auditor what the model is based on, how it was validated, and what its limits are. Three delivery problems drive most of that breakdown. None of them are new to analytics leaders. Most organizations have not measured them honestly.
The Infrastructure Gap Stops Models in Their Tracks
We’ve all seen the statistic: more than 80 percent of machine learning models never reach production. This figure points to a specific and common failure mode. The operational infrastructure required to move a model from development into a production environment where it can be used simply does not exist. The algorithms work. The production environment doesn't.
The downstream effects of that gap are predictable. Data scientists spend 40 to 60 percent of their time on infrastructure tasks rather than on model development or business problem solving. Manual handoffs, bespoke integration, environment configuration, and ad hoc monitoring consume the time that should go toward building. Deployment cycles that should take hours stretch across months. The models that do reach production frequently run without drift detection, without automated retraining, and without the monitoring infrastructure to catch degradation before it reaches the business.
The business experiences this as unreliability. A model works for a quarter and then starts producing outputs that feel wrong. The team that built it investigates, manually. The investigation takes weeks. The business, which never had full visibility into how the model was built or maintained, loses confidence and routes around it. Analytics leadership reads this as adoption failure and starts planning training programs. The root problem is a deployment pipeline that cannot sustain what it produced.
A production-grade MLOps pipeline addresses this through six stages. It begins with data ingestion using automated schema validation and quality scoring, moves through feature engineering with standardized transformations stored in a central feature store, and continues into model training on distributed infrastructure with automated experiment tracking. Model validation gates advancement on performance testing and bias detection results. Deployment runs through canary rollouts with rollback capability. Production monitoring issues automated alerts when performance degrades by more than five percent from baseline. Most enterprise AI environments have components of this. Few have the full pipeline automated and governed. The assessment question is whether your pipelines can sustain production at the reliability and speed the business expects.
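The five percent alert rule in the monitoring stage can be sketched in a few lines. This is a minimal illustration, not a description of any particular monitoring product. It assumes a higher-is-better metric such as accuracy, and it reads "five percent from baseline" as a relative drop. Both readings are assumptions, and the function names are hypothetical.

```python
def relative_degradation(baseline: float, current: float) -> float:
    """Return the drop from baseline as a fraction of baseline.

    Assumes a higher-is-better metric (for example accuracy or AUC).
    A negative result means the metric improved.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - current) / baseline


def should_alert(baseline: float, current: float, threshold: float = 0.05) -> bool:
    """True when the metric has degraded by more than `threshold` (default 5%)."""
    return relative_degradation(baseline, current) > threshold


if __name__ == "__main__":
    # Hypothetical readings against a baseline accuracy of 0.90.
    print(should_alert(0.90, 0.88))  # relative drop of roughly 2 percent
    print(should_alert(0.90, 0.84))  # relative drop of roughly 7 percent
```

If the intended rule is an absolute drop of five percentage points, the comparison would be `baseline - current > 0.05` instead. Which reading applies is worth settling explicitly before the alert is wired into a pipeline.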
The target most analytics leaders should be measuring against is deployment cycles below five hours, uptime above 99.9 percent, and automation covering more than 85 percent of the ML lifecycle. Very few non-digital-native enterprises are at those thresholds today. The honest question is how far the current environment is from them, and whether the investment case for closing that gap has been made to executive leadership in terms they can act on.
Data Quality Is a Threshold, Not a Spectrum
The second delivery problem runs deeper than infrastructure. Before a model can fail to reach production, the data it would be built on must clear a quality threshold that most organizations have not formally defined or measured. IIA recommends that data and AI leaders apply a four-tier data readiness taxonomy to their environment.
“Production ready” data scores between 85 and 100 and is approved for production AI and machine learning with standard monitoring. “Conditional data,” scoring 70 to 84, is approved with enhanced monitoring and documented limitations: the business can use it, but only with explicit acknowledgment of what it can and cannot support. “Development only data,” scoring 50 to 69, is suitable for prototyping and testing but should not reach production. Data that scores below 50 requires remediation before any AI use. The minimum threshold for any production use is a score of 70.
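The four tiers translate directly into a lookup. The sketch below is illustrative only: the function name is hypothetical, and because the tiers are stated as whole-number bands, treating a fractional score such as 84.5 as belonging to the lower tier is an assumption.

```python
def readiness_tier(score: float) -> str:
    """Map a 0-100 data readiness score to one of the four tiers.

    Band edges follow the taxonomy above; fractional scores fall into
    the tier whose lower bound they have reached (an assumption).
    """
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 85:
        return "production ready"
    if score >= 70:
        return "conditional"
    if score >= 50:
        return "development only"
    return "remediation required"


if __name__ == "__main__":
    for s in (92, 78, 61, 35):
        print(s, readiness_tier(s))
```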
The scoring framework evaluates data across seven dimensions: completeness, accuracy, consistency, timeliness, validity, uniqueness, and bias. Each dimension creates specific failure modes in AI systems. Missing data causes models to learn incorrect patterns and develop biased predictions toward available data segments. Timeliness failures mean models are operating on stale inputs in environments where decisions are time-sensitive. Bias in training data produces discriminatory outputs that the model then systematizes and scales. And these are not edge cases. They are the predictable results of deploying AI on data that was never assessed for production readiness.
What makes this a trust problem rather than merely a technical one is what happens when business stakeholders encounter the consequences. They don’t see the data quality score. They see a customer recommendation that makes no sense, a risk score that conflicts with what the underwriter knows from experience, a maintenance prediction that misses a failure the frontline operator could have told you was coming. They form a judgment about the model, that it can’t be trusted, and they form it quickly. That judgment is sticky. It does not reverse easily when the model is later improved or the data quality issue is addressed.
The practical step for analytics leaders is to apply the four-tier taxonomy to the data assets powering your current production models and your next priority use cases. “Good” data in the abstract is worthless to the business. What you’re looking for is whether that data scores at or above the 70-point production threshold on each of the seven dimensions, who has formally reviewed and documented that assessment, and what the remediation roadmap is for any asset that falls short. In most organizations running honest assessments, a significant share of data assets supporting active AI models have never been formally scored. The production risk is present. It is simply undocumented.
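A per-dimension check along these lines could look like the sketch below. It is a hedged illustration rather than IIA's scoring method: the article does not say how individual dimensions are scored or combined, so the code only flags dimensions that fall short and does not compute an overall score. The function name and sample numbers are hypothetical, and a score of exactly 70 is assumed to pass.

```python
# The seven dimensions named in the scoring framework above.
DIMENSIONS = (
    "completeness", "accuracy", "consistency", "timeliness",
    "validity", "uniqueness", "bias",
)
PRODUCTION_THRESHOLD = 70  # assumed inclusive: a score of 70 passes


def dimensions_below_threshold(
    scores: dict[str, float], threshold: float = PRODUCTION_THRESHOLD
) -> list[str]:
    """Return the dimensions scoring below the threshold, in framework order.

    Raises ValueError if any dimension has not been scored, since an
    unscored dimension is itself an undocumented risk.
    """
    unscored = [d for d in DIMENSIONS if d not in scores]
    if unscored:
        raise ValueError(f"unscored dimensions: {', '.join(unscored)}")
    return [d for d in DIMENSIONS if scores[d] < threshold]


if __name__ == "__main__":
    # Hypothetical scores for a single data asset.
    asset_scores = {
        "completeness": 91, "accuracy": 88, "consistency": 74,
        "timeliness": 62, "validity": 85, "uniqueness": 97, "bias": 68,
    }
    print(dimensions_below_threshold(asset_scores))
```

Returning the failing dimensions by name, rather than a single pass or fail flag, gives the remediation roadmap a concrete starting point.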
Documentation Gaps Are a Business Confidence Problem
The third delivery problem is documentation. Documentation gaps of 20 to 40 percent are typical across enterprise AI programs at the point of first structured review. Those gaps are not primarily a compliance liability, though they create one. They are a business confidence problem.
A business stakeholder who is being asked to make decisions based on an AI output needs to be able to answer basic questions about what they are relying on. Where does the data come from? When was the model last validated? What are its known limitations? What happens when it is wrong? Has someone with authority reviewed it and approved it for this use case? If those questions cannot be answered with documented evidence, the business will not adopt the model in any setting where the answers are consequential. In regulated industries such as banking, insurance, and healthcare, that standard is explicit. In complex operational environments, it is informal but equally real. A plant manager will not change how maintenance decisions are made based on a model no one can explain.
In customer intelligence environments, the same dynamic plays out differently but produces the same result. Personalization models, recommendation engines, and customer segmentation outputs require business unit adoption to generate value. A merchandising director or marketing leader who cannot get a straight answer about how a model was built, what data it trained on, or when it was last validated will route around it and rely on intuition. That pattern is consistent across retail, telecom, travel, and CPG. The consequence is that analytics capability concentrates in specialist teams, business units never change their decision behavior, and investment in personalization and customer intelligence shows limited impact on the decisions actually made.
Three mandatory approval gates govern documentation in a well-designed AI governance structure. Data acquisition requires review and formal sign-off from legal, compliance, and security, covering a privacy impact assessment, a data quality report card scoring 70 or higher, and a bias risk assessment. Model validation requires technical review and ethics board sign-off, including validation reports covering accuracy, robustness, and fairness. Production deployment requires business review and change advisory board authorization, including rollback plans and security scans. Each gate produces documentation. That documentation is what allows the business to trust what it is using, and what allows an auditor, a regulator, or a senior executive to verify the claim.
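One way to make the three gates enforceable rather than informal is to encode each gate's required evidence and refuse advancement while anything is missing. The sketch below assumes evidence is tracked as a simple set of labels. The labels paraphrase the paragraph above, the function names are hypothetical, and a real implementation would also verify content, for example that the report card score clears the threshold.

```python
from typing import Optional

# Gates in the order a model passes through them, each with the evidence
# it must produce. Labels paraphrase the gate descriptions above.
GATES: dict[str, tuple[str, ...]] = {
    "data acquisition": (
        "legal, compliance, and security sign-off",
        "privacy impact assessment",
        "data quality report card",
        "bias risk assessment",
    ),
    "model validation": (
        "technical review",
        "ethics board sign-off",
        "validation report",
    ),
    "production deployment": (
        "business review",
        "change advisory board authorization",
        "rollback plan",
        "security scan",
    ),
}


def missing_evidence(gate: str, evidence: set[str]) -> list[str]:
    """List the items a gate requires that are not yet on file."""
    return [item for item in GATES[gate] if item not in evidence]


def first_blocked_gate(evidence: set[str]) -> Optional[str]:
    """Return the earliest gate with missing evidence, or None if all pass."""
    for gate in GATES:  # dicts preserve insertion order
        if missing_evidence(gate, evidence):
            return gate
    return None


if __name__ == "__main__":
    on_file = set(GATES["data acquisition"]) | {"technical review", "validation report"}
    gate = first_blocked_gate(on_file)
    print(gate, missing_evidence(gate, on_file))
```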
Most enterprise AI programs have informal versions of some of these gates. They do not have all three as formal, documented requirements with consistent enforcement. In most environments, some models have clean lineage, validation documentation, and clear approval records. Others were moved into production quickly under business pressure and lack any of it. The business cannot distinguish between the two from the outside. When a model without documentation fails or produces a questionable output, the credibility damage affects the program broadly, not just the specific model that caused the problem.
The audit exercise that most exposes documentation gaps is to pull the ten most consequential AI models currently in production and ask for the data quality report card, the validation report, and the formal approval record for each. What typically surfaces is that a fraction have complete documentation, a larger fraction have partial documentation that would not survive a structured review, and some have none at all. The business, at some level, knows the documentation is not there, and that knowledge shapes how much it is willing to rely on what it is being asked to use, amplifying the adoption barrier.
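The audit described above can be tallied with very little tooling. The following sketch assumes the inventory is a mapping from model name to the set of documents found for it. It simplifies "partial" to mean that at least one of the three documents is missing, whereas a real review would also judge whether each document would survive scrutiny. All model names are hypothetical.

```python
from collections import Counter

# The three documents the audit asks for, per model.
REQUIRED_DOCS = (
    "data quality report card",
    "validation report",
    "approval record",
)


def documentation_status(docs_on_file: set[str]) -> str:
    """Classify one model as 'complete', 'partial', or 'none'."""
    found = sum(1 for doc in REQUIRED_DOCS if doc in docs_on_file)
    if found == len(REQUIRED_DOCS):
        return "complete"
    return "partial" if found > 0 else "none"


def audit_summary(inventory: dict[str, set[str]]) -> Counter:
    """Count how many models fall into each documentation status."""
    return Counter(documentation_status(docs) for docs in inventory.values())


if __name__ == "__main__":
    # Hypothetical inventory of production models.
    inventory = {
        "churn_model": set(REQUIRED_DOCS),
        "pricing_model": {"validation report"},
        "maintenance_model": set(),
    }
    print(audit_summary(inventory))
```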
What the Supply-Side Audit Reveals
Taken together, these three delivery problems form a supply-side trust gap that most analytics leaders have measured only through self-assessment or through vendor diagnostics conducted by partners embedded in their existing ecosystem.
The same organizations now pursuing agentic AI at scale are relying on that supply-side environment to support it. Agentic systems take actions, chain decisions, and operate without direct human involvement in individual outputs. The infrastructure, data quality, and documentation requirements for that level of autonomy are higher than for traditional analytics delivery, and the consequences of supply-side gaps are proportionally larger. A governance documentation problem that creates friction in a dashboard environment creates a liability in an agentic one.
The supply-side audit is not a slow-down. It is a prerequisite for making AI adoption decisions on accurate information rather than optimistic assumptions. The organizations in IIA's assessment work that are seeing real adoption progress are not the ones that assumed their delivery environment was ready. They are the ones that measured it, found the gaps, and addressed them before asking the business to rely on what was built on top.
That is the diagnostic lens the next article in this series applies to the demand side. Once the delivery environment is genuinely trustworthy, the question shifts: is the business positioned to use what now sits on top of it? That question belongs to change management rather than technology, and most organizations are getting that order wrong.