A new report from MIT has sent shockwaves through the enterprise AI world. According to the State of AI in Business 2025 study, 95% of generative AI pilots deliver zero return on investment. The findings, based on 300 public deployments and more than 150 executive interviews, suggest that billions of dollars have been spent on AI experiments that never scale — and that most organizations are stuck on what MIT researchers call the “GenAI Divide.”
The numbers are stark. Forty percent of organizations say they’ve deployed AI tools, but only 5 percent have integrated them into workflows at scale. Most projects die in pilot purgatory. Meanwhile, headlines warn of an “AI bubble,” and investors are shorting AI stocks on the theory that generative AI’s big enterprise moment is already stalling out.
But not everyone agrees with that reading.
“Confidently wrong is the problem,” says Tanmai Gopal, co-founder and CEO of PromptQL, a unicorn AI company that counts OpenAI, Airbus, Siemens, and NASA as customers. “If the system is not always accurate even the tiniest percent of the time, I need to know when it’s not. Otherwise, my minutes turn into hours; the ROI disappears.”
The Verification Tax
In his blog post, “Being ‘Confidently Wrong’ Is Holding AI Back,” Gopal describes what he calls the “verification tax.”
“I don’t know when I might get an incorrect response from my AI. So I have to forensically check every response.”
This tax explains much of what MIT labeled as the GenAI Divide. Enterprises eagerly launch pilots, but employees end up spending so much time double-checking outputs that the promised efficiencies never materialize.
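A back-of-envelope calculation shows how the tax compounds. Every number below is an illustrative assumption, not data from the MIT study, but it captures the shape of the problem: when every output must be re-checked, most of the promised time savings evaporate.

```python
# Back-of-envelope illustration of the "verification tax" (every number is assumed).
minutes_saved_per_task = 10   # time the AI saves drafting each answer
minutes_verifying = 8         # time spent forensically checking that answer
tasks_per_day = 30

promised = minutes_saved_per_task * tasks_per_day
net = (minutes_saved_per_task - minutes_verifying) * tasks_per_day
print(f"Promised minutes saved per day: {promised}")  # 300
print(f"Net minutes saved per day: {net}")            # 60 -- the tax eats 80%
```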
It’s not that generative AI lacks raw horsepower; the models can be dazzling. It’s that their confidence is uncalibrated: a model sounds just as sure when it is guessing as when it is right. In regulated or high-stakes industries, one bad answer can outweigh ten good ones. As Gopal puts it: “For serious work, one high-confidence miss costs more credibility than ten successes earn.”
The Learning Gap
MIT’s researchers framed the same issue differently. They found that most enterprise AI tools don’t retain feedback, adapt to workflows, or improve over time. Without those qualities, they stall.
Gopal agrees. “Without high-quality uncertainty information, I don’t know whether a result is wrong because of ambiguity, missing context, stale data, or a model mistake. If I don’t know why it’s wrong, I’m not invested in making it successful.”
That insight matters because it reframes the entire conversation. If AI is failing not for lack of capability but because it hasn’t been designed to communicate its limits and learn from corrections, then the fix is less about building bigger models and more about building humbler ones.
How PromptQL Solves It
PromptQL has built its entire platform around solving this exact problem — what Gopal calls the difference between being “confidently wrong” and “tentatively right.”
Instead of presenting outputs as gospel, PromptQL calibrates confidence at the response level; a code sketch of what that can look like follows this list:
- Quantifies uncertainty. Every answer comes with a confidence score. If the system is unsure, it abstains, effectively saying “I don’t know.”
- Surfaces context gaps. Rather than hiding uncertainty, the system flags why an answer may be unreliable: missing data, ambiguity, or lack of context.
- Builds an accuracy flywheel. Each abstention or correction becomes training fuel. PromptQL captures those signals, letting the system improve continuously — closing the “learning gap” MIT identified as the number one cause of pilot failure.
- Integrates into workflows. Instead of sitting in a chatbox, PromptQL embeds directly into enterprise processes like contracts, engineering, or procurement, so uncertainty flags and corrections appear exactly where the work is happening.
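PromptQL has not published the internals behind these behaviors, so what follows is only a minimal sketch of what a response envelope with a confidence score, an abstain path, and context-gap flags could look like. The names here (CalibratedResponse, CONFIDENCE_FLOOR, respond) are illustrative assumptions, not PromptQL’s actual interface.

```python
from dataclasses import dataclass, field

CONFIDENCE_FLOOR = 0.85  # assumed threshold: below this, abstain instead of answering

@dataclass
class CalibratedResponse:
    """An answer envelope that carries its own uncertainty signals."""
    answer: str | None   # None means the system abstained ("I don't know")
    confidence: float    # calibrated score in [0, 1]
    context_gaps: list[str] = field(default_factory=list)  # e.g. "stale data", "ambiguous query"

    @property
    def abstained(self) -> bool:
        return self.answer is None

def respond(raw_answer: str, confidence: float, gaps: list[str]) -> CalibratedResponse:
    """Abstain rather than guess, and surface why an answer may be shaky."""
    if confidence < CONFIDENCE_FLOOR:
        return CalibratedResponse(answer=None, confidence=confidence, context_gaps=gaps)
    return CalibratedResponse(answer=raw_answer, confidence=confidence, context_gaps=gaps)
```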
“The starting point of this loop is if an AI system could tell the user when it’s not certain about its accuracy in a concrete and native way,” Gopal writes. That loop — abstain, get corrected, learn — is what he calls the accuracy flywheel. “We don’t need perfection; we need a loop that tightens.”
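To make that concrete, here is one hedged way to sketch a single turn of the flywheel, reusing the hypothetical CalibratedResponse envelope from the sketch above; flywheel_turn, system.respond, system.learn, and get_user_correction are all assumed names, not a real API.

```python
def flywheel_turn(system, query, get_user_correction):
    """One turn of the loop: abstain -> get corrected -> learn.

    `system` is assumed to expose respond(query) -> CalibratedResponse and
    learn(query, correction); both are hypothetical, not PromptQL's API.
    """
    response = system.respond(query)
    if response.abstained:
        # An honest "I don't know" invites a correction instead of forcing
        # the user to forensically verify a confident guess.
        correction = get_user_correction(query, response.context_gaps)
        if correction is not None:
            system.learn(query, correction)  # the signal that tightens the loop
    return response
```

The design point is that abstentions are not dead ends: each one generates exactly the correction signal that, per the MIT report, most enterprise tools never capture.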
Tentatively Right Beats Confidently Wrong
This humility-first approach has led to adoption in some of the most skeptical corners of the enterprise market. While 95% of pilots stall, PromptQL is closing seven- and eight-figure contracts with Fortune 500s, government agencies, and regulated industries — the exact places MIT says AI has struggled to gain traction.
The company is living proof that enterprise AI is not failing. The wrong kind of enterprise AI is.
As Gopal puts it: “No amount of solving any other problem — integration, data readiness, organizational readiness — will change the fact that AI’s tendency to be confidently wrong keeps it out of real-world use cases.”
A Different Conclusion
The takeaway, then, is not that AI is doomed to fail. It’s that enterprises must demand a different kind of AI: one that is transparent about its uncertainty, tightly integrated into workflows, and capable of improving with every interaction.
The MIT report is right to highlight the GenAI Divide. But if we only focus on the 95% that failed, we miss the 5% that are actually scaling — and why.
The companies that build and adopt AI that admits when it doesn’t know are quietly rewriting the story. PromptQL is one of them.
And if their traction holds, the conclusion isn’t that enterprise AI is a bubble. It’s that a small handful of companies have already figured out how to cross the GenAI Divide.