To run a business is to work constantly toward improved operations. As a business grows, this usually opens up the possibility of using predictive AI, the kind of analytics that improves existing, large-scale operations.
But the mystique of predictive AI routinely kills its value. Rather than focusing on the concrete win that its deployment could deliver, leaders get distracted by the core tech’s glamor. After all, learning from data to predict is sexy.
This in turn leads to skipping a critical step: forecasting the operational improvement that deploying predictive AI would deliver. As with any change to large-scale operations, you can’t move forward without a credible estimate of the business improvement you stand to gain – in straightforward terms like profit or other business KPIs. Skipping that estimate makes deployment a shot in the dark. Indeed, most predictive AI launches are scrubbed.
So why do most predictive AI projects fail to estimate the business value, to their own demise? Ultimately, this is not a technology fail – it’s an organizational one, a glaring symptom of the biz/tech divide. Business stakeholders delegate almost every aspect of the project to data scientists. Meanwhile, data scientists as a species are mostly stuck on arcane technical metrics, with little attention to business metrics. The typical data scientist’s training, practice, shop-talk and toolset omit business metrics. Technical metrics define their comfort zone.
Valuating Predictive AI Isn’t Hard
Estimating the profit or other business upside of deploying predictive AI – aka ML valuation – is only a matter of arithmetic. It isn’t the “rocket science” part, the ML algorithm that learns from data. Rather, it’s the much-needed prelaunch stress-testing of the rocket.
Say you work at a bank processing 10 million credit card and ATM card transactions each quarter. With 3.5% of the transactions fraudulent, the pressure is on to predictively block those transactions most likely to fall into that category.
With ML, your data scientists have developed a fraud-detection model that calculates a risk level for each transaction. Within the most risky 150,000 transactions – that is, the 1.5% of transactions that are considered by the model most likely to be fraud – 143,000 are fraudulent. The other 7,000 are legitimate.
So, should the bank block that group of high-risk transactions?
Sounds reasonable off the cuff, but let’s actually calculate the potential winnings. Suppose that those 143,000 fraudulent transactions represent $18,225,000 in charges – that is, they’re about $127 each on average. That’s a lot of fraud loss to be saved by blocking them. But what about the downside of blocking them? If it costs your bank an average of $75 each time you wrongly block due to cardholder inconvenience – which would be the case for each of the 7,000 legit transactions – that will come to $525,000. That barely dents the upside, with the net win coming to $17,700,000.
So yeah, if you’d like to gain almost $18 million, then block those 1.5% most risky transactions. This is the monetary savings of fraud detection, and a penny saved is a penny earned.
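To make that arithmetic concrete, here’s a minimal sketch of the back-of-the-envelope calculation in Python. The figures are the ones from the example above; the $75 cost of a wrong block is the assumed average, not a measured value.

```python
# Valuing the plan to block the riskiest 1.5% of transactions,
# using the example figures above (per quarter).
fraud_value_blocked = 18_225_000   # total value of the 143,000 fraudulent charges (~$127 each)
legit_blocked = 7_000              # legitimate transactions wrongly blocked
cost_per_wrong_block = 75          # assumed average cost of inconveniencing a cardholder

savings = fraud_value_blocked                      # fraud loss avoided
downside = legit_blocked * cost_per_wrong_block    # $525,000 in cardholder friction
net_win = savings - downside                       # $17,700,000

print(f"Net quarterly win: ${net_win:,}")
```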
But that doesn’t necessarily mean that 1.5% is the best place to draw the line. How much more might we save by blocking even more? The more we block, the more lower-risk transactions we block – and yet the net value might continue to increase if we go a ways further. Where to stop? The 2% most risky? The 2.5% most risky?
Viewing Predictive AI’s Range Of Deployment Options
To navigate the range of predictive AI deployment options, you’ve just got to look at it:
This shows the monetary win for a range of deployment options. The vertical axis represents the money saved with fraud detection – based on the same kind of calculations as those in the previous example – and the horizontal axis represents the portion of transactions blocked, starting with the most risky (far left) and extending to progressively less risky transactions as you move right. This view zooms into the range from 0% to 15%, since a bank would normally block at most only the top, say, two or three percent.
The three colors represent three competing ML models: two variations of XGBoost and one random forest (these are popular ML methods). The first XGBoost model is the best one overall. The savings are calculated over a real collection of e-commerce transactions, as were the previous example’s calculations.
Let’s jump to the curve’s peak. We would maximize the expected win to more than $26 million by blocking the top 2.94% most risky transactions according to the first XGBoost model.
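Here’s a minimal sketch of how such a savings curve can be computed from a model’s output. The array names (`scores`, `labels`, `charge_amounts`) and the two cost parameters are illustrative assumptions standing in for the bank’s actual data, not the code behind the chart above.

```python
import numpy as np

def savings_curve(risk_scores, is_fraud, amounts,
                  boundaries=np.arange(0.001, 0.15, 0.001),
                  fraud_loss_rate=1.0, cost_per_wrong_block=75.0):
    """For each candidate decision boundary (portion of transactions blocked,
    riskiest first), estimate net savings: fraud losses avoided minus the
    cost of wrongly blocking legitimate transactions."""
    order = np.argsort(-np.asarray(risk_scores))   # riskiest transactions first
    fraud_sorted = np.asarray(is_fraud)[order]     # 1 = fraudulent, 0 = legitimate
    amounts_sorted = np.asarray(amounts)[order]
    n = len(order)

    curve = []
    for b in boundaries:
        k = int(round(b * n))                              # transactions blocked
        fraud_avoided = (amounts_sorted[:k] * fraud_sorted[:k]).sum() * fraud_loss_rate
        wrong_blocks = k - int(fraud_sorted[:k].sum())     # legit transactions blocked
        net = fraud_avoided - wrong_blocks * cost_per_wrong_block
        curve.append((b, net, wrong_blocks))
    return curve

# The peak is simply the boundary with the largest estimated net savings:
# boundary, net, wrong = max(savings_curve(scores, labels, charge_amounts),
#                            key=lambda point: point[1])
```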
But this deployment plan isn’t a done deal yet – there are other, competing considerations. First, consider how often transactions would be wrongly blocked. It turns out that blocking that 2.94% would inconvenience legit cardholders an estimated 72,000 times per quarter. That adverse effect is already baked into the expected $26 million estimate, but it could incur other intangible or longer-term costs; the business doesn’t like it.
But the relative flatness that you can see near the curve’s peak signals an opportunity: If we block fewer transactions, we could greatly reduce the expected number wrongly blocked with only a small decrease in savings. For example, it turns out that blocking 2.33% rather than 2.94% cuts the number of estimated bad blocks in half to 35,000, while still capturing an expected $25 million in savings. The bank might be more comfortable with this plan.
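Continuing the sketch above, comparing the peak with a more conservative boundary takes just two more calls; the numbers in the comments are the estimates reported above, and the input arrays remain assumed stand-ins.

```python
# Compare the peak boundary with a more conservative one, using the same
# assumed arrays of model scores, fraud flags and transaction amounts.
peak = savings_curve(scores, labels, charge_amounts, boundaries=[0.0294])[0]
alt  = savings_curve(scores, labels, charge_amounts, boundaries=[0.0233])[0]
for boundary, net, wrong_blocks in (peak, alt):
    # Roughly: 2.94% -> ~$26M saved, ~72,000 wrong blocks;
    #          2.33% -> ~$25M saved, ~35,000 wrong blocks.
    print(f"Block top {boundary:.2%}: ~${net:,.0f} saved, {wrong_blocks:,} wrong blocks")
```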
Establishing The Credibility Of ML Valuation
As compelling as these estimated financial wins are, we must take steps to shore up their credibility, since they hinge on certain business assumptions. After all, the actual win of any operational improvement – whether driven by analytics or otherwise – is only certain after it’s been achieved, in a “post mortem” analysis. Before deployment, we’re challenged to estimate the expected value and to demonstrate its credibility.
One business assumption within the analysis described so far is that an unblocked fraudulent transaction costs the bank the full transaction amount. A $100 fraudulent transaction costs $100 (while blocking it saves $100). And a $1,000 fraudulent transaction indeed costs ten times as much.
But circumstances may not be that simple, and they may be subject to change. For example, certain enforcement efforts might serve to recoup some fraud losses by investigating fraudulent transactions even after they were permitted. Or the bank might hold insurance that covers some losses due to fraud.
If there’s uncertainty about exactly where this factor lands, we can address it by viewing how the overall savings would change if it shifted. Here’s the curve when fraud costs the bank only 80% rather than 100% of each transaction amount:
It turns out that the peak decreases from $26 million down to $20 million. This is because there’s less money to be saved by fraud detection when fraud itself is less costly. But the position of the peak moves only a little: from 2.94% to 2.62%. In other words, not much doubt is cast upon where to draw the decision boundary.
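Under the sketch introduced earlier, this sensitivity check amounts to changing a single assumed parameter:

```python
# What if the bank ultimately eats only 80% of each fraudulent charge?
curve_80 = savings_curve(scores, labels, charge_amounts, fraud_loss_rate=0.8)
boundary_80, net_80, _ = max(curve_80, key=lambda point: point[1])
# Per the figures above: the peak drops from ~$26M to ~$20M,
# while its position shifts only from 2.94% to 2.62%.
```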
Another business assumption we have in place is the cost of wrongly blocking, currently set at $75 – since an inconvenienced cardholder is likely to use their card less often (or cancel it entirely). The bank would like to decrease this cost, so it might take measures accordingly. For example, it could provide a $10 “apology” gift card each time it realizes its mistake – an expensive endeavor, but one that might turn out to decrease the net cost of wrongly blocking from $75 down to $50. Here’s how that would affect the savings curve:
This increases the peak estimated savings to $28.6 million and moves that peak from 2.94% up to 3.47%. Again, we’ve gained valuable insight: This scenario would warrant a meaningful increase in how many transactions are blocked (drawing the decision boundary further to the right), but would only increase profit by $2.6 million. Considering that this guesstimated cost reduction is a pretty optimistic one, is it worth the expense, complexity and uncertainty of even testing this kind of “apology” campaign in the first place? Perhaps not.
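In the same sketch, this scenario is just a change to the other assumed cost parameter:

```python
# What if the "apology" gift card cuts the net cost of a wrong block to $50?
curve_50 = savings_curve(scores, labels, charge_amounts, cost_per_wrong_block=50.0)
boundary_50, net_50, _ = max(curve_50, key=lambda point: point[1])
# Per the figures above: the peak rises to ~$28.6M and shifts from 2.94% to 3.47%.
```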
Deciding The Whether, Which And How Of Predictive AI Deployment
For a predictive AI project to defy the odds and stand a chance at successful deployment, business-side stakeholders must be empowered to make an informed decision as to whether, which and how: whether the project is ready for deployment, which ML model to deploy and how to deploy it – that is, with what decision boundary (the percentage of cases to be treated versus not treated). They need to see the potential win in terms of business metrics like profit, savings or other KPIs, across a range of deployment options. And they must see how business factors that are subject to change or uncertainty affect this range of options and their estimated value.
We have a name for this kind of interactive visualization: ML valuation. This practice is the main missing ingredient in how predictive AI projects are typically run. ML valuation stands to rectify today’s dismal track record for predictive AI deployment, boosting the value captured by this technology up closer to its true potential.
Given how frequently predictive AI fails to demonstrate a deployed ROI, the adoption of ML valuation is inevitable. In the meantime, it will be a true win for professionals and stakeholders to act early, get out ahead of it and differentiate themselves as value-focused practitioners of the art.