We’re pleased to have a guest post this week from Kaiser Fung, a business statistician and author of the new book, Numbers Rule Your World: The Hidden Influence of Probabilities and Statistics on Everything You Do. He has written a series of posts on his blog recording his thoughts as he makes his way through the bestseller SuperFreakonomics. Some of the holes he punched were reported on Boston.com (third section down). He continues the series with this new entry, which you are seeing here first.
The rise of predictive analytics
The authors of SuperFreakonomics, Steven Levitt and Stephen Dubner, shone a light on the practice of predictive analytics when they described how one British mystery man (aka “Ian Horsley”) used data collected by a bank on its customers to sniff out suspected terrorists in a chapter titled “Why should suicide bombers buy life insurance?”
Business executives who are developing advanced analytics capabilities should read this chapter with keen interest, because the same type of statistical model is a workhorse for a variety of tasks such as credit scoring and the targeting of marketing offers.
Using Levitt and Dubner’s story as a port of embarkation, I discuss what managers should know about this important analytical tool while clarifying some parts that were left unexplained in the bestseller.
Predictions are educated guesses
Don’t let anyone tell you otherwise: a prediction is a guess, albeit an educated one. Predictive models learn from the past; if there is no history, there can be no prediction. The starting point of Horsley’s model was the identities of about 100 suspected terrorists who held accounts at his bank. Analyzing only these targets is foolhardy: even if all suspects are Muslims, it does not follow that all Muslims are suspects. In order to find features that differentiate suspected terrorists from regular customers, Horsley examined the bank’s entire database of several million customer profiles.
Similarly, if you want to predict and prevent fraudulent commercial transactions, it is not sufficient to notice that 50% of past frauds took place in Miami: Miami is a populous city, and not all transactions originating there are fraudulent. An effective predictive model distinguishes the small set of suspicious transactions from the large number of normal ones. The key to success is the judicious selection of positive and negative indicators of the target behavior.
Correlation, not causation
Levitt and Dubner considered the list of indicators used by Horsley’s model to identify suspected terrorists:
Positive indicators:
- Has a Muslim name
- Is male
- Is 26-35 years old
- Owns a mobile phone
- Is a student
- Rents rather than owns
- Undisclosed “Variable X”

Negative indicators:
- Has a savings account
- Withdraws money from ATMs on Friday afternoons
- Has life insurance
The presentation in SuperFreakonomics may lead you to visualize this as a checklist. You might wonder if the algorithm ever finds female terrorists. Be assured that it can: predictive models typically assign an importance weight to each indicator, and then compute a weight-informed composite score for every customer. A female terrorist suspect can attain a high score by virtue of those indicators unrelated to gender.
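As a concrete sketch of how weighted scoring works, the snippet below assigns invented weights to the indicators above. None of these numbers come from Horsley’s model, whose actual weights (and “Variable X”) were never disclosed.

```python
# Illustrative weighted composite scoring. All weights are made up for
# this sketch; real models learn them from historical data.
WEIGHTS = {
    "muslim_name": 3.0,       # positive indicators add to the score
    "male": 1.0,
    "age_26_35": 1.5,
    "owns_mobile": 0.5,
    "student": 0.5,
    "renter": 0.5,
    "savings_account": -2.0,  # negative indicators subtract from it
    "atm_friday": -2.5,
    "life_insurance": -3.0,
    # "Variable X" is undisclosed and therefore omitted here.
}

def composite_score(profile):
    """Sum the weights of the indicators present in a customer profile."""
    return sum(WEIGHTS[indicator] for indicator in profile)

# A hypothetical female suspect can still score highly on the
# indicators unrelated to gender.
female_profile = ["muslim_name", "age_26_35", "student", "renter"]
male_saver = ["male", "savings_account", "life_insurance"]
print(composite_score(female_profile))  # 5.5
print(composite_score(male_saver))      # -4.0
```

The scoring is additive here for simplicity; production models (logistic regression, for instance) combine weighted indicators in the same spirit before ranking customers by score.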
Further, you might object that purchasing a mobile phone does not make one a terrorist. This complaint is akin to those leveled at credit scoring models since they emerged in the 1960s. For instance, customers whose credit reports have been frequently assessed have lower credit scores, all else being equal. Critics charge that a request by a business or employer to review one’s credit report would not cause a fall in one’s creditworthiness.
But predictive algorithms make no such cause-and-effect assertion: they merely notice a consistent pattern of such customers falling behind on payments at a rate several times the average. Nearly all predictive models work by discovering correlations between the data and the target behavior, but they rarely explain the causes of such relationships.
Like baseball hitters, predictive modelers carry batting averages. Levitt and Dubner illustrated this fact using a hypothetical example:
Assume 500 terrorists are hiding among 50 million adults in the UK. Consider an algorithm with an advertised accuracy rate of 99 percent, meaning that both its false positive rate and its false negative rate are 1 percent. Such an algorithm will miss 5 of the 500 terrorists and, at the same time, falsely label 500,000 people as suspected terrorists.
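The arithmetic of this hypothetical is easy to check. (The chapter rounds the false positives up to 500,000; the exact figure is 499,995, which is where the total of 500,495 suspects quoted below comes from.)

```python
# Checking the arithmetic of the hypothetical 99-percent-accurate algorithm.
population = 50_000_000   # adults in the UK
terrorists = 500          # actual terrorists hiding among them
fn_rate = 0.01            # false negative rate: share of terrorists missed
fp_rate = 0.01            # false positive rate: share of innocents flagged

missed = round(terrorists * fn_rate)                     # terrorists who slip through
caught = terrorists - missed                             # terrorists correctly flagged
falsely_accused = round((population - terrorists) * fp_rate)
total_flagged = caught + falsely_accused

print(missed)            # 5
print(falsely_accused)   # 499995 -- the chapter rounds this to 500,000
print(total_flagged)     # 500490 -- about 500,495 with the chapter's rounding
```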
Baseball fans will immediately recognize that the batting average reflects in part how aggressively the hitter swings at pitches. This crucial observation has been omitted from the chapter. In the example above, the algorithm implicated a total of 500,495 adults as terrorist suspects (when we expect only 500). While this number represents only about 1 percent of the adult population, its sheer size is staggering! This hitter has swung at a lot of bad pitches.
Levitt and Dubner correctly reject such an algorithm as “not even close to good enough”. They later contrast this poor performance with the “great predictive power” of Horsley’s model, which is said to have identified 30 suspects, five of whom are “almost certainly involved in terrorist activities”. What to make of this performance?
On one level, as discussed in the chapter, Horsley’s batting average of 5 out of 30 (0.167), while still low, is much more respectable than the hypothetical example of 495 out of 500,495 (< 0.001). But, again, the authors have omitted a crucial consideration. We observe that this hitter does not swing at many pitches: out of 50 million people, the algorithm targets only 30 suspects. When we believe there to be 500 terrorists at large, rounding up only 30 people ensures that no fewer than 470 will pass through the screening as false negatives. This hitter is piling on strikes.
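The two batting averages, and the false negatives the chapter leaves out, take only a few lines to compute:

```python
# Comparing the two algorithms' "batting averages" (precision), plus the
# false negatives the chapter does not discuss.
terrorists_at_large = 500

# The hypothetical 99-percent-accurate algorithm: 495 hits in 500,495 swings.
hypothetical_avg = 495 / 500_495

# Horsley's model: 5 hits in 30 swings.
horsley_avg = 5 / 30

# But Horsley's model swings rarely: flagging only 30 people out of
# 50 million guarantees at least 470 of the 500 terrorists go undetected.
horsley_false_negatives = terrorists_at_large - 30

print(round(hypothetical_avg, 4))   # 0.001
print(round(horsley_avg, 3))        # 0.167
print(horsley_false_negatives)      # 470
```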
The evaluation of predictive models requires a three-legged stool: you must look at the false positive rate, the false negative rate, as well as the level of selectivity, which is analogous to how aggressively the hitter swings at pitches. These three legs are linked; changing one necessarily moves the other two. Beware of those who only show you one leg.
Say you are using an algorithm to identify target customers who are most likely to respond to a direct marketing offer. You expect a baseline response rate of about 30 percent. By selecting 20 percent of your customer base, a typical algorithm may yield a false positive rate of 7 percent and a false negative rate of 50 percent. Doubling the selection to 40 percent of your base may raise the false positive rate to 27 percent but lower the false negative rate to 30 percent.
The level of selectivity is a joint decision between the marketing and finance functions, and it determines the expected ROI for the marketing campaign. The cost of a false positive is wasting marketing dollars on customers who reject the offer while the cost of a false negative is losing sales by not reaching the customers who want the offer. In most businesses, these costs are asymmetric.
If you worry more about lost sales, you should send the offer to 40 percent of your customer base, accepting the much higher rate of false positives. If you care more for customer profitability, then you should restrict the number of offers, accepting the higher false negative rate. Predictive models provide information to aid decision-making; they cannot replace sound business judgment.
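The trade-off between the two selectivity levels can be made concrete under assumed economics. The response counts below follow from the rates in the text (and confirm they are internally consistent); the $2 cost per offer and $50 margin per sale are invented for illustration.

```python
# Attaching made-up dollar figures to the two kinds of error. Assume a
# base of 100,000 customers, 30,000 of whom (30 percent) would respond.
BASE, RESPONDERS = 100_000, 30_000
NON_RESPONDERS = BASE - RESPONDERS

def campaign(select_fraction, false_negative_rate,
             cost_per_offer=2, margin_per_sale=50):
    """Return (false positive rate, missed sales, profit) for a campaign.

    cost_per_offer and margin_per_sale are illustrative assumptions,
    not figures from the text.
    """
    mailed = round(BASE * select_fraction)
    reached = round(RESPONDERS * (1 - false_negative_rate))  # responders mailed
    wasted = mailed - reached                # offers sent to non-responders
    fp_rate = wasted / NON_RESPONDERS
    profit = reached * margin_per_sale - mailed * cost_per_offer
    return fp_rate, RESPONDERS - reached, profit

# Select 20% of the base with a 50% false negative rate, as in the text:
print(campaign(0.20, 0.50))   # FP rate ~7%, 15,000 missed sales
# Select 40% with a 30% false negative rate:
print(campaign(0.40, 0.30))   # FP rate ~27%, 9,000 missed sales
```

Under these assumed costs, the wider mailing earns more despite the much higher false positive rate, which is the “worry more about lost sales” scenario; reversing the cost assumptions reverses the decision.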
Notice that these costs are benign compared to the costs of errors in detecting suspected terrorists, which are counted in human lives. For this reason, I am highly skeptical that this type of predictive technology can be useful for terrorist detection: it is simply not accurate enough. A key difference between the two applications is that terrorists are exceedingly rare in the adult population, while the chance of a customer responding to a marketing offer is much higher.
Predictive models are a workhorse of advanced business analytics, frequently used in credit scoring, the targeting of marketing offers, and other applications. When used properly, they can dramatically improve ROI through a judicious trade-off between false positive errors (wasted resources) and false negative errors (lost sales). Models provide information for decision-making; managers make decisions.
All predictive models produce educated guesses based on analyzing historical outcomes. The conjunction of multiple indicators, suitably weighted, is used to identify targets within the population. Algorithms uncover correlations but rarely shed light on cause-and-effect relationships.
For more discussion, please see Chapters 2 and 4 of my book, Numbers Rule Your World: The Hidden Influence of Probabilities and Statistics on Everything You Do. You will find a few additional notes on my book blog (http://junkcharts.typepad.com/numbersruleyourworld).