Influence of Probabilities and Statistics on Everything You Do

We’re pleased to have a guest post this week from Kaiser Fung, a business statistician and author of the new book, Numbers Rule Your World: The Hidden Influence of Probabilities and Statistics on Everything You Do. He has written a series of posts on his blog recording his thoughts as he makes his way through the bestseller SuperFreakonomics. Some of the holes he punched were reported on Boston.com (third section down). He continues the series with this new entry, which you are seeing here first.

The rise of predictive analytics

The authors of SuperFreakonomics, Steven Levitt and Stephen Dubner, shone a light on the practice of predictive analytics when they described how one British mystery man (aka “Ian Horsley”) used data collected by a bank on its customers to sniff out suspected terrorists in a chapter titled “Why should suicide bombers buy life insurance?”

Business executives who are developing advanced analytics capabilities should read this chapter with keen interest, because the same type of statistical model is a workhorse for a variety of tasks such as credit scoring and the targeting of marketing offers.

Using Levitt and Dubner’s story as a port of embarkation, I discuss what managers should know about this important analytical tool while clarifying some parts that were left unexplained in the bestseller.

Predictions are educated guesses

Don’t let anyone tell you otherwise: a prediction is a guess, albeit an educated one. Predictive models learn from the past. If there is no history, there can be no prediction. The starting point of Horsley’s model was the identities of about 100 suspected terrorists who held accounts at his bank. Analyzing only these targets is foolhardy: even if all the suspects are Muslims, it does not follow that all Muslims are suspects. In order to find features that differentiate suspected terrorists from regular customers, Horsley examined the bank’s entire database of several million customer profiles.

Similarly, if you want to predict and prevent fraudulent commercial transactions, it is not sufficient to notice that 50% of past frauds took place in Miami: Miami is a populous city, and not all transactions originating there are fraudulent. An effective predictive model is able to distinguish the small set of suspicious transactions from the large number of normal ones. The key to success is the judicious selection of positive and negative indicators of the target behavior.

Correlation, not causation

Levitt and Dubner considered the list of indicators used by Horsley’s model to identify suspected terrorists:

Positive indicators:
• Has a Muslim name
• Is male
• Is 26-35 years old
• Owns a mobile phone
• Is a student
• Rents rather than owns
• Undisclosed “Variable X”

Negative indicators:
• Has a savings account
• Withdraws money from ATMs on Friday afternoons
• Has life insurance

The presentation in SuperFreakonomics may lead you to visualize this as a checklist. You might wonder if the algorithm ever finds female terrorists. Be assured that it can: predictive models typically assign an importance weight to each indicator and then compute a weighted composite score for every customer. A female terrorist suspect can attain a high score by virtue of the indicators unrelated to gender.
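To make the weighting concrete, here is a minimal sketch in Python. The indicator weights are invented for illustration only; a real model estimates them from historical data, and Horsley never disclosed his actual weights (or “Variable X”):

```python
# A toy weighted-scoring model. The weights are made up for illustration;
# a real model would estimate them from the bank's historical data.
weights = {
    "muslim_name": 2.0, "male": 0.8, "age_26_35": 0.6,
    "owns_mobile": 0.3, "student": 0.5, "renter": 0.4,
    "savings_account": -1.5, "friday_atm": -1.2, "life_insurance": -2.0,
}

def score(customer: dict) -> float:
    """Sum the weights of the indicators this customer exhibits."""
    return sum(w for name, w in weights.items() if customer.get(name))

# A female customer can still score highly on the non-gender indicators:
female_suspect = {"muslim_name": True, "age_26_35": True,
                  "owns_mobile": True, "student": True, "renter": True}
print(score(female_suspect))  # 3.8, even though "male" contributes nothing
```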

Further, you might object that purchasing a mobile phone does not make one a terrorist. This complaint is akin to those leveled at credit scoring models since they emerged in the 1960s. For instance, customers whose credit reports have been reviewed frequently have lower credit scores, all else being equal. Critics charge that a request by a business or employer to review one’s credit report does not cause a fall in one’s creditworthiness.

But predictive algorithms do not in fact make this cause-and-effect assertion: they merely notice a consistent pattern of such customers falling behind on payments at a rate several times the average. Nearly all predictive models work by discovering correlations between the data and the target behavior, but they rarely explain the causes of such relationships.

Batting averages

Like baseball hitters, predictive modelers carry batting averages. Levitt and Dubner illustrated this fact using a hypothetical example:

Assume 500 terrorists are hiding among 50 million adults in the UK. Consider an algorithm with an advertised accuracy rate of 99 percent, by which they mean its false positive rate and false negative rate are both 1 percent. Such an algorithm will miss 5 of the 500 terrorists and, at the same time, falsely label 500,000 people as suspected terrorists.
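The arithmetic is easy to verify. The snippet below follows Levitt and Dubner’s convention of taking 1 percent of the full adult population as the count of false positives:

```python
# Back-of-the-envelope check of the hypothetical example.
population = 50_000_000  # UK adults
terrorists = 500
error_rate = 0.01        # advertised 1% false positive and false negative rates

missed = terrorists * error_rate           # false negatives: 5
caught = terrorists - missed               # true positives: 495
falsely_flagged = population * error_rate  # false positives: 500,000
total_flagged = caught + falsely_flagged   # 500,495 suspects in all

print(f"missed terrorists: {missed:.0f}")
print(f"flagged in total: {total_flagged:,.0f}, of whom only {caught:.0f} are real")
```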

Baseball fans will immediately recognize that the batting average reflects in part how aggressively the hitter swings at pitches. This crucial observation has been omitted from the chapter. In the example above, the algorithm implicated a total of 500,495 adults as terrorist suspects (when we expect only 500). While this number represents only about 1 percent of the adult population, its sheer size is staggering! This hitter has swung at a lot of bad pitches.

Levitt and Dubner correctly reject such an algorithm as “not even close to good enough”. They later contrast this poor performance with the “great predictive power” of Horsley’s model, which is said to have identified 30 suspects, five of whom are “almost certainly involved in terrorist activities”. What should we make of this performance?

On one level, as discussed in the chapter, Horsley’s batting average of 5 out of 30 (0.167), while still low, is much more respectable than the hypothetical example of 495 out of 500,495 (< 0.001). But, again, the authors have omitted a crucial consideration. We observe that this hitter does not swing at many pitches: out of 50 million people, the algorithm targets only 30 suspects. When we believe there to be 500 terrorists at large, rounding up only 30 people ensures that no fewer than 470 will pass through the screening as false negatives. This hitter is piling on strikes.
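To make the contrast precise, we can compute each model’s batting average (precision) alongside its catch rate (recall), using the numbers quoted in the chapter:

```python
# Precision ("batting average") and recall for the two approaches.
TERRORISTS = 500  # assumed number at large

cases = {
    "hypothetical 99% algorithm": {"flagged": 500_495, "caught": 495},
    "Horsley's model": {"flagged": 30, "caught": 5},
}

for name, c in cases.items():
    precision = c["caught"] / c["flagged"]  # fraction of flags that are real
    recall = c["caught"] / TERRORISTS       # fraction of terrorists caught
    print(f"{name}: precision {precision:.3f}, recall {recall:.3f}")
# Horsley: precision 0.167 but recall 0.010 -- the better batting average
# comes from swinging at far fewer pitches.
```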

The evaluation of predictive models requires a three-legged stool: you must look at the false positive rate, the false negative rate, as well as the level of selectivity, which is analogous to how aggressively the hitter swings at pitches. These three legs are linked; changing one necessarily moves the other two. Beware of those who only show you one leg.
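A toy simulation makes the linkage visible. The score distributions below are invented; the point is only that sliding the selection threshold moves all three legs at once:

```python
# Sweeping the score threshold: tighter selection lowers the false positive
# rate but raises the false negative rate. All numbers here are synthetic.
import random

random.seed(1)
targets = [random.gauss(2.0, 1.0) for _ in range(1_000)]   # assumed scores of targets
normals = [random.gauss(0.0, 1.0) for _ in range(99_000)]  # assumed scores of non-targets

for threshold in (1.0, 2.0, 3.0):
    fp = sum(s >= threshold for s in normals)  # normals flagged
    fn = sum(s < threshold for s in targets)   # targets missed
    flagged = fp + (len(targets) - fn)         # selectivity: how many we flag
    print(f"threshold {threshold}: FP rate {fp/len(normals):.1%}, "
          f"FN rate {fn/len(targets):.1%}, "
          f"flagged {flagged:,} of {len(targets) + len(normals):,}")
```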

Asymmetric costs

Say you are using an algorithm to identify target customers who are most likely to respond to a direct marketing offer. You expect a baseline response rate of about 30 percent. By selecting 20 percent of your customer base, a typical algorithm may yield a false positive rate of 7 percent and a false negative rate of 50 percent. Doubling the selection to 40 percent of your base may raise the false positive rate to 27 percent but lower the false negative rate to 30 percent.

The level of selectivity is a joint decision between the marketing and finance functions, and it determines the expected ROI for the marketing campaign. The cost of a false positive is wasting marketing dollars on customers who reject the offer while the cost of a false negative is losing sales by not reaching the customers who want the offer. In most businesses, these costs are asymmetric.

If you worry more about lost sales, you should send the offer to 40 percent of your customer base, accepting the much higher rate of false positives. If you care more for customer profitability, then you should restrict the number of offers, accepting the higher false negative rate. Predictive models provide information to aid decision-making; they cannot replace sound business judgment.
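Here is a rough sketch of that decision, using the response and error rates quoted above; the $2 cost per offer and $50 profit per sale are hypothetical:

```python
# Expected profit under the two selectivity levels described above.
# Costs are assumed for illustration; plug in your own economics.
BASE = 100_000              # hypothetical customer base
RESPONDERS = 0.30 * BASE    # 30% baseline response rate
COST_PER_OFFER = 2.0        # assumed marketing cost per contact
PROFIT_PER_SALE = 50.0      # assumed profit per responder reached

scenarios = {
    "select 20%": {"selected": 0.20 * BASE, "fn_rate": 0.50},
    "select 40%": {"selected": 0.40 * BASE, "fn_rate": 0.30},
}

for name, s in scenarios.items():
    reached = RESPONDERS * (1 - s["fn_rate"])  # responders who get the offer
    wasted = s["selected"] - reached           # offers sent to non-responders
    profit = reached * PROFIT_PER_SALE - s["selected"] * COST_PER_OFFER
    print(f"{name}: reach {reached:,.0f} responders, "
          f"waste {wasted:,.0f} offers, net ${profit:,.0f}")
```

Under these assumed economics, the wider net wins; with a low enough profit per sale, the tighter selection would.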

Notice that these costs are benign compared to the costs of errors in detecting suspected terrorists, which are counted in human lives. For this reason, I am highly skeptical that this type of predictive technology can be useful for terrorist detection: it is simply not accurate enough. A key difference between the two applications is that terrorists are exceedingly rare among the adult population, while the chance of a customer responding to a marketing offer is much higher.

Conclusion

Predictive models are a workhorse of advanced business analytics, frequently used in credit scoring, the targeting of marketing offers, and other applications. When used properly, they can dramatically improve ROI through a judicious trade-off between false positive errors (wasted resources) and false negative errors (lost sales). Models provide information for decision-making; managers make decisions.

All predictive models produce educated guesses based on analyzing historical outcomes. The conjunction of multiple indicators, suitably weighted, identifies targets within the population. Algorithms uncover correlations but rarely shed light on cause-and-effect relationships.

For more discussion, please see Chapters 2 and 4 of my book, Numbers Rule Your World: The Hidden Influence of Probabilities and Statistics on Everything You Do. You will find a few additional notes on my book blog (http://junkcharts.typepad.com/numbersruleyourworld).

  • Colin

    An important point that Levitt and Dubner were trying to make was that “Five out of 30 isn’t perfect–the algorithm still misses many terrorists and still falsely identifies some innocents–but it sure beats 495 out of 500,495.” (SuperFreakonomics, p. 95)

    In other words, catching 5 terrorists is worth the next-to-nothing cost of building a computer model and the cost of surveilling 30 suspects. Yes, there are still, by our hypothetical estimate, 470 terrorists at large and not found by Horsley’s model. But look at the scoreboard: Horsley still caught 5 terrorists by using his computer at his desk in a bank.

    Compare this to the alternative, if Horsley had just been a regular banker and never developed his algorithm. All 475 terrorists would still be at large, instead of 470.

    Yes, the model is not completely accurate. But Horsley made a managerial decision, just as you said managers should do. He brought his hit rate to a workable level (5 out of 30) at the cost of overlooking a very large number of false negatives.

    In Horsley’s case, his false positives were the 25 suspects who turned out not to be terrorists, and his false negatives were the ~470 terrorists whom his model didn’t identify.
