Getting too fancy with complex, layered data science approaches can magnify the issues in data instead of controlling them. This blog explains why and illustrates the point with a real-world example, one I also discussed in The Analytics Revolution, to show that the old rule of keeping it simple fully applies to a complex field like data science.
A Surprising, But Recurring, Pattern
One pattern surprised me when I was first confronted with it. Namely, when building analytical processes that must be operationalized to an enterprise scale, simpler solutions can actually perform better than fancy solutions . . . not just from a systems and processing perspective, but also from an analytical perspective! This can be true even when, theoretically, a more sophisticated method should work better. I’m convinced that this is because data always has some uncertainty, is often sparsely populated, and is never fully complete. This can be especially true with some of the low-level data utilized for operationally oriented analytics today.
At some point, as the analytics applied to a dataset become more sophisticated and layered, there is a risk of magnifying the errors and uncertainties in the data rather than controlling and accounting for them. In addition, it is easy to overfit a model, meaning the model is complex enough that it starts to incorporate the random variation in the modeled data set rather than real effects. Overfitting becomes apparent when a model is applied to a validation sample and performs poorly. Let’s next look at an example.
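To make that concrete, here is a minimal sketch (invented data and models, not anything from the project discussed below) that fits both a simple model and a deliberately over-complex one to the same noisy data, then compares them on a held-out validation sample:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical data: a simple linear trend plus noise.
x = np.linspace(0, 1, 30)
y = 2.0 * x + rng.normal(0, 0.3, size=x.size)

# Hold out every other point as a validation sample.
x_train, y_train = x[::2], y[::2]
x_val, y_val = x[1::2], y[1::2]

def val_mse(degree):
    """Fit a polynomial of the given degree on the training points
    and return its mean squared error on the validation points."""
    coefs = np.polyfit(x_train, y_train, degree)
    return float(np.mean((np.polyval(coefs, x_val) - y_val) ** 2))

simple_mse = val_mse(1)    # a straight line, matching the true trend
complex_mse = val_mse(14)  # enough terms to chase the noise itself
```

On the training points the high-degree fit looks nearly perfect, but on the held-out points it performs far worse than the straight line, which is exactly how overfitting shows up in practice.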
A Real-World Challenge
A few years ago, my team was implementing sales forecasting for products at an individual store level for a large retailer. The scope spanned hundreds of millions of store / product combinations. Most products sold frequently and in a consistent fashion, which matched the assumptions of the most commonly used algorithms for this type of forecast. This client, however, had many products that didn’t fit standard sales patterns. Specifically, there were products that would not sell for many weeks or months, but would then sell a large volume in a single order. Think of an item like floor tile that might not be bought often, but when someone chooses it for a kitchen remodel, a large amount of the tile is purchased in a single order. The client called this type of situation “lumpy demand”.
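For intuition, here is a small hedged sketch of what a lumpy-demand sales history can look like; the probabilities and quantities are invented for illustration, not the client’s actual data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical weekly sales for one store/product combination with
# lumpy demand: an order arrives in only ~8% of weeks, but when it
# does, a large quantity is bought at once (e.g., a tile remodel).
weeks = 52
order_arrives = rng.random(weeks) < 0.08
weekly_sales = np.where(order_arrives, rng.integers(30, 80, size=weeks), 0)

zero_weeks = int((weekly_sales == 0).sum())  # most weeks sell nothing
```

A model built for smooth, frequent sellers assumes nothing like this series; applied here, it tends to spread the rare spikes into a small, steady forecast that is wrong almost every week.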
My team was hired to develop customized forecasting approaches for sales patterns, such as lumpy demand, that were not a fit for the standard models. It was no surprise that, given the mismatch with the standard assumptions, the client had found the standard models underperforming expectations in these situations. Given the scale of the organization, the exceptions still represented millions of store / product combinations. As a result, even handling these “exceptions” was a big challenge from a scale perspective.
We knew from the start that an offshore consulting firm had been given the same project and that whichever team came back with better results would be selected to continue the work moving forward. The offshore team we were competing with had more people on the project than we did, so I didn’t think we could win with brute force. I knew from past experience with the other consulting firm that they would try a bunch of fancy algorithms to maximize forecast accuracy for the test cases. However, I also knew there was a good chance those methods wouldn’t scale as needed. I asked my team to start with the simplest algorithms and add complexity and sophistication only as long as they could remain confident that the solution would scale to the level we required.
The Real-World Results
As the project started, I assumed the other team’s absolute forecast accuracy would beat ours, but that the amount of effort required to scale the other team’s solution would be so massive that it wouldn’t be feasible. We would therefore win the follow-on contract because our slightly less accurate forecasts were able to scale as required. I was shocked when our forecasts were actually more accurate!
My belief is that, given the incomplete and sparse nature of the data, the fancy multi-step algorithms amplified the noise instead of controlling it. My team thought we had given up some analytic power to enable operational deployment. It turned out that our simple approach was better and we hadn’t given up anything at all. This was a nice result not only because we won the follow-on business, but because we could proceed with confidence that we were implementing a scalable solution that performed very solidly without sacrificing analytical power to achieve that scalability.
Don’t assume fancy is always better. It is very easy for talented data science people to want to show what they know and use cutting-edge techniques. There are occasions where this makes sense and works best. However, there are also situations where that approach can backfire. As a result, always test a simple approach as a baseline alongside something more sophisticated. That way, you’ll be sure you’re only adding complexity where it is actually required to achieve the desired results.
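One lightweight way to enforce that discipline (a sketch of the general idea, not anything from the original project) is to make the simple baseline an explicit gate: adopt the complex model only when it beats the baseline on held-out data.

```python
def accept_complex_model(candidate_error, baseline_error, min_gain=0.02):
    """Adopt the more complex model only if it beats the simple
    baseline on held-out data by a meaningful relative margin.
    (The 2% default margin is an illustrative choice, not a standard.)"""
    if baseline_error <= 0:
        return candidate_error < baseline_error
    improvement = (baseline_error - candidate_error) / baseline_error
    return improvement >= min_gain

# Example: a complex model scoring MAE 9.5 against a baseline MAE of
# 10.0 improves by 5%, clearing the illustrative 2% bar.
adopted = accept_complex_model(9.5, 10.0)
```

A gate like this keeps the burden of proof on the complex model: if it cannot clearly beat the simple baseline out of sample, the simple approach ships.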
Originally published by the International Institute for Analytics
Bill Franks, Chief Analytics Officer, helps drive IIA's strategy and thought leadership, as well as heading up IIA's advisory services. IIA's advisory services help clients navigate common challenges that analytics organizations face throughout each annual cycle. Bill is also the author of Taming The Big Data Tidal Wave and The Analytics Revolution. His work has spanned clients in a variety of industries for companies ranging in size from Fortune 100 companies to small non-profit organizations. You can learn more at http://www.bill-franks.com.