New Technology Is Not An Easy Button For Big Data
By Bill Franks, Sep 06, 2012
It is good to remember in today’s hype-filled big data world that there is no “easy” button for big data. In fact, in many ways, big data is quite difficult to deal with. Many organizations seem to be falling for the fallacy that simply implementing new tools or platforms will “automagically” solve their big data problems. Unfortunately, this isn’t the case.
For example, there is a common belief that MapReduce platforms such as Teradata Aster or Hadoop can tame big data in and of themselves. In reality they don’t inherently enable new functionality or analytic logic to be executed. Rather, they allow you to scale certain kinds of functionality and analytic logic in a way that makes the functionality and logic much more powerful and widely applicable.
This is an important distinction – and one I want to explore in detail.
Many organizations seem to be thinking of MapReduce as a magic bullet or “easy” button for handling big data. Just set up a system, and your big data problems are solved, right? Wrong. Once the system is in place, it is still necessary to develop the analytic processes that run against it. There really is no shortcut here. If you want great analytics, you’re going to have to build your processes just like you always have. Organizations that don’t understand this fact will be disappointed when they realize they aren’t instantly getting the value they expected from their investment.
As I said earlier, MapReduce doesn’t inherently enable new functionality. When you hear about MapReduce environments, the discussion quickly turns to leveraging languages such as Java or Python. It just so happens that these languages have been around for quite a while. They had strong followings before the concept of MapReduce came into existence. Most users of these languages have never used, and may never use, a MapReduce architecture as part of their work. However, they code away day to day developing processes just like their big-data-focused counterparts.
What many people don’t take the time to think about is that whatever logic you develop today in Java to run in a MapReduce environment is something you could have written in Java years ago. The exact same code, the exact same output for a given piece of data. This is why I said that MapReduce doesn’t directly cause any new analytic logic to come into existence. Rather, MapReduce provides a highly scalable platform so that logic can be executed at a scale far surpassing what was possible in the past.
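To make the point concrete, here is a minimal sketch in Python. The function names (`count_words`, `run_serial`, `run_map_reduce`) and the sample data are hypothetical, and a local thread pool stands in for the nodes of a real cluster; this is not the API of Hadoop, Aster, or any other platform. It only shows that the analytic logic is identical whether it runs over all the data in one pass or over partitions whose partial results are merged.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def count_words(lines):
    """The analytic logic: count word occurrences in a list of text lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def run_serial(lines):
    """Run the logic the old way: one process, all the data."""
    return count_words(lines)


def run_map_reduce(lines, n_partitions=3):
    """Run the SAME logic per partition (map), then merge partial counts (reduce).

    Assumption: the thread pool is only a stand-in for cluster workers.
    """
    partitions = [lines[i::n_partitions] for i in range(n_partitions)]
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        partials = list(pool.map(count_words, partitions))
    return reduce(lambda a, b: a + b, partials, Counter())


# Hypothetical sample data for illustration only.
DATA = [
    "big data is not easy",
    "big data takes work",
    "there is no easy button",
    "work is still work",
]

if __name__ == "__main__":
    # Same code, same output for the same data; only the execution differs.
    print(run_serial(DATA) == run_map_reduce(DATA))
```

Note that `count_words` had to be written either way. The partitioned version adds scale, not new logic.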
This last point is the value that MapReduce brings. Having a terrific facial recognition or text parsing algorithm doesn’t do much good if there is no way to scale the process to a big data environment. MapReduce provides that ability. It lets organizations apply algorithms to a much wider base of problems and a much larger amount of data. Logic that was once impractical to build into your analytic processes becomes practical.
This is no different from how parallel database platforms provide value. A Massively Parallel Processing (MPP) database system runs on SQL just like a non-MPP system. An MPP system doesn’t enable new functionality in the absolute sense, but it does provide the ability to scale an SQL process. As a result, it enables far more value to be derived and a much wider set of problems to be practically addressed than when using a non-MPP architecture.
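The same idea can be sketched for SQL. The example below uses Python’s built-in `sqlite3` module with in-memory databases standing in for the nodes of an MPP system; the `sales` table, the shard count, and the helper names are assumptions made for illustration, and a real MPP database handles the distribution and merging internally. The query text is the same in both cases.

```python
import sqlite3
from collections import defaultdict

# The SQL itself is ordinary; nothing about it is specific to a parallel system.
QUERY = "SELECT region, SUM(amount) FROM sales GROUP BY region"

# Hypothetical sample rows for illustration only.
ROWS = [
    ("east", 10), ("west", 5), ("east", 7),
    ("north", 3), ("west", 8), ("north", 4),
]


def make_db(rows):
    """Create an in-memory database holding the given rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn


def run_single(rows):
    """Non-parallel case: one database holds everything."""
    return dict(make_db(rows).execute(QUERY).fetchall())


def run_sharded(rows, n_shards=3):
    """Parallel-style case: the same query runs on each shard, then subtotals merge."""
    totals = defaultdict(int)
    for i in range(n_shards):
        shard = make_db(rows[i::n_shards])
        for region, subtotal in shard.execute(QUERY):
            totals[region] += subtotal
    return dict(totals)


if __name__ == "__main__":
    print(run_single(ROWS) == run_sharded(ROWS))
```

`SUM` merges cleanly because subtotals can simply be added. Other aggregates need more care (an average, for instance, has to be merged from per-shard sums and counts), which is part of the work a parallel platform does for you.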
In summary, we can expect MapReduce to continue to be a force behind the taming of big data. But the onus will still be on the organizations that use it to develop and implement the required analytic processes, just as they always have. Many analytics that were theoretically possible but impractical will become practical. That will lead to a lot of value. The key is to understand what the architecture will do for you, and not to underestimate the effort required to use it correctly. It will take work to get the benefits. There is no “easy” button for big data.
About the author
Bill Franks is Chief Analytics Officer for Teradata, where he provides insight on trends in the analytics and big data space and helps clients understand how Teradata and its analytic partners can support their efforts. His focus is to translate complex analytics into terms that business users can understand and work with organizations to implement their analytics effectively. His work has spanned many industries for companies ranging from Fortune 100 companies to small non-profits. Franks also helps determine Teradata’s strategies in the areas of analytics and big data.
Franks is the author of the book Taming The Big Data Tidal Wave (John Wiley & Sons, Inc., April, 2012). In the book, he applies his two decades of experience working with clients on large-scale analytics initiatives to outline what it takes to succeed in today’s world of big data and analytics. The book made Tom Peters’ list of 2014 “Must Read” books and also the Top 10 Most Influential Translated Technology Books list from CSDN in China.
Franks’ second book The Analytics Revolution (John Wiley & Sons, Inc., September, 2014) lays out how to move beyond using analytics to find important insights in data (both big and small) and into operationalizing those insights at scale to truly impact a business.
He is a faculty member of the International Institute for Analytics, founded by leading analytics expert Tom Davenport, and an active speaker who has presented at dozens of events in recent years. His blog, Analytics Matters, addresses the transformation required to make analytics a core component of business decisions.
Franks earned a Bachelor’s degree in Applied Statistics from Virginia Tech and a Master’s degree in Applied Statistics from North Carolina State University. More information is available at www.bill-franks.com.