INTRODUCTION
Big-thinking gurus often argue that we have moved from the agricultural economy to the industrial economy to the data economy. It is certainly true that more and more of our economy is coordinated through data and information systems. However, outside of the information and software industry itself, it’s only over the last decade or so that data-based products and services really started to take off. The impact goes beyond previous roles for data and analytics; instead of slowly improving back-office decisions, the data economy promises new business models and revenue sources, entirely new operational and decision processes, and a dramatically accelerated timescale for businesses.
Around the turn of the 21st century, as the Internet became an important consumer and business resource, online firms including Google, Yahoo, eBay, and Amazon began to create new products and services for customers based on data and analytics. Google’s search algorithms, Amazon’s product recommendations, and eBay’s fraud detection approaches are examples of these offerings. A little later in the decade, network-oriented firms including Facebook and LinkedIn offered services and features based on network data and analytics—from “People You May Know” at LinkedIn to “Custom Audiences” for targeted ads at Facebook.
These online and network-oriented firms created the technologies and management approaches we now call “big data,” but they also demonstrated how to participate in the data economy. In those companies, data and analytics are not just an adjunct to the business, but the business itself. As large firms in other industries adopt big data and integrate it with their previous approaches to data management and analytics, they too can begin to develop products and services based on data and analytics, and enter the data economy. No company—even those in traditional industries—can afford to ignore that opportunity. GE is one of the early adopters of this approach among industrial firms. It is placing sensors in “things that spin” such as jet engines, gas turbines, and locomotives, and redesigning its service offerings around the resulting data and analysis. Since half of GE’s revenues in these businesses come from services, participation in the data economy is critical to GE’s success.
“Analytics 3.0” is an appropriate name for this next evolution of the analytics environment, since it follows upon two earlier generations of analytics use within organizations. It represents a new set of opportunities presented by the data economy, and a new set of management approaches at the intersection of traditional analytics and big data.
ANALYTICS 1.0—TRADITIONAL ANALYTICS
Analytics, of course, are not a new idea. To be sure, there has been a recent explosion of interest in the topic, but for the first half-century of activity, the way analytics were pursued in most organizations didn’t change that much. This Analytics 1.0 period predominated for half a century from the mid-1950s (when UPS initiated the first corporate analytics group in the U.S.) to the mid-2000s, when online firms began innovating with data. Of course, firms in traditional offline industries continued with Analytics 1.0 approaches for a longer period, and some organizations still employ them today. Analytics 1.0 was characterized by the following attributes:
Data sources were relatively small and structured, and came from internal systems;
The great majority of analytical activity was descriptive analytics, or reporting;
Creating analytical models was a “batch” process often requiring several months;
Quantitative analysts were segregated from business people and decisions in “back rooms”;
Very few organizations “competed on analytics”—for most, analytics were marginal to their strategy.
From a technology perspective, this was the era of the enterprise data warehouse and the data mart. Data was small enough in volume to be segregated in separate locations for analysis. This approach was successful, and many enterprise data warehouses became uncomfortably large because of the number of data sets contained in them. However, preparing an individual data set for inclusion in a warehouse was difficult, requiring a complex ETL (extract, transform, and load) process. For data analysis, most organizations used proprietary BI and analytics “packages” that had a number of functions from which to select. More than 90% of the analysis activity involved descriptive analytics, or some form of reporting.
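As a concrete, hedged illustration of that ETL burden, the sketch below shows a minimal extract-transform-load step in Python. The source file, table name, and cleaning rules are hypothetical; warehouse ETL pipelines of the 1.0 era were far more elaborate and typically ran in proprietary tools rather than scripts.

```python
# Minimal ETL sketch (hypothetical source file and target table):
# extract rows from a CSV export, transform/clean them, load into a warehouse table.
import csv
import sqlite3  # stand-in for a relational data warehouse

def extract(path):
    with open(path, newline="") as f:
        yield from csv.DictReader(f)

def transform(row):
    # Standardize types and drop obviously bad records.
    try:
        return (row["customer_id"].strip(),
                row["product_code"].strip().upper(),
                float(row["amount"]))
    except (KeyError, ValueError):
        return None  # a real pipeline would route this to an error queue

def load(rows, conn):
    conn.execute("CREATE TABLE IF NOT EXISTS sales "
                 "(customer_id TEXT, product_code TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

if __name__ == "__main__":
    conn = sqlite3.connect("warehouse.db")
    cleaned = filter(None, (transform(r) for r in extract("sales_export.csv")))
    load(cleaned, conn)
```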
The Analytics 1.0 ethos was internal, painstaking, backward-looking, and slow. Data was drawn primarily from internal transaction systems, and addressed well-understood domains like customer and product information. Reporting processes focused only on the past, offering no explanation or prediction. Statistical analyses, when they were done at all, often required weeks or months. Relationships between analysts and decision-makers were often distant; analytical results frequently didn’t meet executives’ requirements, and decisions were made on experience and intuition instead. Analysts spent much of their time preparing data for analysis, and relatively little on the quantitative analysis itself.
ANALYTICS 2.0—BIG DATA
Starting in the mid-2000s, the world began to take notice of big data (though the term only came into vogue around 2010), marking the beginning of the Analytics 2.0 era. The era began with the exploitation of online data by Internet-based and social network firms, whose businesses involved massive amounts of fast-moving data. Big data and analytics in those firms not only informed internal decisions, but also formed the basis for customer-facing products, services, and features.
These pioneering Internet and social network companies were built around big data from the beginning. They didn’t have to reconcile or integrate big data with more traditional sources of data and the analytics performed upon them, because for the most part they didn’t have those traditional forms. They didn’t have to merge big data technologies with their traditional IT infrastructures because those infrastructures didn’t exist. Big data could stand alone, big data analytics could be the only focus of analytics, and big data technology architectures could be the only architecture.
Big data analytics as a standalone activity in Analytics 2.0 differed from the 1.0 era in many ways. As the term suggests, the data itself was very large, relatively unstructured, fast-moving, or all of these at once. Data was often externally sourced, coming from the Internet, the human genome, sensors of various types, or voice and video.
A new set of technologies began to be employed at this time. The fast flow of data meant that it had to be stored and processed rapidly, often with massively parallel servers running Hadoop for fast batch data processing. To deal with relatively unstructured data, companies had to employ “NoSQL” databases. Much of the data was stored and analyzed in public or private cloud computing environments. Other new technologies employed during this period included “in-memory” analytics and “in-database” analytics. Machine learning methods were employed to rapidly generate models that fit the fast-moving data. As a result of these new technologies, the overall speed of analysis was much faster—often reducing analysis cycle times from days and hours to minutes and seconds. Visual analytics—a form of descriptive analytics—often crowded out predictive and prescriptive techniques. It is important to note that big data was often not accompanied by “big analytics,” perhaps because of the challenges of data management.
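As a hedged illustration of machine learning applied to fast-moving data, the sketch below updates a linear model incrementally on mini-batches rather than retraining it in a long batch cycle. It assumes scikit-learn and NumPy are available; the simulated stream and feature layout are invented for illustration.

```python
# Sketch of machine learning on a fast-moving data stream: the model is updated
# incrementally on mini-batches rather than rebuilt in a months-long batch cycle.
# Assumes scikit-learn and NumPy; the "stream" here is simulated.
import numpy as np
from sklearn.linear_model import SGDClassifier

model = SGDClassifier()            # linear model trained by stochastic gradient descent
classes = np.array([0, 1])         # classes must be declared up front for partial_fit

def simulated_stream(n_batches=100, batch_size=256, n_features=20):
    rng = np.random.default_rng(0)
    for _ in range(n_batches):
        X = rng.normal(size=(batch_size, n_features))
        y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)   # toy signal
        yield X, y

for X_batch, y_batch in simulated_stream():
    model.partial_fit(X_batch, y_batch, classes=classes)  # incremental update

print("current coefficients:", model.coef_[0][:3])
```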
The ethos of Analytics 2.0 was quite different from 1.0. The new generation of quantitative analysts was called “data scientists,” with both computational and analytical skills. Many data scientists were not content with working in the back room; they wanted to work on new product offerings and to help shape the business. There was a high degree of impatience; one big data startup CEO told me, “We tried agile [development methods], but it was too slow.” Since the big data industry was viewed as a “land grab,” companies wanted to acquire customers and capabilities very quickly.
ANALYTICS 3.0—FAST IMPACT FOR THE DATA ECONOMY
Big data, of course, is still a popular concept, and one might think that we’re still in the 2.0 era. However, there is considerable evidence that large organizations are entering the Analytics 3.0 era. It’s an environment that combines the best of 1.0 and 2.0—a blend of big data and traditional analytics that yields insights and offerings with speed and impact.
Although it’s early days, the traits of Analytics 3.0 are already becoming apparent. The most important trait is that not only online firms, but virtually any type of firm in any industry, can participate in the data economy. Banks, industrial manufacturers, health care providers, retailers—any company in any industry that is willing to exploit the possibilities—can all develop data-based offerings for customers, as well as supporting internal decisions with big data.
There is considerable evidence that when big data is employed by large organizations, it is not viewed as a separate resource from traditional data and analytics, but is merged with them. According to Murli Buluswar, the Chief Science Officer at insurance giant AIG:
“From the beginning of our Science function at AIG, our focus was on both traditional analytics and big data. We make use of structured and unstructured data, open source and traditional analytics tools. We’re working on traditional insurance analytics issues like pricing optimization, and some exotic big data problems in collaboration with MIT. It was and will continue to be an integrated approach.”
In addition to this integration of the 1.0 and 2.0 environments, other attributes of Analytics 3.0 organizations are described below:
MULTIPLE DATA TYPES, OFTEN COMBINED
Organizations are combining large and small volumes of data, internal and external sources, and structured and unstructured formats to yield new insights in predictive and prescriptive models. Often the increased number of data sources is viewed as an incremental advance in capability rather than a revolutionary one. Schneider National, for example, a large trucking firm, is increasingly adding data from new sensors—monitoring fuel levels, container location and capacity, driver behavior, and other key indicators—to its logistical optimization algorithms. The goal is to improve, slowly and steadily, the efficiency of the company’s route network, lower the cost of fuel, and decrease the risk of accidents.
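The hypothetical sketch below suggests what this combination can look like in practice: structured internal shipment records joined with sensor-derived readings into a single feature table for a predictive model. The field names and values are invented and are not Schneider National’s actual data.

```python
# Sketch: merge structured internal records with sensor-derived data into one
# feature table for a predictive or prescriptive model. All fields are hypothetical.
import pandas as pd

# Internal, structured operational data (the Analytics 1.0 staple).
shipments = pd.DataFrame({
    "truck_id": [101, 102, 103],
    "route_km": [640, 820, 410],
    "on_time":  [1, 0, 1],
})

# External / sensor-derived data (the Analytics 2.0 addition).
sensors = pd.DataFrame({
    "truck_id":        [101, 102, 103],
    "avg_fuel_lph":    [32.5, 41.2, 29.8],   # liters per hour
    "harsh_brake_cnt": [2, 9, 1],            # driver-behavior indicator
})

# One combined feature table feeding downstream models.
features = shipments.merge(sensors, on="truck_id")
print(features)
```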
A NEW SET OF DATA MANAGEMENT OPTIONS
In the 1.0 era, firms employed data warehouses with copies of operational data as the basis for analysis. In the 2.0 era, the focus was on Hadoop clusters and NoSQL databases. Now, however, there are a variety of options to choose from in addition to these earlier tools: database and big data appliances, SQL-to-Hadoop environments (sometimes called “Hadoop 2.0”), vertical and graph databases, and so on. Enterprise data warehouses are still very much in evidence as well. The complexity and number of choices that IT architects have to make about data management have expanded considerably, and almost every organization will end up with a hybrid data environment. The old formats haven’t gone away, but organizations need new processes for moving data, and the focal point of analysis, across staging, evaluation, exploration, and production applications.
TECHNOLOGIES AND METHODS ARE MUCH FASTER
Big data technologies from the Analytics 2.0 period are considerably faster than previous generations of technology for data management and analysis. To complement the faster technologies, new “agile” analytical methods and machine learning techniques are being employed that produce insights at a much faster rate. Like agile system development, these methods involve frequent delivery of partial outputs to project stakeholders; as with the best data scientists’ work, there is an ongoing sense of urgency. The challenge in the 3.0 era is to adapt operational and decision processes to take advantage of what the new technologies and methods can bring forth.
INTEGRATED AND EMBEDDED ANALYTICS
Consistent with the increased speed of analytics and data processing, models in Analytics 3.0 are often being embedded into operational and decision processes, dramatically increasing their speed and impact. Procter & Gamble, for example, has embedded analytics in day-to-day management decision-making through its “Business Sphere” management decision rooms and more than 50,000 desktops equipped with “Decision Cockpits.”
Other firms are embedding analytics into fully automated systems based on scoring algorithms or analytics-based rules. Some are building analytics into consumer-oriented products and features. In any case, embedding the analytics into systems and processes not only means greater speed, but also makes it more difficult for decision-makers to avoid using analytics—usually a good thing.
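The sketch below is a minimal, hypothetical example of such embedding: a scoring function and an analytics-based rule that run inside the transaction flow itself, so the decision-maker never has to opt in. The weights, fields, and threshold are illustrative assumptions rather than any firm’s production logic.

```python
# Minimal sketch of analytics embedded in an operational process: a scoring
# function plus a decision rule executed inline in the transaction flow.
# Weights, fields, and the 0.7 threshold are illustrative assumptions.

WEIGHTS = {"amount_zscore": 0.6, "new_merchant": 0.25, "foreign_ip": 0.15}

def risk_score(txn: dict) -> float:
    """Weighted sum of pre-computed risk features, clipped to [0, 1]."""
    score = sum(WEIGHTS[k] * float(txn.get(k, 0)) for k in WEIGHTS)
    return max(0.0, min(1.0, score))

def decide(txn: dict) -> str:
    """Analytics-based rule: route high-scoring transactions to review."""
    return "hold_for_review" if risk_score(txn) >= 0.7 else "approve"

if __name__ == "__main__":
    txn = {"amount_zscore": 0.9, "new_merchant": 1, "foreign_ip": 0}
    print(decide(txn))   # -> hold_for_review
```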
HYBRID TECHNOLOGY ENVIRONMENTS
It’s clear that the Analytics 3.0 environment involves new technology architectures, but it’s a hybrid of well-understood and emerging tools. The existing technology environment of large organizations is not being dismantled; some firms still make effective use of relational databases on IBM mainframes. However, there is greater use of big data technologies such as Hadoop running on commodity server clusters, cloud technologies (private and public), and open-source software. The most notable change in the 3.0 environment is the attempt to eliminate the ETL (extract, transform, and load) step before data can be accessed and analyzed. This objective is being addressed with real-time messaging and stream-processing tools such as Apache Kafka and Apache Storm.
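As a hedged sketch of that shift, the example below publishes and consumes events through a message broker instead of staging them through a nightly ETL batch. It assumes the kafka-python client and a broker running at localhost:9092; the topic name and event fields are invented.

```python
# Sketch of moving events through a real-time pipeline rather than a nightly
# ETL batch. Assumes the kafka-python client and a broker at localhost:9092;
# the topic name "sensor-events" is an invented example.
import json
from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("sensor-events", {"device_id": 42, "fuel_level": 0.63})
producer.flush()

consumer = KafkaConsumer(
    "sensor-events",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
for message in consumer:
    # Events can be scored or aggregated here as they arrive.
    print(message.value)
```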
A related approach being explored is a new discovery platform layer of technology for data exploration. Enterprise data warehouses were initially intended for exploration and analysis, but they have become production data repositories for many organizations, and getting data into them requires expensive and time-consuming ETL work. Hence the need for a new layer that facilitates data discovery.
DATA SCIENCE/ANALYTICS/IT TEAMS
Data scientists often are able to run the whole show—or at least have a lot of independence—in online firms and big data startups. In more conventional large firms, however, they have to collaborate with a variety of other players. In many cases the “data scientists” in large firms may be conventional quantitative analysts who are forced to spend a bit more time than they like on data management activities (which is hardly a new phenomenon). And the data hackers who excel at extracting and structuring data are working with conventional quantitative analysts who excel at modeling it. This collaboration is necessary to ensure that big data is matched by big analytics in the 3.0 era.
Both groups have to work with IT, which supplies the big data and analytical infrastructure, provisions the “sandboxes” in which they can explore data, and turns exploratory analyses into production capabilities. Together the combined teams are doing whatever is necessary to get the analytical job done, and there is often a lot of overlap across roles.
CHIEF ANALYTICS OFFICERS
When analytics and data become this important, they need senior management oversight. And it wouldn’t make sense for companies to have multiple leaders for different types of data, so they are beginning to create “Chief Analytics Officer” roles or equivalent titles to oversee the building of analytical capabilities. I have already mentioned the Chief Science Officer role at AIG. Other organizations with C-level analytics roles include University of Pittsburgh Medical Center, the Obama reelection campaign, and large banks such as Wells Fargo and Bank of America. We will undoubtedly see more such roles in the near future.
THE RISE OF PRESCRIPTIVE ANALYTICS
There have always been three types of analytics: descriptive, which report on the past; predictive, which use models based on past data to predict the future; and prescriptive, which use models to specify optimal behaviors and actions. Analytics 3.0 includes all types, but there is an increased emphasis on prescriptive analytics. These models involve large-scale testing and optimization. They are a means of embedding analytics into key processes and employee behaviors. They provide a high level of operational benefits for organizations, but they place a premium on high-quality planning and execution. For example, UPS is using data from digital maps and telematics devices in its trucks to change the way it routes its trucks. If the system (called ORION, for On-Road Integrated Optimization and Navigation) gives incorrect routing information to UPS’s 55,000 drivers, it won’t be used for long.
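As a toy illustration of what “prescriptive” means in code (emphatically not ORION, which solves vastly larger problems with specialized methods), the sketch below enumerates delivery orderings for a handful of stops and prescribes the one with the lowest total distance. The coordinates are invented.

```python
# Toy prescriptive-analytics sketch: choose the delivery order that minimizes
# total distance for a few stops by brute force. Invented coordinates; real
# routing systems handle thousands of stops with specialized optimization.
from itertools import permutations
from math import dist

DEPOT = (0.0, 0.0)
STOPS = {"A": (2.0, 3.0), "B": (5.0, 1.0), "C": (1.0, 6.0), "D": (4.0, 4.0)}

def route_length(order):
    points = [DEPOT] + [STOPS[s] for s in order] + [DEPOT]
    return sum(dist(p, q) for p, q in zip(points, points[1:]))

best_order = min(permutations(STOPS), key=route_length)
print("best route:", " -> ".join(best_order),
      f"({route_length(best_order):.1f} distance units)")
```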
SUMMARY
Even though it hasn’t been long since the advent of big data, these attributes add up to a new era. It is clear from our research that large organizations across industries are joining the data economy. They are not keeping traditional analytics and big data separate, but are combining them to form a new synthesis. Some aspects of Analytics 3.0 will no doubt continue to emerge, but organizations of all sizes and in all industries need to begin transitioning now to the new model. It means changes in skills, leadership, organizational structures, technologies, and architectures. Together these new approaches constitute perhaps the most sweeping change in what we do to get value from data since the 1980s.
It’s important to remember that the primary value from big data comes not from the data in its raw form, but from the processing and analysis of it and the insights, decisions, products, and services that emerge from analysis. The sweeping changes in big data technologies and management approaches need to be accompanied by similarly dramatic shifts in how data supports decisions and product/service innovation processes. These shifts have only begun to emerge, and will be the most difficult work of the Analytics 3.0 era. However, there is little doubt that analytics can transform organizations, and the firms that lead the 3.0 charge will seize the most value.