You Need Some External Data

Historically, organizations used analytics in situations where they had lots of internal data. The data was the outcome of well-structured, repetitive processes supported by transactional systems. If, for example, you had sold a lot of products at a lot of different prices, you could do pricing analytics on data in your order management system. If you’d hired a lot of people, you could do analytics on your hiring process, using data from your HR system. If you had done a lot of promotions, you could do analytics to understand which ones worked best using data from marketing systems.

However, there are many processes for which you don’t have lots of internal data. What if, for example, your company is planning to introduce a new product or service that is unlike any you’ve ever brought to market? You are unlikely to have data on how such offerings fare in the marketplace. Many organizations, faced with such uncertainty, simply “go with their gut” and introduce the new product or service without data or analytics.

External Data to the Rescue

It doesn’t have to be that way, however. In circumstances where you have little or no internal data, there is likely to be some external data that will help you make a better decision. A whole host of external providers and curators of external data is available to you. Let me provide a couple of examples, and then I’ll tell you what I think you should do with external data.

I’ve been an advisor for many years to a company called Signals Analytics, which was founded by former Israeli military intelligence officers. It specializes in what might be called “important but infrequent” decisions about processes like innovation and new product development. It finds and curates a wide variety of unstructured external information sources: social networks, blogs, online forums, competitor announcements, job listings, and so forth. Consumer products companies pursuing innovation projects, for example, get access to data and analytics to “understand consumer needs,” “uncover emerging trends,” and “assess early concepts.” Signals also helps companies with unstructured decision-making about marketing and strategy, two other areas where internal data is often hard to come by.

Companies are also increasingly interested in understanding demand and supply for their offerings, which often also requires external data. In my book The AI Advantage, I wrote about the air charter firm XOJET, which has over 1,300 private jets available for charter. XOJET once used a simple set of spreadsheet rules derived from internal data to set prices. Now, however, with help from the machine learning technology company Noodle.ai, XOJET creates models based on external data to assess supply and demand and price its charter trips. The external datasets include industry-wide flight activity and aircraft location to establish competitive supply, and data on major demand-driving events, seasonal patterns, and booking curve observations to predict demand. After the company installed the new algorithm, its revenue per occupied flight rose 5%.

How to Manage External Data

External data is unlike internal data in many ways, so it often needs to be managed with different methods. External data isn’t under an organization’s control, so it doesn’t make much sense to try to subdue it with tools like master data management. In my view, those top-down modeling tools don’t work all that well even on internal data. But there is an alternative approach.

Catalogs, Not Models

Instead of creating a data model or set of master data management rules, organizations should create a catalog of their external data: a straightforward listing of what external data exists in the organization, where it resides, who’s responsible for it, and so forth. A catalog effort often reveals that both internal and external data are chaotic: duplicated, going under multiple names, old, expired, etc. It’s not easy to face up to all of the informational chaos that a cataloging effort can reveal. Perhaps needless to say, however, cataloging data is worth the trouble and the initial shock at the outcome. A data catalog that lists what data the organization has or has access to, what it’s called, where it’s stored, who’s responsible for it, and other key metadata can easily be the most valuable information offering that an IT group or Chief Data Office can create.
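To show how lightweight such a catalog can be, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the field names, the helper functions, and the sample entries are hypothetical and simply mirror the metadata listed above. They are not drawn from any particular catalog product.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional


@dataclass
class CatalogEntry:
    """One row in a hypothetical data catalog."""
    name: str                      # what the data is called
    location: str                  # where it is stored
    owner: str                     # who is responsible for it
    source: str                    # "internal" or "external"
    aliases: List[str] = field(default_factory=list)  # other names in use
    expires: Optional[date] = None  # e.g. end of a license, if any


def find_expired(catalog: List[CatalogEntry], today: date) -> List[CatalogEntry]:
    """Return entries whose expiry date has already passed."""
    return [e for e in catalog if e.expires is not None and e.expires < today]


def find_name_collisions(catalog: List[CatalogEntry]) -> Dict[str, List[str]]:
    """Map each name or alias used by more than one entry to those entries."""
    seen: Dict[str, List[str]] = {}
    for entry in catalog:
        labels = {entry.name.lower(), *(a.lower() for a in entry.aliases)}
        for label in labels:
            seen.setdefault(label, []).append(entry.name)
    return {label: names for label, names in seen.items() if len(names) > 1}


# Hypothetical sample entries, invented for this sketch.
catalog = [
    CatalogEntry("Regional weather history", "cloud storage", "Analytics",
                 "external", aliases=["wx_hist"], expires=date(2020, 1, 1)),
    CatalogEntry("Weather archive", "shared drive", "Marketing",
                 "external", aliases=["wx_hist"]),
    CatalogEntry("Order history", "order management system", "Finance",
                 "internal"),
]

expired = find_expired(catalog, today=date(2021, 1, 1))
collisions = find_name_collisions(catalog)
```

Even a flat list like this is enough to surface the problems a cataloging effort tends to reveal: here, one expired dataset and two entries that go by the same alias and may well be duplicates.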

Given that IT organizations have been more preoccupied with modeling the future than describing the present, enterprise vendors haven’t really addressed the catalog tool space to a significant degree. There are several catalog tools for individuals and small businesses, and several vendors of ETL (extract, transform, and load) tools have some cataloging capabilities built into their own tools. Some also tie a catalog to a data governance process, although in my experience, “governance” is right up there with “bureaucracy” as a term that makes many people wince.

At least a few data providers and vendors are actively pursuing catalog work, however. One company, Enigma, has created a catalog for public external data, for example. The company has compiled a set of public databases, and you can simply browse through its catalog (for free if you are an individual) and check out what data you can access and analyze. That’s a great model for what private enterprises should be developing, and I know of some companies (including Tamr, Informatica, Paxata, and Trifacta) that are building tools to help companies create their own catalogs for both internal and external data.

External Data People

Beyond an information catalog, organizations in ardent pursuit of external data are likely to need some human experts on the topic. Unfortunately, jobs involving finding, assessing, and wrangling external data are not common in our society. I’ve done some internet searching and found a few “external data analyst,” “external data coordinator,” and “external data specialist” roles, but you may have challenges staffing such positions. Universities don’t train people for them, so qualified candidates are probably going to be autodidacts.

The external data wranglers you do manage to hire will have somewhat different responsibilities from the average data person. Since external data is often sold and bought in the marketplace, they’ll have to be knowledgeable about negotiating deals for it and reading licensing agreements. In a large company, it’s likely that multiple functions or business units will have bought the same data, so tracking down those redundant purchases is another important task. External data also often has quality problems, despite the fact that you’ve paid something for it, so managing data quality is another skill that your external data folks will need to possess.
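Tracking down redundant purchases is largely a matter of grouping purchase records by dataset and seeing which ones have more than one buyer. The following Python sketch illustrates the idea under stated assumptions: the record fields (`vendor`, `dataset`, `business_unit`), the function name, and the sample purchases are all hypothetical, and real procurement records would need more careful name matching than simple lowercasing.

```python
from collections import defaultdict


def find_redundant_purchases(purchases):
    """Return datasets bought by more than one business unit.

    Keys are (vendor, dataset) pairs, normalized to lowercase so that
    differences in capitalization or stray spaces do not hide a duplicate.
    """
    buyers = defaultdict(set)
    for p in purchases:
        key = (p["vendor"].strip().lower(), p["dataset"].strip().lower())
        buyers[key].add(p["business_unit"])
    return {key: sorted(units) for key, units in buyers.items() if len(units) > 1}


# Hypothetical purchase records, invented for this sketch.
purchases = [
    {"vendor": "Example Vendor A", "dataset": "Store foot traffic",
     "business_unit": "Marketing"},
    {"vendor": "example vendor a ", "dataset": "store foot traffic",
     "business_unit": "Supply Chain"},
    {"vendor": "Example Vendor B", "dataset": "Flight activity",
     "business_unit": "Pricing"},
]

redundant = find_redundant_purchases(purchases)
```

In this sample, the foot traffic dataset shows up as purchased by both Marketing and Supply Chain, which is the kind of overlap an external data specialist would want to consolidate under one license.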

Turning Free Data into Dollars

Of course, the best external data is that which you don’t have to pay for but can turn into valuable products and services. Governments, as you might imagine, are the leading source of free external data (or at least, as a taxpayer, you’ve already paid for your share of it). Many governments, including that of the United States, are increasingly listing available public data, often called “open data”; in the U.S., the relevant site is Data.gov. There are also sites that collect and sell public data; PublicData.com is an example.

There are examples of companies that have made quite a successful business out of open data. Climate Corporation, for example, was a startup that used government data to make recommendations to farmers about the best ways to grow crops. They dug up—according to a Harvard Business School case study—30 years of National Weather Service data; 60 years of crop yield data and 14 terabytes of soil type data from the U.S. Department of Agriculture; and satellite images, topography maps, and weather data from 1 million U.S. locations gathered by the U.S. government. They used all this external data, most of which was free, to build a “digital agriculture” business that was acquired by Monsanto (now Bayer) for $1.1 billion.

You may not strike external data gold to quite this degree, but chances are good that external data can help your company achieve its analytical objectives. Whether you have to pay for it or not, external data can tell you much about the state of the world outside your organization. And that is an undeniably useful thing to know about.