To many, big data feels like the Wild West: little structure, few standards, and an unclear picture of how to use the data for analysis. Will there ever be a day when big data becomes easier to work with?
Yes!!! In fact, you can begin to standardize and simplify your use of big data today, even in cases where you don’t control the input stream. To understand how, it is important to consider the maturity curve for any new data source or analytic process.
The Maturity Curve For Analytic Processes
I spent many years creating new analytic processes. At times, those processes also made use of entirely new data sources. While a new process is being built, you deal with a lot of uncertainty, which leads to frequent adjustments to your original plan as work progresses. After all, if you had the entire problem figured out before you began, you wouldn't be doing something totally new! This is the learning phase. As the data becomes better understood and the analytic process evolves, you enter the standardization phase, where everything stabilizes and converges on a final, standardized process.
A large part of why big data feels like the Wild West is that the work being done with it is often still in that initial, uncertainty-filled learning phase. It is easy to forget that the learning phase is nothing new and that there is nothing wrong with it. However, given the sudden rise of big data, organizations today are often going through the learning phase with more data and analytic processes than in the past. It can be overwhelming and make it seem that big data is somehow “wilder” than it really is.
Move Toward Standardization
As your organization continues down the path to analyzing big data, keep a focus on pushing through the learning phase and getting to the standardization phase. In time, you will develop standard approaches to processing, analyzing, and deploying big data. Soon enough, you’ll wonder why you were ever so worried about it.
As I discussed in Taming The Big Data Tidal Wave, it is necessary to differentiate between standardizing the use of big data and standardizing the raw big data feed itself. For example, it will never be possible to fully standardize the language and grammar used in social media postings or emails. However, it is entirely possible to standardize the text analytics you'll use to extract information from those postings and emails, and it is just as possible to standardize the way you'll incorporate that extracted information into your analytic processes.
The first time you acquire a bunch of social media postings, online chat conversations, or emails, you'll be starting from scratch. You'll need to investigate what tools to use for text analytics, as well as how to apply them. Then, when doing sentiment analysis, which words count as "good" and which as "bad"? Does that change based on context? These questions will be quite challenging at first, but they become much easier to answer with experience. Eventually, you'll have your approach standardized.
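To make that concrete, here is a minimal sketch (in Python) of a first-pass, lexicon-based sentiment scorer. The word lists and the scoring formula are purely illustrative assumptions on my part; the standardized lists you eventually land on would emerge from the learning phase and would account for context.

    # A minimal, illustrative lexicon-based sentiment scorer.
    # The word lists below are placeholder assumptions; a real, standardized
    # lexicon gets built up and tuned during the learning phase.

    POSITIVE_WORDS = {"great", "love", "excellent", "happy", "recommend"}
    NEGATIVE_WORDS = {"terrible", "hate", "awful", "angry", "refund"}

    def sentiment_score(text: str) -> float:
        """Return a crude sentiment score in [-1, 1] based on word counts."""
        words = [w.strip(".,!?").lower() for w in text.split()]
        pos = sum(w in POSITIVE_WORDS for w in words)
        neg = sum(w in NEGATIVE_WORDS for w in words)
        total = pos + neg
        return 0.0 if total == 0 else (pos - neg) / total

    print(sentiment_score("I love this product, great support!"))    # 1.0
    print(sentiment_score("Terrible experience, I want a refund."))  # -1.0

The point is less the code itself than the fact that, once the lexicon and scoring rules settle down, this step can run the same way every time.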
Once you’ve extracted key information from your text feeds, you will also need to experiment with how to leverage that data. What types of reports will best showcase the data's value? How can you update predictive models to take the new information into account? Once again, it will feel like the Wild West at first. However, as you use the data, you'll determine how to standardize its use over time. Pretty soon, while the raw input stream remains out of your control, you'll have complete control over how you ingest and transform the raw data, and over how you put it to work adding value to your analytic processes.
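Here is an equally minimal sketch of what a standardized ingest-and-transform step might look like. The input schema and field names are assumptions of my own; the key idea is that the raw posting format stays out of your control, while the record you produce from it is entirely within it.

    # An illustrative "standardized ingest" step: a raw posting (whose format
    # we do not control) becomes a tidy record that our reports and models
    # expect (whose format we fully control). Schema and field names are
    # assumptions for illustration only.

    from dataclasses import dataclass
    from datetime import datetime

    @dataclass
    class PostingRecord:
        posted_at: datetime
        author: str
        word_count: int
        sentiment: float  # e.g., from a scorer like the one sketched above

    def standardize(raw: dict, sentiment: float) -> PostingRecord:
        """Map one raw posting into the standardized record."""
        text = raw.get("text", "")
        return PostingRecord(
            posted_at=datetime.fromisoformat(raw["timestamp"]),
            author=raw.get("user", "unknown"),
            word_count=len(text.split()),
            sentiment=sentiment,
        )

    raw_posting = {"timestamp": "2023-05-01T09:30:00",
                   "user": "jane",
                   "text": "Great service!"}
    print(standardize(raw_posting, sentiment=1.0))

Once a record structure like this is in place, every downstream report and model can rely on it, no matter how messy the raw feed remains.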
The lesson here is not to be too concerned about your inability to fully control your input streams. Instead, focus on how you leverage those streams. That leverage can be standardized and operationalized over time, bringing order to a formerly “wild” environment.
In the next few years, you can expect individual organizations, as well as product vendors, to continue simplifying and standardizing how big data is harnessed for analytics. Of course, the newest data sources and the newest applications of the data will always be further behind the curve than those that are more established. Don’t let that frustrate you. That is how it has always been and how it will be in the future as well.
To see a video version of this blog, visit my YouTube channel.