More than a decade ago, when data mining was relatively new (remember the commercial with fashion models talking about mining their data on the runway?), many people advocated mining all of the (mostly transactional) data and had to be educated on the concept of sampling, which is still considered a best practice. It is certainly a good thing that we can now do more, and do it faster and on more data, than we could before. Still, it’s worth taking a moment to revisit some basics.
Fundamentally, data are measurements. Most of the big data out there were not generated with analysis in mind. Many, in fact, were generated to send you a bill. That said, there is certainly residual value in transactional data. Still more of the big data out there are text, also not necessarily produced with the intent of analyzing it. Text has residual value, too.
Regardless of the source or type of data in your collection of big data, are you measuring what matters? Are you measuring it well? In other words, are your data of sufficient quality for you to make good decisions? And how much better could those decisions be if you had better data?
What questions do you ask of the data you have? Based on the answers you get, how confident are you in the decisions you make? What could make you more confident? More data? Better data? Different data altogether? Or maybe the answer is more analytical skill to produce better answers faster, or some combination of these things.
Even with big data, we still need to think about how we can learn from it most efficiently and effectively. Sampling is a key strategy to help you learn faster. So is experimentation, which often involves measuring things you aren’t currently measuring: yes, that means more data, potentially of greater value, and those data are themselves a sample. Simulation is another way to learn more efficiently and effectively in many cases, and it generates still more data (also a sample).
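To make the sampling point concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the "transaction amounts" are synthetic, the sizes are arbitrary, and `sample_mean_estimate` is a hypothetical helper, not something from a particular tool. It draws a simple random sample of 1% of the records and reports the sample mean with its standard error next to the mean of the full data set.

```python
import math
import random
import statistics


def sample_mean_estimate(population, n, seed=0):
    """Estimate the population mean from a simple random sample of size n.

    Returns (estimate, standard_error). The standard error ignores the
    finite-population correction, which is negligible when n is a small
    fraction of the population.
    """
    rng = random.Random(seed)
    sample = rng.sample(population, n)  # sampling without replacement
    estimate = statistics.fmean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    return estimate, standard_error


# Synthetic, made-up "transaction amounts" standing in for a large data set.
rng = random.Random(42)
population = [rng.lognormvariate(3.0, 1.0) for _ in range(200_000)]

true_mean = statistics.fmean(population)
estimate, se = sample_mean_estimate(population, n=2_000, seed=1)

print(f"mean of all 200,000 records: {true_mean:.2f}")
print(f"mean of a 2,000-record sample: {estimate:.2f} (standard error {se:.2f})")
```

The standard error shrinks with the square root of the sample size, not with the size of the full data set, which is one reason a well-drawn sample can often answer a question almost as well as the whole collection at a fraction of the cost.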
Too often, we settle for what we have because it’s easy; we take what comes to us because we can. Those who learn from data fastest and apply that learning to their advantage are thinking more strategically about what they do with their data (big or small), the data they still need to collect or generate, and how to measure what matters better and faster. Where do you put yourself on the learning continuum — are you measuring what matters? Are you trying new things?
Originally published by International Institute for Analytics