The role of Citizen Data Scientist has been growing rapidly, though not without some controversy. Many people are concerned that democratizing data science means giving people capabilities well beyond what they are ready to handle, which all but ensures disasters. While bad outcomes can certainly happen if things aren’t planned and implemented well, it is possible to minimize risk by approaching a citizen data science program with the right mindset.
Is Democratizing Data Science Different From Democratizing BI?
I was recently presenting a webinar about democratizing data science when a terrific question came in. It was along the lines of, “People generally agree that enabling broader access to BI capabilities is safe, but isn’t enabling broader access to data science capabilities playing with fire?” I pointed out that while people think the broad use of BI and visualization tools is safe today, that wasn’t always the case.
When I started my career, there were no BI tools. To create a report, somebody had to write code by hand. The only way to generate SQL was to write it, which took what was, at the time, a high degree of skill and technical ability. Early SQL generators and BI tools were notorious for how easily they could create bad SQL. It was considered risky to have anyone but an expert use them because you had to read the auto-generated SQL to validate that it was correct. The tools enhanced productivity, but they still required expertise to use safely. The idea of broadly deploying them wasn’t on the radar initially.
Today, BI and visualization tools are very mature, have ample security features, and are fed by stable, well-defined data. As a result, it is widely considered “safe” to let people who can’t code SQL use these tools for a range of purposes.
An Analogy: The Evolution Of Cooking
Even a hundred years ago cooking was difficult and took skill. Something as “easy” as a cake required a lot of ingredients to be prepared, mixed in the right proportions, and then carefully baked in a wood or coal oven that had no temperature controls. To bake a tasty cake required real skill.
Starting in the early to mid-1900s, cake mixes became commonplace. Today, even a bad cook like me can pick up a cake mix and in a few quick steps mix up the batter. Then, I can put the cake in a modern oven that will ensure it is cooked at the right temperature for the right amount of time. It may not be as good as a scratch cake, but I can do a decent job. It is widely considered “safe” for most anyone to make a cake using a mix because it is hard to mess the cake up.
People accept cake mixes, but most still balk at the idea that a fancy meal can be accomplished by a novice. However, we now have home delivery services that send carefully measured, pre-staged ingredients with detailed instructions on how to prepare a gourmet meal. A “citizen chef” like me may not be able to match a restaurant chef, but I can actually pull off a meal way beyond my base capability level. Combine this with cooking shows and how-to videos on the web and cooking has really been democratized in recent years.
Note that very advanced meals or meals that require special handling are likely not appropriate for just anyone. For example, even with a kit, I wouldn’t want to try to prepare steak tartare or sushi. While my citizen chef skills let me do a lot more than I used to do, I still have limits.
Without a doubt, we still need chefs who know what they are doing for many purposes. However, more people are eating better meals than ever before by using the democratizing meal kits available. Many meal kit users can’t move beyond the kits, but there are some who gain enough proficiency to begin to experiment with their own recipes.
Democratizing Data Science
Let’s return to the concern that citizen data scientists are a huge risk. Not long ago, this would have been true. However, if democratizing BI is considered to be like a cake mix, then democratizing data science can be considered a meal kit. A lot of things had to come together for meal kits to be feasible for democratizing cooking. Similarly, a lot of things have to come together to enable the democratization of data science.
Data science tools today are robust and mature. We can tightly control the level of freedom we give a citizen data scientist, and we can inspect their work on the back end. It is possible for a citizen data scientist to handle “basic” or repetitive data science processes on their own as long as they stick within the parameters of the recipes they are given.
For this to work, the data engineering and data science teams have to serve as a meal kit company and pre-stage the ingredients of available data, available algorithms, and available output formats and channels for the users. Citizen data scientists will be able to do a lot of powerful things they couldn’t do before, but they’ll still be tightly bound by the ingredients and recipes they have access to.
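One way to picture the meal kit idea is as a small catalog of approved ingredients that every request is checked against. The sketch below is only an illustration under stated assumptions: the dataset, algorithm, and output names, along with the `Recipe` and `MealKit` classes, are hypothetical and not taken from any particular tool.

```python
from dataclasses import dataclass, field

# Hypothetical ingredients pre-staged by the data engineering and
# data science teams. These names are illustrative assumptions.
APPROVED_DATASETS = {"sales_monthly", "customer_segments"}
APPROVED_ALGORITHMS = {"linear_regression", "kmeans"}
APPROVED_OUTPUTS = {"dashboard", "csv_export"}


@dataclass(frozen=True)
class Recipe:
    """One analysis request from a citizen data scientist."""
    dataset: str
    algorithm: str
    output: str


@dataclass
class MealKit:
    """Checks recipes against the staged ingredients and logs each request."""
    audit_log: list = field(default_factory=list)

    def check(self, recipe: Recipe) -> list:
        """Return a list of problems; an empty list means the recipe is allowed."""
        problems = []
        if recipe.dataset not in APPROVED_DATASETS:
            problems.append(f"dataset not staged: {recipe.dataset}")
        if recipe.algorithm not in APPROVED_ALGORITHMS:
            problems.append(f"algorithm not staged: {recipe.algorithm}")
        if recipe.output not in APPROVED_OUTPUTS:
            problems.append(f"output channel not staged: {recipe.output}")
        # Record every request so data scientists can review it on the back end.
        self.audit_log.append((recipe, problems))
        return problems


kit = MealKit()
allowed = kit.check(Recipe("sales_monthly", "kmeans", "dashboard"))
blocked = kit.check(Recipe("raw_clickstream", "deep_net", "dashboard"))
print(allowed)
print(blocked)
```

The first request stays inside the kit and comes back with no problems, while the second is flagged for reaching beyond the staged data and algorithms. Both land in the audit log, which is where the review step described here would pick them up.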
Done appropriately, democratizing data science can free data scientists to focus on new, harder problems while they offload simpler problems to the citizen data scientists. However, for this to work, data scientists and engineers will have to help stage the environments and data, and they will also have to be available to provide assistance and to review the processes that are created.
The net result is more people producing more analytics and driving more value for an organization. We moved from cooking being “experts only”, to basic things like cakes being democratized, to today, where even fancy meals are fairly democratized. We’ve already taken analytics from “experts only” to simpler things like BI being democratized. It is time to take the next step and democratize data science to the greatest extent possible.
Originally published by the International Institute for Analytics
Bill Franks, Chief Analytics Officer, helps drive IIA's strategy and thought leadership, as well as heading up IIA's advisory services. IIA's advisory services help clients navigate common challenges that analytics organizations face throughout each annual cycle. Bill is also the author of Taming The Big Data Tidal Wave and The Analytics Revolution. His work has spanned clients in a variety of industries for companies ranging in size from Fortune 100 companies to small non-profit organizations. You can learn more at http://www.bill-franks.com.