A Canary in the [Data] Coal Mine

Over the last ten years, dashboards have become a crucial asset in the analytics space. Stakeholders request dashboards for countless reasons; analysts and managers rely on them for daily insights; and dashboard vendors (such as Tableau and Power BI) tout the power, ability, and potential of these dashboards. However, the truth about the capability and usefulness of dashboards is a bit stretched when comparing theory to reality.

Don’t get me wrong. Dashboards can be extremely useful tools in the analytics domain. But one of their downfalls is that there aren’t enough hours in the day for a human to analyze all of the data being placed into them. Furthermore, the number of dashboard assets seems to be growing exponentially within many organizations, which means that many dashboards can’t possibly be reviewed on a consistent basis.

What is a Dashboard

In simple terms, a dashboard is a visual representation of data. But given the unlimited potential of what data you display and how you display it, dashboards can serve many different purposes.

Some people will build a dashboard that is a scorecard of KPIs (key performance indicators), while others will build charts and graphs, and some may build a dashboard that is really a data dump of CSV (comma-separated values) data. With the first two types of dashboards, we can probably assume that end-users are trying to understand business performance. And with the CSV dashboard, we can probably assume that end-users require access to raw data as a means of circumventing the need for database access and SQL skills.

Where I’d like to focus is on the first two types of dashboards, since they are being used for performance measurement. To better understand these dashboards, it’s important to ask why someone would need to measure performance in the first place. Now, I realize that this question may seem naive, but I have a reason for asking.

We measure performance so that we can understand if something is performing at, above, or below expectations. And we want to understand performance-to-expectations so that we have an opportunity to correct problems or accelerate our successes. But do dashboards really give you these answers?

I’d suggest a qualified answer of, “sort of.”

What do Dashboards Show

Dashboards can provide insights into performance, but only if you actually look at them. While viewing a dashboard isn’t really a challenge, the plethora of dashboards in existence can make it difficult to find the right one and keep up with the metrics being provided. Also, even if you manage to locate the correct dashboard, your KPIs and data are aggregated, which means that you can’t always see the trees for the forest, so to speak. Let’s take an example from a company that I worked for years ago.

I was working at a global company where I was assessing fraud performance and potential risks. The company operated in dozens of countries, had multiple classifications of user types, a handful of different payment methods, and a few other important attributes. To report on our business, we needed a versatile dashboard to display KPIs as well as historical performance so that we could see where trends were moving in the wrong direction. But we also needed flexibility to be able to apply filters to our data so that we could drill into the performance of certain segments to detect potential issues.

This situation is pretty common across many dashboards, and there isn’t anything wrong with building a dashboard for this purpose. The problem with this type of dashboard is that you can’t see issues until they become extremely large. In our dashboard, there were only five filters, but over 67,000 permutations of those filters. As a result, a massive problem in a small segment could go undetected, because it is a blip on the radar when all of the data is aggregated for the KPIs. But when you’re dealing with billions of dollars of transactions, these little blips can be quite impactful.
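To make the math concrete, the permutation count is just the product of each filter’s distinct values. Here’s a minimal Python sketch using hypothetical cardinalities (the actual filters and their values aren’t listed here) that shows how quickly five filters multiply into tens of thousands of segments:

    from math import prod

    # Hypothetical distinct-value counts for five dashboard filters; the
    # real filters aren't given, only that the combinations exceeded 67,000.
    filter_cardinalities = {
        "country": 40,
        "user_type": 5,
        "payment_method": 6,
        "channel": 8,
        "product": 7,
    }

    # Every combination of filter selections is a distinct segment that a
    # human reviewer would have to inspect on the dashboard.
    segments = prod(filter_cardinalities.values())
    print(f"{segments:,} filter permutations")  # 67,200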

Now, we’ve established that we care about seeing these small problems before they turn into massive ones; however, no human can possibly review 67,000 permutations in a reasonable period of time. But a computer can, so we can solve the problem with an automated solution outside of the dashboard: what I like to call a canary.

Evaluate Performance with Canaries, not Dashboards

To inspect performance, I started building what I referred to as canaries, named for the canaries used in coal mining operations. Miners brought a canary into the mine; if the canary died, the miners were alerted to a problem with air quality and realized they were probably next in line to perish. While my canaries weren’t used for such a morbid purpose, they let the computer do what it does best: iterate over massive datasets in a short period of time.

Given this concept of canaries, I constructed rules (built in code), which would evaluate the performance of anything that needed to be measured. Those rules included metrics such as the number of new user registrations, dollar volumes, payment delinquencies, and more. The sky was the limit when thinking of what could be measured, and my canaries didn’t stop at the dashboard KPI level.

I even went as far as to build canaries on database tables for anomaly detection, data errors, and ETL problems, because it’s much better to detect issues as far upstream as possible and avoid propagating bad data into other tables, dashboards, and reports.
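As a sketch of what such an upstream check might look like, here’s a minimal data-quality canary in Python. The table, columns, and checks are hypothetical placeholders, not the actual rules described above:

    import sqlite3  # stand-in; any DB-API 2.0 connection works the same way

    # Hypothetical checks against a "payments" table; any non-zero result
    # means the table has a data-quality problem worth halting ETL over.
    CHECKS = {
        "duplicate_ids": "SELECT COUNT(*) - COUNT(DISTINCT payment_id) FROM payments",
        "null_amounts": "SELECT COUNT(*) FROM payments WHERE amount IS NULL",
    }

    def run_data_quality_canary(conn):
        """Return a dict of check name -> offending count for failed checks."""
        failures = {}
        for name, sql in CHECKS.items():
            count = conn.execute(sql).fetchone()[0]
            if count > 0:
                failures[name] = count
        return failures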

With these canaries, I could easily surface insights that indicated potential problems, such as, “Our number of invoices sent to users in Poland increased by 22% last month.” That change would have gone undetected in a global aggregate, because Poland was an extremely small part of the business, but it might be cause for an investigation if the increase was more than one or two standard deviations from the mean. Now, this increase could mean that we were succeeding in growing the business, but it could also mean that we were under attack from fraudulent users.

How to Develop Canaries

It’s been a number of years since I originally started developing these canaries, so I’m almost certain that there are dashboard plug-ins and other tools on the market to help. Even if you do find a dashboard-based tool for canaries, though, you can still benefit from building them outside of the dashboard, because not everything you need to monitor will be built into a dashboard. That said, if you aren’t building canaries today and your dashboard offers a solution, I’d recommend taking advantage of those capabilities to get started.

In my environments, I chose to build my canaries using Python, SQL, and a scheduling engine (Airflow, Jenkins, etc.). The SQL statement was designed to extract data from the database for assessment, the Python script was used to call the SQL statement and assess the results against my rules, and Airflow was used to run the canary jobs on a set interval (or as child processes of ETLs that were already scheduled).
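Here’s a minimal sketch of that division of labor, with a hypothetical query and a simple rule; the scheduler simply calls the function on whatever interval you need:

    import sqlite3  # stand-in for your warehouse's DB-API driver

    # Hypothetical SQL: count yesterday's new user registrations.
    CANARY_SQL = """
        SELECT COUNT(*) AS new_registrations
        FROM registrations
        WHERE created_at >= date('now', '-1 day')
    """

    def registration_canary(conn, threshold=100):
        """Extract the metric, then assess it against a simple rule."""
        value = conn.execute(CANARY_SQL).fetchone()[0]
        if value > threshold:
            # Surface however you like: log it, email it, or fail the job.
            print(f"ALERT: new_registrations={value} exceeds {threshold}")
        return value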

Inside of Python, I created multiple rules on specific fields where I wanted to be alerted to potential performance concerns. For example, I wanted to be alerted to the number of new registrations at a global, country, and segment level, which resulted in three different rules to account for the different levels of aggregation. After aggregating the data, I needed a basis for comparison so that I could set a threshold and check whether the value was in or out of tolerance.
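One lightweight way to express those three rules is a list of definitions that differ only in their aggregation level; the table and column names below are hypothetical:

    # Hypothetical rule definitions: one metric, three aggregation levels.
    REGISTRATION_RULES = [
        {"name": "registrations_global",  "group_by": []},
        {"name": "registrations_country", "group_by": ["country"]},
        {"name": "registrations_segment", "group_by": ["country", "user_type"]},
    ]

    def build_sql(rule):
        """Generate the aggregation query for a rule."""
        cols = ", ".join(rule["group_by"] + ["COUNT(*) AS new_registrations"])
        sql = f"SELECT {cols} FROM registrations"
        if rule["group_by"]:
            sql += " GROUP BY " + ", ".join(rule["group_by"])
        return sql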

When you build these threshold values, you can start with basic calculations and work toward more advanced ones. For example, you could hard-code an absolute value such as, “if the value is greater than 100, throw an alert.” Or you can build advanced calculations that utilize standard deviations or other tolerance measurements to detect anomalies. Depending on your business, you could even create more complex measures that account for seasonality.
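As a sketch of both approaches, here is a hard-coded absolute check next to a standard-deviation tolerance computed from a metric’s own history (the history values below are made up for illustration):

    from statistics import mean, stdev

    def absolute_check(value, limit=100):
        """Basic rule: alert when the value exceeds a hard-coded limit."""
        return value > limit

    def zscore_check(value, history, tolerance=2.0):
        """Advanced rule: alert when the value is more than `tolerance`
        standard deviations from the historical mean."""
        mu, sigma = mean(history), stdev(history)
        return abs(value - mu) > tolerance * sigma

    # Hypothetical monthly invoice counts for a small segment.
    history = [410, 395, 428, 402, 417, 399]
    print(zscore_check(507, history))  # True: worth an investigation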

Integration

Once you have your canaries built and scheduled, you have to decide where it is best to surface the information and whether to integrate with other processes. You may choose to surface the information at the command line, pipe the results to a summary dashboard, or use them to halt processes.

In many situations, I configured Python to send me summary emails for items that were out of tolerance so that I could investigate. In more critical areas with many downstream dependencies, I built these rules into our ETL process to “break” our ETL jobs, which prevented the propagation of “bad data” to other systems.
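The “break the ETL” integration is straightforward with a scheduler like Airflow: a task that raises an exception is marked failed, and with the default trigger rules, downstream tasks don’t run. A minimal, scheduler-agnostic sketch (the canary aggregation below is a hypothetical stand-in for the checks sketched earlier):

    def run_all_canaries():
        """Hypothetical aggregation of the earlier checks; returns a dict of
        rule name -> offending value for anything out of tolerance."""
        return {}  # wire in absolute_check / zscore_check results here

    def canary_gate():
        """Gate task that runs before downstream ETL loads. Raising an
        exception fails the scheduled task, which halts dependent jobs and
        can trigger the scheduler's standard failure alerting."""
        failures = run_all_canaries()
        if failures:
            raise RuntimeError(f"Canary checks failed: {failures}")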

How to Get Started

Getting started is quite straightforward. If you’re already writing SQL, you just need to create some new SQL scripts that compare the result set you are serving to your dashboard against a threshold value that you configure. Assuming that you are utilizing my recommended framework (found in Building a Better Analytics Organization), this shouldn’t be a significant effort.

However, this practice can be a bit more complex depending on how your dashboards are designed. If you are embedding custom SQL code inside of your dashboards, you will have some additional work to complete, because your canary will need to utilize the same results that your dashboard utilizes. For this reason and more, I recommend building your analytics environment based on my best practices, which you can read about in my seven-part series on Building a Better Analytics Organization.

Conclusion

I hope that this article helps you build a stronger analytics environment and leverage it to surface more insights, faster than any human could.

Article Originally Published on Medium