The 4Ds in Data Storytelling: Making Art Out of Science

Data is everywhere. Anyone with some level of training, and nowadays with a bit of help from AI, can generate scientific insights from data and build fancy data visualizations. However, interpreting and selling the meaning behind the numbers and graphs is an art. As ChatGPT and generative AI have taken center stage, many concerns about being replaced by AI have emerged. With clear instructions, AI can help us generate code and visualizations, and even build well-performing models that contain useful insights, but it struggles to tell compelling, credible, and memorable stories based on those insights. AI can do the science, but the art is a skill unique to humans, at least for now.

Depending on the audience, these data stories are essential to establish trust in collaborations or influence business decisions. Data scientists’ work without storytelling is merely digital fortune-telling. In this article, I want to share a 4D framework to help data scientists crack the data storytelling process and deliver data insights with higher efficiency and impact, followed by a bonus section with practical suggestions at the end.

Define

The first step of storytelling is to define a story. What exactly is a story? While fiction writers may give a more comprehensive answer, essentially, a story is a narrative that conveys a series of events with background settings, characters, and plots. To make a story interesting, it has to be engaging, intriguing, entertaining, or informative to the readers. The data storytelling process starts with defining a story that will keep your audience interested by establishing relatable settings, compelling characters, and intriguing plots.

The setting:

When you find interesting results in the data, you need to set a relatable background for your audience before delivering the findings. The setting establishes the context for the communication that follows. We first need to know what the medium of this communication is. Is it a presentation or a written report? Is it a deep-dive technical session or a high-level result review? The answers will take your story in different directions.

Then, we need to define the what and why in the story. What’s the context? Everyone involved in the communication needs to be on the same page. What’s the issue? Why are we having this communication?

Additionally, what are the action points of this communication? Assuming everyone agrees with your story, what’s the next step?

Setting up the background before laying out your story is crucial. It prepares you to communicate in a more structured and efficient way, and it helps the audience connect with your story by getting everyone on the same page.

The characters:

Characters are the souls of stories. A good data story should have both you and the audience embedded in the story. There are two aspects: Who are the audiences, and what’s their relationship with you? Unfortunately, there is no panacea for crafting a story that will let all types of audiences resonate with you and your story. Different audiences have varying pain points, which makes it necessary to tailor your message accordingly. Ask yourself these questions as you prepare the story:

  • Who are the audiences? Are they technically strong or business-driven? Do they have a busy schedule? Do they only care about the high-level overview, or are they detail-driven? How would they benefit from this communication?
  • What’s their relationship with you? Are they someone new with whom you need to build trust and credibility? Or have you already established a relationship with them? Do you need their collaboration, or are you giving them results? Will you need them to make a decision or take action?

These questions will help you navigate the preparation. Keep in mind that you may well make incorrect assumptions about your audience. For example, you may think they don’t care about details when they are actually very interested in your detailed thought process. Thus, it is vital to communicate beforehand to align everyone’s expectations. You could also prepare backup slides or evidence in case follow-up questions go off on a tangent.

The plot:

The plot is the spine of a story. Consider the structure of a traditional three-act story, a widely utilized framework for organizing and presenting narratives in literature, theater, and film. This structure separates a story into three distinct acts, each with its own unique purpose and sequence of events:

Figure 1. A widely utilized framework, the structure of the three-act story can inspire your data stories.

We could use the same framework in telling a data story. In act one of our data story, we briefly introduce the context and the background to ensure everyone is on the same page. Then, in act two, we delve deeper into the data and findings, introducing the challenges that our audience cares about or is trying to solve. This is where we move to the focus of the communication. It could be identifying a surprising trend or an important finding. Right after, we present a breakthrough that tackles the challenges, perhaps a hidden pattern that reshapes everyone’s perspective. Act three is all about resolution, where we present solutions and action points. At the end, we take a moment to consider the wider impact of our data story, leaving our audience with memorable takeaways or required actions. The logic and flow behind the communication will define how the audience receives and reacts to the messages. A memorable story has ups and downs, and audiences resonate with such stories because they help tackle a pain point the audience faces.

Display

Now that we have a structure in mind, we need to put flesh on the bones. A data story contains both the narrative and visual support, like slides or graphs. In this section, I will mainly focus on how to choose data visualizations that match your data story and which tools will help you convey the message better.

Different plots:

There are many visualization types to choose from when making a point. Depending on the scenario, we can choose among line plots, bar plots, scatter plots, pie plots, tables, or just text. Here are some common use cases in practice:

Line Plot: Line plots are helpful in displaying how one or more variables change over time, making them an excellent choice for visualizing trends and patterns in data. A line plot can be used to display a city's temperature changes over a year. This allows viewers to identify any seasonal patterns or temperature trends over time easily.

Pie Plot: Pie charts are a helpful tool for displaying the composition or proportions of a dataset or variable. They are particularly effective when you want to emphasize the relative size of different components. For instance, a pie chart is commonly used to depict the distribution of a household's monthly expenses, including items like rent, utilities, groceries, and entertainment. This provides a clear visual representation of where the money is being allocated.

Bar Plot: Bar plots are ideal for comparing values among categories. A pie plot can also show the comparison, but we can add a time horizon to the bar plot to extend the comparison over time. For example, we can use a bar plot to display the sales across different products for the last five years. In addition, bars are widely used in histograms, which show the distribution of data across value ranges.

Scatter Plot: Scatter plots are helpful in analyzing the relationship and correlation between two continuous variables. They can help identify patterns, clusters, or outliers in data. For instance, if you want to determine whether changes in a product's sales correlate with changes in its price over time, a scatter plot can visualize the relationship between the two variables.

Tables: Tables are a valuable tool for presenting detailed data. They are particularly effective when viewers need to access specific data points. It's important to note that tables typically display one data dimension, either cross-sectional or time series. Cross-sectional tables show value comparisons across subjects, while time series tables show one subject's value comparison over time. A panel table that includes both subject and time horizon may be too detailed to present verbally but is often used in writing as strong supporting evidence or for more in-depth analysis. While tables can be used to show time series comparisons, line graphs are typically more effective for this purpose. Thus, we use tables more for cross-sectional data.

Text: It’s not common to treat text as a visualization, but it can be the most effective one to deliver the message. It can be a sentence with key statistics highlighted. A text message like the one below immediately gets the audience's attention.

Figure 2. Simple, well-designed text can be the most effective way to deliver your message.

Text can be added to graphs for context, annotations, or explanations. This helps present messages better.
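As a quick reference, the use cases above can be captured in a small lookup. This is only a sketch: the purpose keys and the `suggest_chart` helper are names I made up for illustration, and the mapping is a starting point rather than a rule.

```python
# Hypothetical lookup table. The chart types follow the use cases described
# above; the purpose keys are illustrative names, not a standard vocabulary.
CHART_FOR_PURPOSE = {
    "trend_over_time": "line plot",
    "composition": "pie plot",
    "category_comparison": "bar plot",
    "relationship": "scatter plot",
    "exact_lookup": "table",
    "single_key_number": "text",
}


def suggest_chart(purpose: str) -> str:
    """Return a starting-point chart type for a storytelling purpose."""
    if purpose not in CHART_FOR_PURPOSE:
        options = ", ".join(sorted(CHART_FOR_PURPOSE))
        raise ValueError(f"Unknown purpose {purpose!r}; expected one of: {options}")
    return CHART_FOR_PURPOSE[purpose]


if __name__ == "__main__":
    print(suggest_chart("trend_over_time"))
```

Treat the result as a first guess. The audience and the message should still drive the final choice.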

Different visualization tools:

There are a lot of visualization tools we can use to generate the ideal graph. We have professional tools such as Tableau and Power BI, popular Python packages like Plotly, and user-friendly Excel. No matter which tool you use, the primary purpose is not to show how advanced your code is or how fancy your graphs are. Instead, the focus is on getting the message delivered correctly and efficiently. If you are more proficient with Excel, then don’t waste a lot of time figuring out Python syntax.

Another thing to consider is whether you need to share the visualization and how you should share it. Is it a static graph or screenshot, or do you need to let the audience interact with it? In addition, is this graph a one-time showcase, or do you need to reproduce it every so often when the data is updated? If you need to reproduce the graph, it is helpful to set up a template or write modular code to save time in the future.
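To make the idea of modular code concrete, here is a minimal sketch that uses only the Python standard library. The `render_bar_chart` function and the sales figures are made up for illustration. In practice you would wrap your plotting tool of choice in the same way, so that refreshed data only requires a new function call.

```python
from typing import Dict


def render_bar_chart(data: Dict[str, float], title: str, width: int = 30) -> str:
    """Render category values as a plain-text horizontal bar chart."""
    if not data:
        raise ValueError("data must contain at least one category")
    largest = max(data.values())
    label_width = max(len(label) for label in data)
    lines = [title]
    for label, value in data.items():
        # Scale each bar relative to the largest value in the data.
        bar_length = round(width * value / largest) if largest > 0 else 0
        lines.append(f"{label.ljust(label_width)} | {'#' * bar_length} {value:g}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up numbers: the same function is reused when the data is updated.
    sales_q1 = {"Product A": 120, "Product B": 80, "Product C": 40}
    sales_q2 = {"Product A": 90, "Product B": 110, "Product C": 60}
    print(render_bar_chart(sales_q1, "Sales, Q1"))
    print(render_bar_chart(sales_q2, "Sales, Q2"))
```

Keeping the data separate from the rendering logic is the point of the design: the chart's look is defined once, and every update goes through the same code path.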

Declutter

Less is more, unless you are writing a research paper or a documentary where you need tons of supporting evidence for almost every claim you make. Once you have the storyline and supporting visualizations, the next step is to decide which unnecessary details to remove so the audience can follow your story more easily without being distracted.

Exploratory vs explanatory:

Although it may have taken several weeks to uncover insights or create models, there is no need to display the entire thought process just because considerable effort was put into it, unless the audience specifically requests it. Focus on what your audience cares about the most. A business-oriented audience cares about how your findings benefit their business KPIs, while a technical audience cares about why you approached the problem this way and whether your model performs well. Depending on the time and format, tailor the length of your story to your audience's needs.

Remove redundancy:

For a specific slide or graph, consider removing the redundancies. Think about what the main message is and which details are unnecessary. When you present less information, it is easier for your audience to grasp the primary message. Dr. Edward Tufte has developed a formula that quantifies the redundancy in a graph:

Figure 3. Every data ink used should support the story you tell. Ruthlessly remove chart junk.

The ideal data-ink ratio is 1, meaning every bit of ink used supports the story you tell. Otherwise, you need to remove the so-called chart junk. Common chart junk includes:

  • Gridlines; you can annotate selected data points instead;
  • Redundant axes, especially if data points are being labeled;
  • Unneeded notations or markers in the chart area;
  • Images or icons that distract from the data.
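Tufte defines the data-ink ratio as the data-ink divided by the total ink used to print the graphic. The sketch below turns that definition into a small helper. Measuring actual ink is impractical, so the example uses made-up counts of chart elements as a rough proxy. Both the `data_ink_ratio` name and that proxy are my own assumptions for illustration.

```python
def data_ink_ratio(data_ink: float, total_ink: float) -> float:
    """Return the share of a graphic's ink that presents data."""
    if total_ink <= 0:
        raise ValueError("total_ink must be positive")
    if not 0 <= data_ink <= total_ink:
        raise ValueError("data_ink must be between 0 and total_ink")
    return data_ink / total_ink


if __name__ == "__main__":
    # Made-up element counts used as a rough proxy for ink.
    before = data_ink_ratio(data_ink=5, total_ink=12)  # cluttered chart
    after = data_ink_ratio(data_ink=5, total_ink=6)    # after decluttering
    print(f"before: {before:.2f}, after: {after:.2f}")
```

The closer the ratio gets to 1, the less chart junk remains.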

Here is an example of before and after decluttering:

Figure 4. Before and after example of decluttering

Note that decluttering is not the same as removing repetition. It is never too much to stress the main message multiple times in different sections to leave the audience with a deeper impression. Also, remember your audience might not be as familiar with the data as you are, so you might need to put in more effort to build and emphasize the context through repetition.

Direct

Although where your audience’s attention goes is ultimately out of your control, there are good practices that help capture it, which in turn helps you convey your messages correctly and efficiently. With this, you are more likely to influence a business decision, get future support, and establish trust and credibility. In general, you can remove distractions, highlight the main message, and pace and structure the delivery to help direct your audience’s attention. We discussed removing distractions through decluttering in the previous section. Let’s focus on the latter two in this section.

Highlight the main message:

When constructing our data story, we have our main message and the supporting evidence or comparisons that help make the point. We need to highlight the main message so that our audience does not get lost in a sea of information. When deciding what the main message is, start from what you have discovered, then ask yourself what would interest your audience in your findings and what you would want your audience to do with this information. Focus on the impact by eliminating the points that your audience may already agree with. Afterward, we can direct our audience’s attention through pre-attentive attributes. These attributes include:

  • Size. Increase the size of the main information or key statistics;
  • Color. Avoid using too many colors on one page or slide. Stay with the same color palette. Sometimes, adjusting transparency will work. We usually don’t need a rainbow;
  • Emphasis. Use a different font, or make the text bold, italic, or underlined;
  • Add special symbols like arrows or text annotation to help explain;
  • If you are presenting with PowerPoint, use animation to show your story piece by piece;
  • Depending on the scenario, we can add interactive plots to show different information based on feedback and comments.
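The color advice above can be sketched as a tiny helper that gives the main message one accent color and mutes everything else. The `highlight_colors` function, the two hex values, and the region names are assumptions chosen for illustration.

```python
from typing import List

HIGHLIGHT = "#d62728"  # assumed accent color (a red)
MUTED = "#c7c7c7"      # assumed neutral gray for supporting data


def highlight_colors(labels: List[str], focus: str) -> List[str]:
    """Return one color per label: the accent for `focus`, gray for the rest."""
    if focus not in labels:
        raise ValueError(f"{focus!r} is not one of the labels")
    return [HIGHLIGHT if label == focus else MUTED for label in labels]


if __name__ == "__main__":
    # Hypothetical region names; only the focus category gets the accent.
    regions = ["North", "South", "East", "West"]
    print(highlight_colors(regions, focus="East"))
```

Most plotting tools accept a list of per-category colors like this one, so the audience's eyes land on the single category you want to discuss.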

Pace and structure the delivery:

You have figured out what to present to your audience. Now it’s time to figure out a structure that delivers the message in a logical order the audience can readily follow. We need to consider both a horizontal flow and a vertical flow.

The three-act structure we mentioned in the “Define” section helps us build a logical horizontal flow through our story, slide by slide. Common narrative flows are:

  • Chronologically: Present events by time and connect the impacts;
  • Big to small: Present the high-level big picture first, then break it down into details with more supporting evidence;
  • Conceptually: Present in a logical order like IF > THEN > SO;
  • By groups: Present the whole population first, then go to each group of interest.

Generally, no matter which horizontal flow you end up following, we should always have a “summary > detail > summary” format for the whole story. It is beneficial to give the audience an outline at the beginning, connect with the audience’s pain points during the process, and end with action points. Taking action is where the impacts of data storytelling come from.

It is also essential to maintain a reasonable vertical flow in our story. Vertical flow in a slide is achieved when the headline and the data visual highlight the same point at different levels of detail. In business presentations, it's important to ensure that each slide conveys a clear message, with data and visuals that support the slide title and present the insights clearly.

Bonus Section

Now that you have learned a practical framework to structure the story, what are some good practices during the communication that can help you better deliver the message? Here are some useful suggestions:

Be confident:

You might not be as experienced or as technical as your audience, but remember, no one knows more about your work than you do. You are your data story’s only expert.

Ask for feedback:

Ask for feedback before, during, and after the communication. You can ask what your audience expects ahead of time. Then, practice with others and ask whether they got the messages you meant to deliver. During the presentation, build in break points to ask whether the audience has any questions and whether they are following along. After the presentation, follow up on the action points and the questions left unanswered during the presentation, and ask for any feedback your audience can give for future communications.

Observe and practice:

Observe how others convey messages in different scenarios. Observe an experienced coworker effectively presenting technical findings, a manager persuading stakeholders to extend project timelines, or a product owner cherry-picking product features on launch day. Don’t limit yourself to your work environment. Lessons can also come from everyday life, for instance, when you are approached by a sales agent or when you’re attending a conference.

Most importantly, don’t forget to summarize the learnings and put them into practice. Have rehearsals with yourself or with others to know how long it will take to deliver the whole story and whether you have made yourself clear through this story.

Practice giving a technical presentation to non-technical audiences:

It is very common for data scientists to present black-box model predictions to non-technical audiences. Figure out what they need before crafting your story. Balance the technical evidence with exciting results. To explain a necessary technical term to a non-technical audience, I find it beneficial to stay at a high level, give relatable examples, and always connect with the results. Don’t forget to check whether your audience understands before moving forward.

Watch the time:

It is important to respect your audience's time by keeping your data story within the agreed time. To know precisely how much information to include, rehearse beforehand. However, unexpected things always happen during a presentation, whether the topic gets derailed or someone asks for further details. So, beyond leaving buffer time at the end, you also need to stay alert to the clock when the audience asks for too many details or derails the discussion. You can say things like, “We can take this offline for time's sake,” or “Let’s move to the next topic since we only have X minutes left.” You can also ask a timekeeper to help you track the time while you focus on the content. You never want to have to cut the story short because you ran out of time.

In closing, I’d like to reflect on a book I recently read called The Psychology of Money by Morgan Housel. In one chapter, Housel said: “But stories are, by far, the most powerful force in the economy. They are the fuel that can let the tangible parts of the economy work, or the brake that holds our capabilities back.”

Consider the influence of storytelling: the stock market's volatility and the economy's expectations can shape future economic realities through a self-fulfilling prophecy. When everyone believes in a story, its power is immense. Data make us credible, but stories make us memorable. We data scientists need our audience to remember us, believe us, and trust us so they will take action willingly. Data storytelling is beyond science. It is art. However, it doesn’t mean we cannot improve this skill through observing and practicing. Try the 4D framework—Define, Display, Declutter, and Direct—for your next communication and let me know how it goes!

Originally published in Towards Data Science.