Chapter - Communicating with Visualization

Supplementary chapter prepared for the BWXT Data Science Workforce Training Pilot. This material is original to the program.

About this chapter

Making a chart for yourself while exploring data is one thing; making a chart that communicates to a stakeholder is another. The competency map lists reporting and visualization as a named skill. This chapter covers the principles behind charts that inform and persuade: the difference between a plot that only you understand and one that anyone can read.

A chart has a job

Before choosing a chart type, decide what question it answers. Most charts do one of four jobs:

Job	Good chart
Compare values across categories	bar chart
Show a distribution	histogram, box plot
Show change over time	line chart
Show a relationship between two numbers	scatter plot

If you cannot say in one sentence what a chart shows, it is not ready.

Figure: a 2x2 grid showing the natural chart type for each job: a bar chart for comparing categories, a histogram for a distribution, a line chart for change over time, and a scatter plot for a relationship.

Code that generated this plot (Python):

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
fig, axes = plt.subplots(2, 2, figsize=(10, 7))

axes[0, 0].bar(['A', 'B', 'C', 'D'], [30, 55, 20, 45], color='#2457c5')
axes[0, 0].set_title('Compare categories -> bar chart')

axes[0, 1].hist(rng.normal(50, 12, 500), bins=25, color='#2457c5')
axes[0, 1].set_title('Show a distribution -> histogram')

months = np.arange(1, 13)
axes[1, 0].plot(months, 20 + 3 * months + rng.normal(0, 2, 12), marker='o', color='#2457c5')
axes[1, 0].set_title('Change over time -> line chart')

x = rng.normal(50, 10, 120)
axes[1, 1].scatter(x, 0.6 * x + rng.normal(0, 6, 120), color='#2457c5', alpha=0.6)
axes[1, 1].set_title('Relationship -> scatter plot')
plt.tight_layout()
plt.show()
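
The table above also lists the box plot as an option for distributions, but the grid only draws a histogram. As a minimal sketch, the code below draws one simulated sample both ways so the two views can be compared. The sample parameters are copied from the histogram panel; the axis labels and panel titles are illustrative assumptions, not part of the original figure.

```python
import numpy as np
import matplotlib.pyplot as plt

# Simulated sample, same parameters as the histogram panel above
rng = np.random.default_rng(0)
sample = rng.normal(50, 12, 500)

fig, (ax_hist, ax_box) = plt.subplots(1, 2, figsize=(10, 4))

# Histogram: shows the overall shape of the distribution
counts, edges, _ = ax_hist.hist(sample, bins=25, color='#2457c5')
ax_hist.set_title('Histogram: shape of the distribution')
ax_hist.set_xlabel('value')
ax_hist.set_ylabel('count')

# Box plot: summarizes the median, quartiles, and outliers
box = ax_box.boxplot(sample)
ax_box.set_title('Box plot: median, quartiles, outliers')
ax_box.set_ylabel('value')
ax_box.set_xticks([])  # a single box needs no x tick

fig.tight_layout()
# plt.show()  # uncomment to display the figure interactively
```

The histogram is usually the better choice when the shape matters (skew, multiple peaks); the box plot is more compact when several groups need to sit side by side.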

Principles for charts that communicate

One message per chart. Decide the single takeaway and make it obvious.
Label directly. Title, axes, and units. A reader should understand it without you in the room.
Remove clutter. Drop gridlines, 3D effects, and decoration that do not carry information.
Order with meaning. Sort bars by value rather than alphabetically, unless the categories have a natural order of their own (quarters or months, for example).
Use color sparingly and on purpose. Highlight the point; do not paint a rainbow.
Be honest. Start bar-chart axes at zero; do not crop to exaggerate a difference.
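
One way to apply the first two principles is to write the takeaway into the chart title itself. The sketch below is only an illustration: `takeaway_title` is a hypothetical helper, not a matplotlib function, and it assumes the message is "which category is highest". The site names and defect rates are the illustrative values used in this chapter's clutter example.

```python
def takeaway_title(categories, values, metric, unit=''):
    """Build a title that states the chart's single message.

    Hypothetical helper: it names the category with the largest value.
    """
    top_category, top_value = max(zip(categories, values), key=lambda pair: pair[1])
    return f'{top_category} has the highest {metric} ({top_value}{unit})'


categories = ['Site A', 'Site B', 'Site C', 'Site D', 'Site E']
values = [12, 19, 9, 25, 15]

title = takeaway_title(categories, values, metric='defect rate', unit='%')
print(title)  # Site D has the highest defect rate (25%)
```

The resulting string can be passed to `ax.set_title(title)`, with the units carried on the axis through `ax.set_ylabel('defect rate (%)')`, so the chart still makes sense without the author in the room.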

The same data can mislead or inform depending on these choices. Clutter buries the message:

Figure: two bar charts of the same site data. The left is rainbow-colored, heavily gridded, and unsorted; the right is sorted, uses one highlight color, and labels the highest bar. Same numbers, two treatments: the clean version sorts the bars, highlights the one that matters, and drops the decoration.

Code that generated this plot (Python):

import numpy as np
import matplotlib.pyplot as plt

categories = ['Site A', 'Site B', 'Site C', 'Site D', 'Site E']
values = [12, 19, 9, 25, 15]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(11, 4.4))

# Cluttered: rainbow colors, heavy grid, unsorted, no focus
ax1.bar(categories, values, color=['#e6194b', '#3cb44b', '#ffe119', '#4363d8', '#f58231'])
ax1.grid(True, color='#9aa5b1', linewidth=1.2)
ax1.set_title('Cluttered')

# Clean: sorted, highlight the max, minimal chrome
order = np.argsort(values)[::-1]
cats = [categories[i] for i in order]
vals = [values[i] for i in order]
colors = ['#2457c5' if v == max(vals) else '#cbd5e1' for v in vals]
ax2.bar(cats, vals, color=colors)
# Label the highest bar directly (it is first after sorting)
ax2.text(cats[0], vals[0] + 0.5, f'{vals[0]}%', ha='center', color='#2457c5')
ax2.set_ylabel('defect rate (%)')
for spine in ('top', 'right'):
    ax2.spines[spine].set_visible(False)
ax2.set_title('Clean')
plt.tight_layout()
plt.show()

Two specific principles are easiest to see side by side. First, start the axis at zero — a truncated axis turns a tiny difference into a dramatic one:

Figure: two bar charts of the same four quarterly values. Starting the y-axis at 80 (left) exaggerates the differences and fakes a big trend; the honest zero baseline (right) shows the values are nearly flat.

Code that generated this plot (Python):

import matplotlib.pyplot as plt

quarters = ['Q1', 'Q2', 'Q3', 'Q4']
values = [82, 84, 83, 86]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4.2))
ax1.bar(quarters, values, color='#b42318')
ax1.set_ylim(80, 87)   # truncated axis exaggerates the difference
ax1.set_title('Misleading: y-axis starts at 80')

ax2.bar(quarters, values, color='#2457c5')
ax2.set_ylim(0, 100)   # honest zero baseline
ax2.set_title('Honest: y-axis starts at 0')
plt.show()
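
The size of the distortion can be put into numbers. The sketch below compares the drawn heights of the Q1 and Q4 bars from the example above under each baseline. `apparent_ratio` is a hypothetical helper written for this illustration; it assumes that a reader judges a bar by its drawn height above the axis baseline.

```python
def apparent_ratio(a, b, baseline=0.0):
    """Ratio of the drawn bar heights for values a and b
    when the value axis starts at `baseline`."""
    if min(a, b) <= baseline:
        raise ValueError('baseline must be below both values')
    return (b - baseline) / (a - baseline)


q1, q4 = 82, 86  # Q1 and Q4 values from the example above

true_ratio = apparent_ratio(q1, q4, baseline=0)    # 86 / 82, about 1.05
drawn_ratio = apparent_ratio(q1, q4, baseline=80)  # 6 / 2 = 3.0

print(f'axis from 0:  Q4 bar is {true_ratio:.2f}x the Q1 bar')
print(f'axis from 80: Q4 bar is {drawn_ratio:.2f}x the Q1 bar')
```

With a zero baseline the Q4 bar is about 5% taller than the Q1 bar, which matches the data. With the axis starting at 80 it is drawn three times as tall, although the underlying values have not changed.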

Second, order bars by value so the ranking reads instantly:

Figure: two bar charts of the same five categories. Alphabetical order (left) hides the ranking; sorting by value (right) lets a reader see the largest and smallest at a glance.

Code that generated this plot (Python):

import numpy as np
import matplotlib.pyplot as plt

cats = ['Alpha', 'Bravo', 'Charlie', 'Delta', 'Echo']
values = [22, 8, 31, 14, 19]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4.2))
ax1.bar(cats, values, color='#cbd5e1')
ax1.set_title('Alphabetical: ranking is hard to see')

order = np.argsort(values)[::-1]
ax2.bar([cats[i] for i in order], [values[i] for i in order], color='#2457c5')
ax2.set_title('Sorted by value: ranking is obvious')
plt.show()

Exploration versus reporting

These are two different modes, and they call for different charts:

Exploration is for you. Quick matplotlib and seaborn plots (covered in the Dataset Statistics chapter) to find patterns. Rough is fine.
Reporting is for someone else. Fewer charts, each polished, each with one clear message and full labels. Cut everything that is not part of the point.

A common mistake is to hand a stakeholder an exploration plot — dense, unlabeled, ten series at once — and expect them to find the insight you already know. Do that work for them.
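
One possible way to do that work is to keep every series for context but mute all except the one the report is about. The sketch below assumes a ten-series line chart; `plot_with_focus`, the series names, and the data are all hypothetical and simulated for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_with_focus(ax, x, series, focus,
                    focus_color='#2457c5', muted_color='#cbd5e1'):
    """Draw every series in a muted color, then the focus series
    on top in one strong color with a direct label."""
    for name, y in series.items():
        if name != focus:
            ax.plot(x, y, color=muted_color, linewidth=1)
    ax.plot(x, series[focus], color=focus_color, linewidth=2.5)
    # Label the focus line at its last point instead of using a legend
    ax.annotate(focus, xy=(x[-1], series[focus][-1]), xytext=(5, 0),
                textcoords='offset points', va='center', color=focus_color)
    for spine in ('top', 'right'):
        ax.spines[spine].set_visible(False)


# Hypothetical data: ten simulated monthly series, not real measurements
rng = np.random.default_rng(0)
months = np.arange(1, 13)
series = {f'Line {i}': 10 + rng.normal(0, 1, 12).cumsum() for i in range(1, 11)}

fig, ax = plt.subplots(figsize=(7, 4))
plot_with_focus(ax, months, series, focus='Line 3')
ax.set_title('Focus on Line 3 (simulated data)')
ax.set_xlabel('month')
ax.set_ylabel('value (arbitrary units)')
fig.tight_layout()
# plt.show()  # uncomment to display the figure interactively
```

In a real report the title would state the finding about the focus series rather than just naming it, and series that add no context could be dropped entirely.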

Tell a story, not a data dump

A good report leads with the takeaway, supports it with one or two clean visuals, and states what should happen next. The result of a defect-detection model only changes a decision when the person making it can see, in seconds, what the data says.

Practice Questions

What four "jobs" do most charts do, and which chart suits each?
Why should every chart have a single, stated takeaway?
Give three ways to remove clutter from a chart.
Why should a bar chart's value axis usually start at zero?
What is the difference between an exploration chart and a reporting chart?
