Visual perception and effective plot construction

SISBID 2025
https://github.com/dicook/SISBID

Game: Which plot wears it better?

Coming up: two different plots of 2012 TB incidence (i.e. newly diagnosed cases) in Kenya, based on two variables:

tb_kn |> 
  filter(year == 2012) |> 
  dplyr::select(sex, age, count) |>
  head()
# A tibble: 6 × 3
  sex   age   count
  <chr> <chr> <dbl>
1 m     15-24  4893
2 m     25-34  8149
3 m     35-44  5302
4 m     45-54  2493
5 m     55-64  1099
6 m     65+     669
  • In arrangement A, separate plots are made for age, and sex is mapped to the x axis.
  • Conversely, in arrangement B, separate plots are made for sex, and age is mapped to the x axis.
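
The two arrangements can be sketched with facets in ggplot2. This is a minimal sketch, not the code behind the slides. Because only the male rows for 2012 are printed above, it uses the 1995 rows for ages 15-54 that appear in the three-variable print-out of tb_kn in this deck. The bar geom, the fill mappings and the object names (tb_1995, arr_a, arr_b) are assumptions made for illustration.

```r
library(ggplot2)

# Rows copied from the 1995 print-out of tb_kn in this deck
# (ages 15-54 only; female rows for older ages are not printed).
tb_1995 <- data.frame(
  sex   = rep(c("m", "f"), each = 4),
  age   = rep(c("15-24", "25-34", "35-44", "45-54"), times = 2),
  count = c(2072, 3073, 1675, 920, 1802, 1759, 741, 411)
)

# Arrangement A: one panel per age group, sex on the x axis.
# geom_col() is an assumption; the text does not name the geom.
arr_a <- ggplot(tb_1995, aes(x = sex, y = count, fill = sex)) +
  geom_col() +
  facet_wrap(~age, ncol = 4)

# Arrangement B: one panel per sex, age on the x axis.
arr_b <- ggplot(tb_1995, aes(x = age, y = count, fill = age)) +
  geom_col() +
  facet_wrap(~sex, ncol = 2)

arr_a
arr_b
```

In arr_a the male and female bars for one age group sit side by side, so comparing sexes within an age is a short eye movement. In arr_b the same comparison requires jumping between panels.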

At which age(s) are the counts for males and females relatively the same?

Which plot makes this question easier to answer?

TWO MINUTE CHALLENGE 🔮 👽 👼

At which age(s) are the counts relatively similar across sex?

Which plot makes this easier? What do we learn from each? What’s the focus? What’s easy? What’s harder?

TWO MINUTE CHALLENGE 🔮 👽 👼

Write out a question that would be easier to answer from arrangement B.

Go to www.menti.com and use the code 2979 2396.

Three Variables

Next, we have two different plots of TB incidence in Kenya, based on three variables:

tb_kn |> select(year, sex, age, count) |> head(10)
# A tibble: 10 × 4
    year sex   age   count
   <dbl> <chr> <chr> <dbl>
 1  1995 m     15-24  2072
 2  1995 m     25-34  3073
 3  1995 m     35-44  1675
 4  1995 m     45-54   920
 5  1995 m     55-64   485
 6  1995 m     65+     296
 7  1995 f     15-24  1802
 8  1995 f     25-34  1759
 9  1995 f     35-44   741
10  1995 f     45-54   411
  • In plot type A, a line plot of counts is drawn separately by age and sex, and year is mapped to the x axis.
  • Conversely, in plot type B, counts are drawn as stacked bars, again separately by age and sex, with year mapped to the x axis.
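
One possible reading of the two plot types, written as small helper functions. This is a sketch under assumptions: the function names plot_type_a and plot_type_b are hypothetical, the data frame tb is assumed to have the columns year, sex, age and count as in tb_kn, and the choice to facet by age while distinguishing sex by colour (lines) or by stacking (bars) is a guess at the layout, not taken from the slides.

```r
library(ggplot2)

# Plot type A (sketch): one line per sex, one panel per age group, year on x.
plot_type_a <- function(tb) {
  ggplot(tb, aes(x = year, y = count, colour = sex)) +
    geom_line() +
    facet_wrap(~age)
}

# Plot type B (sketch): yearly counts as bars, sexes stacked on top of
# each other, one panel per age group.
plot_type_b <- function(tb) {
  ggplot(tb, aes(x = year, y = count, fill = sex)) +
    geom_col(position = "stack") +
    facet_wrap(~age)
}

# Usage, assuming tb_kn from the previous session is loaded:
# plot_type_a(tb_kn)
# plot_type_b(tb_kn)
```

With lines, each sex is read against the common y axis. With stacked bars, only the bottom segment starts at zero, so the upper segment has to be judged by length rather than position.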

Is the trend for females generally decreasing over time? Which plot makes this easier?

TWO MINUTE CHALLENGE 🔮 👽 👼

Which type of plot makes it easier to answer this question:

Is the trend for females generally decreasing over time?

TWO MINUTE CHALLENGE 🔮 👽 👼

What are the pros and cons of each way of displaying the same information? Should specific axis limits be set?

Should the limits of the y axis in plot A include 0 (zero)?

TWO MINUTE CHALLENGE 🔮 👽 👼

Plot A shows the proportion as a line plot.
Plot B shows stacked bars scaled to 100% for females and males.

Is there an age effect in the proportion of incidence by gender? Is there a temporal trend in the proportions?
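
The proportions behind both plots are simple arithmetic: within each age group, the female share is f / (f + m). The sketch below works this through for a single year, using the 1995 rows for ages 15-54 printed earlier in this deck, and then draws the same shares as bars scaled to 100%. The object names (tb_1995, prop_f, p_fill) are hypothetical, and the single-year snapshot is a simplification: the plots on the slide have year on the x axis.

```r
library(ggplot2)

# Rows copied from the 1995 print-out of tb_kn in this deck (ages 15-54).
tb_1995 <- data.frame(
  sex   = rep(c("m", "f"), each = 4),
  age   = rep(c("15-24", "25-34", "35-44", "45-54"), times = 2),
  count = c(2072, 3073, 1675, 920, 1802, 1759, 741, 411)
)

# Female share of cases within each age group: f / (f + m).
# The male and female rows are in the same age order by construction.
m <- tb_1995$count[tb_1995$sex == "m"]
f <- tb_1995$count[tb_1995$sex == "f"]
prop_f <- setNames(f / (f + m), tb_1995$age[tb_1995$sex == "f"])
round(prop_f, 3)
# 15-24 25-34 35-44 45-54
# 0.465 0.364 0.307 0.309

# The same shares as bars scaled to 100%:
# position = "fill" rescales every stack so that it sums to 1.
p_fill <- ggplot(tb_1995, aes(x = age, y = count, fill = sex)) +
  geom_col(position = "fill") +
  ylab("proportion")
p_fill
```

In these 1995 rows the female share drops from about 0.47 in the youngest group to about 0.31 in the 35-54 groups, which is the kind of age effect the question asks about.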

Perceptual principles

  • Hierarchy of mappings
  • Pre-attentive: some elements are noticed before you even realise it.
  • Color palettes: qualitative, sequential, diverging.
  • Proximity: Place elements for primary comparison close together.
  • Change blindness: When focus is interrupted differences may not be noticed.

Hierarchy of mappings

  1. Position - common scale (BEST)
  2. Position - nonaligned scale
  3. Length, direction, angle
  4. Area
  5. Volume, curvature
  6. Shading, color (WORST)

(Cleveland, 1984; Heer and Bostock, 2009)

TWO MINUTE CHALLENGE 🔮 👽 👼

Come up with a plot type for each of the mappings.

Color palettes

library(RColorBrewer)
display.brewer.all()
  • Sequential
  • Diverging
  • Qualitative

Color Brewer annotates palettes with attributes.

display.brewer.all()

Sequential

dsamp <- diamonds |>
  sample_n(1000)
(d <- ggplot(
  dsamp, aes(carat, price)) +
  geom_point(aes(
    colour = clarity)))

  • Emphasize one side of the spectrum

  • viridis package palette

    • maps to uniform grey scale
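
A minimal sketch of requesting the viridis palette explicitly, with a rough check of the greyscale claim. Assumptions: the seed, the object names (d_viridis, cols, luma) and the use of Rec. 601 luma weights as a stand-in for perceived lightness are all choices made for this illustration. The check shows only that lightness increases steadily from dark to light, not that the steps are exactly equal.

```r
library(ggplot2)

set.seed(2025)  # arbitrary seed so the sample is reproducible
dsamp <- diamonds[sample(nrow(diamonds), 1000), ]

# clarity is an ordered factor, so a sequential scale suits it.
d_viridis <- ggplot(dsamp, aes(carat, price)) +
  geom_point(aes(colour = clarity)) +
  scale_colour_viridis_d()
d_viridis

# Rough greyscale check: approximate luma of the 8 viridis colours
# (one per clarity level), using Rec. 601 weights.
cols <- scales::viridis_pal()(8)
rgb_vals <- grDevices::col2rgb(cols)
luma <- 0.299 * rgb_vals["red", ] +
  0.587 * rgb_vals["green", ] +
  0.114 * rgb_vals["blue", ]
all(diff(luma) > 0)  # TRUE when each colour is lighter than the one before
```

A palette that passes this check keeps its ordering when printed in black and white, which a rainbow-style palette generally does not.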

Sequential

d + scale_colour_brewer(direction = -1)

  • Default brewer sequential scale, blues.

  • Focus is on the dark blue.

Diverging

d + scale_colour_brewer(palette="PRGn")

  • Emphasize both ends, high AND low
  • De-emphasize middle

Qualitative

d + scale_colour_brewer(palette="Set1")

Map qualitative variables to most differentiated set of colors.

It’s possible to have too many colours to perceive differences.

TWO MINUTE CHALLENGE 🔮 👽 👼

Of the previous four colour schemes on the same data, which would be the most appropriate? Why?

  • viridis
  • ColorBrewer sequential Blues
  • ColorBrewer Diverging PRGn
  • ColorBrewer Categorical Set1

Color blind-proofing

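library(scales)     # hue_pal()
library(dichromat)  # dichromat()

# Original colours: the default hue palette, applied explicitly
clrs <- hue_pal()(9)
d + 
  scale_colour_manual("", values=clrs) + 
  theme(legend.position = "none")

# The same palette as simulated by dichromat() with its default settings
clrs <- dichromat(hue_pal()(9))
d + 
  scale_colour_manual("", values=clrs) + 
  theme(legend.position = "none")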
  • Online checking tool coblis: upload an image and it will re-map the colors for different colour perception issues.
  • The package colorblind has color blind friendly palettes (Susan: but the colours are awful 😭).

Color blind Simulation

Original colours

Color blind view

Pre-attentive

Can you find the odd one out?

Pre-attentive

Is it easier now?

Proximity

Place elements that you want to compare close to each other. If there are multiple comparisons to make, you need to decide which one is most important.

Mapping and proximity

Same proximity is used, but different geoms.

  • Which is better to determine the relative ratios of males to females by age?

Mapping and proximity

Same proximity is used, but different geoms.

Which is better to determine the relative ratios of ages by sex?

Change blindness

ggplot(dsamp, aes(x=carat, y=price, colour = clarity)) +
  geom_point() +
  geom_smooth(se=FALSE) +
  scale_color_brewer(palette="Set1") +
  facet_wrap(~clarity, ncol=4)

Which has the steeper slope, VS1 or VS2?

Change blindness

Making comparisons across plots requires the eye to jump from one focal point to another.

It may result in not noticing differences.

ggplot(dsamp, aes(x=carat, y=price, 
                  colour = clarity)) +
  geom_point() +
  geom_smooth(se=FALSE) +
  scale_color_brewer(palette="Set1") 

Core principles

  • Make a plot of your data!
    • The hierarchy matters if the structure is weak or differences between groups are small.
  • Knowing how to use proximity is a valuable and rare skill.
  • Use of colour: don’t overuse it
    • Too many colours
    • Mapping a continuous variable to colour to add another dimension

Core principles

  • Show the data!
    • Statistics are good if there’s too much data
    • Always plot the data for yourself to see the variability
  • One plot is never enough
    • Plot the data in different ways
    • Understand the relationships between variables

Your turn

This builds on the exercise from the previous session.

  • Using your choice of country, for example, Australia, make a set of plots to explore the TB incidence among males relative to females over different age groups for 2012.
  • Choose your best plot to answer this question: Is there a higher incidence of TB among younger women in 2012?

Resources