Visual perception and effective plot construction

class: center, middle, inverse, title-slide

.title[
# Visual perception and effective plot construction
]
.subtitle[
## SISBID 2024 <br> <a href="https://github.com/dicook/SISBID" class="uri">https://github.com/dicook/SISBID</a>
]
.author[
### Di Cook (<a href="mailto:dicook@monash.edu" class="email">dicook@monash.edu</a>) <br> Heike Hofmann (<a href="mailto:hhofmann4@unl.edu" class="email">hhofmann4@unl.edu</a>) <br> Susan Vanderplas (<a href="mailto:susan.vanderplas@unl.edu" class="email">susan.vanderplas@unl.edu</a>)
]
.date[
### 08/14-16/2024
]

---

background-image: url(images/who_wore_it_better.jpg)
background-size: 20%
background-position: 99% 50%

# Let's play a game: Which plot wears it better?

On the next slide we have made **two different plots** of 2012 TB incidence in the USA, based on two variables:

```
# A tibble: 12 × 3
   sex   age   count
   <chr> <chr> <dbl>
 1 m     1524    239
 2 m     2534    322
 3 m     3544    333
 4 m     4554    502
 5 m     5564    455
 6 m     65      529
 7 f     1524    161
 8 f     2534    262
 9 f     3544    169
10 f     4554    175
11 f     5564    148
12 f     65      243
```

- In arrangement A, separate plots are made for age, and sex is mapped to the x axis. 
- Conversely, in arrangement B, separate plots are made for sex, and age is mapped to the x axis.

If you were to answer the question:  .orange[At which age(s) are the counts for males and females relatively the same?] Which plot makes this easier?
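A minimal sketch of how the two arrangements might be built with ggplot2, using the counts in the table above. The data frame name `tb_us_2012` and the choice of bars via `geom_col()` are assumptions for illustration, not necessarily how the plots on the next slide were drawn.

```r
library(ggplot2)

# 2012 USA TB counts, copied from the table above
tb_us_2012 <- data.frame(
  sex   = rep(c("m", "f"), each = 6),
  age   = rep(c("1524", "2534", "3544", "4554", "5564", "65"), times = 2),
  count = c(239, 322, 333, 502, 455, 529, 161, 262, 169, 175, 148, 243)
)

# Arrangement A: one panel per age group, sex on the x axis
plot_a <- ggplot(tb_us_2012, aes(x = sex, y = count, fill = sex)) +
  geom_col() +
  facet_wrap(~age, ncol = 6)

# Arrangement B: one panel per sex, age on the x axis
plot_b <- ggplot(tb_us_2012, aes(x = age, y = count, fill = age)) +
  geom_col() +
  facet_wrap(~sex, ncol = 2)
```

Only the roles of `sex` and `age` are swapped between the two calls; the data and the geom stay the same.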

---

🔮 👽 👼 **TWO MINUTE CHALLENGE**

We've got two different rearrangements of the same information. .orange[At which age(s) are the counts for males and females relatively the same?] Which plot makes this easier?

What do we learn that is different from each? What's the focus of each? What's easy, what's harder?

<!--

Go to www.menti.com and use the code 4651 9428.

-->

---

🔮 👽 👼 **TWO MINUTE CHALLENGE**

<span class=" faa-float animated " style=" display: -moz-inline-stack; display: inline-block; transform: rotate(0deg);">Try to write out a question that would be easier to answer from arrangement B.</span>

???
- Arrangement A makes it easier to directly compare male and female counts, separately for each age group. Male counts are higher than female counts in every age group. There is a big difference between counts in the 45-54 age group, and the counts are closest in the 25-34 age group.
- Arrangement B makes it easier to directly compare counts by age group, separately for females and males. For females, incidence drops in the middle years. For males, it is pretty consistently high across age groups.
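
The male and female comparison can also be checked numerically. A small sketch, using the 2012 counts from the table on the earlier slide, computes the male-to-female ratio in each age group; the vector names are illustrative.

```r
# 2012 USA TB counts, copied from the table shown earlier
age    <- c("1524", "2534", "3544", "4554", "5564", "65")
male   <- c(239, 322, 333, 502, 455, 529)
female <- c(161, 262, 169, 175, 148, 243)

# Ratio of male to female counts in each age group; 1 means equal counts
ratio <- setNames(round(male / female, 2), age)
ratio

# Age group where the two counts are closest in relative terms
closest <- names(ratio)[which.min(abs(ratio - 1))]
closest
```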

<br>

---

On the next slide we have made **two different plots** of TB incidence in the USA, based on three variables:

```
# A tibble: 10 × 4
    year sex   age   count
   <dbl> <chr> <chr> <dbl>
 1  1997 m     1524    330
 2  1997 m     2534    701
 3  1997 m     3544   1127
 4  1997 m     4554    979
 5  1997 m     5564    679
 6  1997 m     65      944
 7  1997 f     1524    269
 8  1997 f     2534    449
 9  1997 f     3544    447
10  1997 f     4554    254
```

- In plot type A, a line plot of counts is drawn separately by age and sex, and year is mapped to the x axis. 
- Conversely, in plot type B, counts for sex and age are stacked into a bar chart, separately by age and sex, and year is mapped to the x axis.

If you were to answer the question:  .orange[Is the trend for females generally decreasing over time?] Which plot makes this easier?
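
A rough sketch of the two plot types in ggplot2, using only the ten 1997 rows printed above (the full data spans more years, which is what makes a trend visible). The faceting choices and the name `tb_us` are assumptions, since the description leaves the exact layout open.

```r
library(ggplot2)

# Rows shown in the table above (1997 only); the full data has more years
tb_us <- data.frame(
  year  = 1997,
  sex   = c(rep("m", 6), rep("f", 4)),
  age   = c("1524", "2534", "3544", "4554", "5564", "65",
            "1524", "2534", "3544", "4554"),
  count = c(330, 701, 1127, 979, 679, 944, 269, 449, 447, 254)
)

# Plot type A: count against year as lines, one panel per sex and age
plot_a <- ggplot(tb_us, aes(x = year, y = count, colour = sex)) +
  geom_line() +
  geom_point() +
  facet_grid(sex ~ age)

# Plot type B: bars stacked by sex, one panel per age
plot_b <- ggplot(tb_us, aes(x = year, y = count, fill = sex)) +
  geom_col() +
  facet_wrap(~age, ncol = 6)
```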

---

🔮 👽 👼 **TWO MINUTE CHALLENGE**

Which type of plot makes it easier to answer: .orange[Is the trend for females generally decreasing over time?]

---

🔮 👽 👼 **TWO MINUTE CHALLENGE**

What are the pros and cons of each way of displaying the same information? Should specific limits on axes be made?

<span class=" faa-float animated " style=" display: -moz-inline-stack; display: inline-block; transform: rotate(0deg);">Should the limits of the y axes in plot type A have included 0 (zero)?</span>

<br>

<div class="countdown" id="timer_32423168" data-update-every="1" tabindex="0" style="top:0;right:0;">
<div class="countdown-controls"><button class="countdown-bump-down">−</button><button class="countdown-bump-up">+</button></div>
<code class="countdown-time"><span class="countdown-digits minutes">00</span><span class="countdown-digits colon">:</span><span class="countdown-digits seconds">30</span></code>
</div>
---

🔮 👽 👼 **TWO MINUTE CHALLENGE**

Plot A computes the proportion and displays this as a line plot. Plot B uses a 100% chart of stacked bars for females and males. .orange[Is there an age effect in the proportion of incidence by gender? Is there also a temporal trend in the proportions?]
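
A hedged sketch of the two computations, using the 1997 rows printed on the earlier slide and only the age groups where both sexes appear in those rows. With a single year the x axis is `age` here rather than year; the names `tb_1997` and `prop_male` are illustrative.

```r
library(ggplot2)

# 1997 rows from the earlier table, for age groups with both sexes shown
tb_1997 <- data.frame(
  sex   = rep(c("m", "f"), each = 4),
  age   = rep(c("1524", "2534", "3544", "4554"), times = 2),
  count = c(330, 701, 1127, 979, 269, 449, 447, 254)
)

# Plot A's quantity: share of each age group's count that is male
totals <- tapply(tb_1997$count, tb_1997$age, sum)
males  <- tb_1997[tb_1997$sex == "m", ]
prop_male <- males$count / as.vector(totals[as.character(males$age)])
names(prop_male) <- males$age
round(prop_male, 3)

# Plot B: position = "fill" rescales every stacked bar to 100%
plot_b <- ggplot(tb_1997, aes(x = age, y = count, fill = sex)) +
  geom_col(position = "fill") +
  ylab("proportion")
```

The line plot needs the proportion computed first, while the 100% stacked bar lets `position = "fill"` do the same division inside the plot.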

---
# Perceptual principles

- Hierarchy of mappings
- Pre-attentive: some elements are noticed before you even realise it.
- Color palettes: qualitative, sequential, diverging.
- Proximity: Place elements for primary comparison close together. 
- Change blindness: When focus is interrupted, differences may not be noticed.

---
# Hierarchy of mappings

.pull-left[
1. Position - common scale (BEST)
2. Position - nonaligned scale
3. Length, direction, angle
4. Area
5. Volume, curvature
6. Shading, color (WORST)

(Cleveland and McGill, 1984; Heer and Bostock, 2010)

🔮 👽 👼 **TWO MINUTE CHALLENGE**

Come up with a plot type for each of the mappings.

]

.pull-right[
<img src="images/list_of_plots.png" width="90%">
]

---
# Color palettes

.left-column[

``` r
library(RColorBrewer)
display.brewer.all()
```

Sequential, diverging, qualitative: the [Color Brewer web site](http://colorbrewer2.org/#type=sequential&scheme=BuGn&n=3) annotates each palette with its attributes.

]
.right-column[

]
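
The palette types can also be inspected from R. A short sketch using RColorBrewer's `brewer.pal.info` table, `display.brewer.all()` and `brewer.pal()`; the variable name `blues` is illustrative.

```r
library(RColorBrewer)

# brewer.pal.info has one row per palette, giving its category
# ("seq", "div" or "qual"), its maximum number of colours,
# and whether it is flagged as colour blind friendly
table(brewer.pal.info$category)

# Show only the sequential palettes instead of all of them
display.brewer.all(type = "seq")

# Extract five colours from one sequential palette as hex codes
blues <- brewer.pal(5, "Blues")
blues
```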

---
# Sequential

``` r
library(ggplot2)
# Random sample of 1000 diamonds to reduce overplotting
dsamp <- diamonds[sample(nrow(diamonds), 1000), ]
(d <- ggplot(dsamp, aes(carat, price)) +
  geom_point(aes(colour = clarity)))
```

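Map an ordered or quantitative variable into a color scheme that emphasizes one end, either high or low. The viridis palette, which ggplot2 uses by default for an ordered factor such as `clarity`, is a rainbow-like scheme that still reads as an ordered scale when converted to grey scale.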
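
The grey scale behaviour can be checked numerically. A sketch, assuming a simple weighted sum of the RGB channels (luma) as a stand-in for how light a colour looks in grey scale; the helper `luma()` is hypothetical and defined here.

```r
library(scales)

# Approximate lightness (luma) of hex colours on a 0-255 scale
luma <- function(cols) {
  rgb <- grDevices::col2rgb(cols)
  as.vector(0.299 * rgb["red", ] + 0.587 * rgb["green", ] + 0.114 * rgb["blue", ])
}

viridis_cols <- viridis_pal()(8)       # one colour per clarity level
rainbow_cols <- grDevices::rainbow(8)  # classic rainbow for comparison

# TRUE if each colour is lighter than the one before it
viridis_ordered <- all(diff(luma(viridis_cols)) > 0)
rainbow_ordered <- all(diff(luma(rainbow_cols)) > 0)
c(viridis = viridis_ordered, rainbow = rainbow_ordered)
```

A palette whose lightness rises steadily keeps its ordering when printed in grey; one whose lightness rises and falls does not.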

---
# Sequential

``` r
d + scale_colour_brewer()
```

Default brewer sequential scale, blues. Focus is on the dark blue.

---
# Diverging

``` r
d + scale_colour_brewer(palette="PRGn")
```

Map a quantitative variable into a color scheme that emphasizes both ends, high and low, while de-emphasizing the middle.

---
# Qualitative

``` r
d + scale_colour_brewer(palette="Set1")
```

Map a categorical variable into a color scheme made of the most differentiated set of colours. It's possible to have too many colours to perceive the differences.
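
One way to check whether a qualitative palette has enough distinct colours for a variable is to compare the number of levels with the palette's maximum size. A small sketch using `brewer.pal.info` from RColorBrewer; the names `n_levels` and `n_max` are illustrative.

```r
library(ggplot2)
library(RColorBrewer)

# Number of categories to colour: one per clarity grade
n_levels <- nlevels(diamonds$clarity)

# Largest number of colours the Set1 palette provides
n_max <- brewer.pal.info["Set1", "maxcolors"]

# TRUE when every level can get its own colour
n_levels <= n_max
```

Having enough colours in the palette does not guarantee that a reader can tell them all apart, so this is a lower bar rather than a full check.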

---
class: inverse middle

🔮 👽 👼 **TWO MINUTE CHALLENGE**

Of the previous three colour schemes on the same data, which would be the most appropriate? And why do you think so?

---
# Color blind-proofing

``` r
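library(scales)
library(dichromat)
# Nine default ggplot2 hues, applied explicitly to the plot
clrs <- hue_pal()(9)
d + scale_colour_manual("", values=clrs) + theme(legend.position = "none")
# The same hues as simulated by dichromat() for colour-deficient vision
clrs <- dichromat(hue_pal()(9))
d + scale_colour_manual("", values=clrs) + theme(legend.position = "none")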
```

Online checking tool [coblis](https://www.color-blindness.com/coblis-color-blindness-simulator/) allows you to upload an image and it will re-map the colors for different colour perception issues. The package `colorblind` has color blind friendly palettes (Susan VanderPlas: but the colours are awful, to my eye).
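
A sketch of a similar check done locally, assuming the `type` argument of `dichromat()` with its three options "deutan", "protan" and "tritan". The list name `simulated` is illustrative.

```r
library(scales)
library(dichromat)

default_hues <- hue_pal()(9)

# Simulate each of the three deficiency types supported by dichromat()
simulated <- lapply(
  c(deutan = "deutan", protan = "protan", tritan = "tritan"),
  function(type) dichromat(default_hues, type = type)
)

# Draw the original hues (top row) above one simulated version (bottom row)
show_col(c(default_hues, simulated$deutan), ncol = 9)
```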

---

.pull-left[

Original colours

<img src="index_files/figure-html/show the default colour scheme-1.png" width="100%" style="display: block; margin: auto;" />
]

.pull-right[

Color blind view

<img src="index_files/figure-html/show the dichromat adjusted colors-1.png" width="100%" style="display: block; margin: auto;" />
]

---
# Pre-attentive

Can you find the odd one out?

---

Is it easier now?

---
# Proximity

Place elements that you want to compare close to each other. If there are multiple comparisons to make, you need to decide which one is most important.
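
A minimal sketch of how proximity can be controlled with dodged bars in ggplot2, using the 2012 counts from the earlier table. The object names are illustrative, and the bars are one possible geom among several.

```r
library(ggplot2)

# 2012 USA TB counts, copied from the earlier table
tb_us_2012 <- data.frame(
  sex   = rep(c("m", "f"), each = 6),
  age   = rep(c("1524", "2534", "3544", "4554", "5564", "65"), times = 2),
  count = c(239, 322, 333, 502, 455, 529, 161, 262, 169, 175, 148, 243)
)

# Primary comparison is male vs female: the two sexes sit side by side
# within each age group
sex_adjacent <- ggplot(tb_us_2012, aes(x = age, y = count, fill = sex)) +
  geom_col(position = "dodge")

# Primary comparison is across ages: the age groups sit side by side
# within each sex
age_adjacent <- ggplot(tb_us_2012, aes(x = sex, y = count, fill = age)) +
  geom_col(position = "dodge")
```

Whichever variable is mapped to `fill` ends up adjacent, so that mapping should go to the comparison that matters most.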

---
# Mapping and proximity

.left-column[
Same proximity is used, but different geoms. Is one better than the other to determine the relative ratios of males to females by age?
]
.right-column[
<img src="index_files/figure-html/side-by-side bars of males/females-1.png" width="720" style="display: block; margin: auto;" />

<img src="index_files/figure-html/piecharts of males/females-1.png" width="720" style="display: block; margin: auto;" />
]

---
# Mapping and proximity

.left-column[
Same proximity is used, but different geoms. Is one better than the other to determine the relative ratios of ages by sex?
]
.right-column[
<img src="index_files/figure-html/side-by-side bars of age-1.png" width="720" style="display: block; margin: auto;" />

<img src="index_files/figure-html/piecharts of age-1.png" width="720" style="display: block; margin: auto;" />
]

---
# Change blindness

``` r
ggplot(dsamp, aes(x=carat, y=price, colour = clarity)) +
  geom_point() +
  geom_smooth(se=FALSE) +
  scale_color_brewer(palette="Set1") +
  facet_wrap(~clarity, ncol=4)
```

Which has the steeper slope, VS1 or VS2?

---

Making comparisons across plots requires the eye to jump from one focal point to another. It may result in not noticing differences.

``` r
ggplot(dsamp, aes(x=carat, y=price, colour = clarity)) +
  geom_point() +
  geom_smooth(se=FALSE) +
  scale_color_brewer(palette="Set1") 
```

---
# Core principles

- Make a plot of your **data**! The hierarchy matters primarily if the structure is weak, or if differences between groups are small. 
- Knowing how to use proximity is an extremely valuable skill, and not well utilised.
- Using colour well is a very valuable skill, and there are many bad habits to avoid: over-use, too many colours, or precariously mapping colour to a continuous variable to add another dimension.

<br>

- Show the data! There are a lot of examples where the statistics are plotted, but the magic comes when you plot the data. Plot the statistics if the volume of data is overwhelming, to tighten the message, but still plot the data for yourself and to keep track of the variability.
- One plot is never enough! Plot the data in different ways. It will help you digest the relationships between variables and gain a better understanding of the data.
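
A short sketch of showing the data together with the statistics, assuming a boxplot as the summary and jittered points as the raw data. The seed and the name `p` are arbitrary choices for illustration.

```r
library(ggplot2)

set.seed(2024)  # arbitrary seed so the sample is reproducible
dsamp <- diamonds[sample(nrow(diamonds), 1000), ]

# Summary statistics (boxplots) with the raw observations drawn on top
p <- ggplot(dsamp, aes(x = clarity, y = price)) +
  geom_boxplot(outlier.shape = NA) +  # outliers are already shown by the points
  geom_jitter(width = 0.2, height = 0, alpha = 0.3)
p
```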

---
class: inverse middle
# Your turn

This builds on the exercise from the previous session.

- Using your choice of country, for example, Australia, make a set of plots to explore the TB incidence among males relative to females over different age groups for 2012.
- Choose your best plot to answer this question: .orange[Is there a higher prevalence of TB among younger women in 2012?]

<div class="countdown" id="timer_cdb643cb" data-update-every="1" tabindex="0" style="right:0;bottom:0;">
<div class="countdown-controls"><button class="countdown-bump-down">−</button><button class="countdown-bump-up">+</button></div>
<code class="countdown-time"><span class="countdown-digits minutes">07</span><span class="countdown-digits colon">:</span><span class="countdown-digits seconds">00</span></code>
</div>
---
# Resources

- [Claus O. Wilke, Fundamentals of Data Visualization](https://clauswilke.com/dataviz/)
- [Naomi Robbins, Creating More Effective Graphs](http://www.nbr-graphs.com)
- [Cleveland WS, McGill R. 1984. Graphical perception: Theory, experimentation, ...](https://www.tandfonline.com/doi/abs/10.1080/01621459.1984.10478080)
- [Heer J, Bostock M. 2010. Crowdsourcing graphical perception](http://vis.stanford.edu/files/2010-MTurk-CHI.pdf)
- [Antony Unwin, Graphical Data Analysis with R](https://www.crcpress.com/Graphical-Data-Analysis-with-R/Unwin/9781498715232)
- [Wagemans J et al. 2012. A Century of Gestalt Psychology in Visual Perception: I. Perceptual Grouping and Figure-Ground Organization. Psychological Bulletin 138:1172–1217](http://dx.doi.org/10.1037/a0029333)
- [Wagemans J, Feldman J, Gepshtein S, Kimchi R, Pomerantz JR, et al. 2012. A Century of Gestalt psychology in Visual Perception: II. Conceptual and Theoretical Foundations. Psychological Bulletin 138:1218–1252](https://doi.org/10.1037/a0029334)
- [Wickham H. 2013. Graphical criticism](https://vita.had.co.nz/papers/stat-graph-hist.pdf)
- [VanderPlas S, Goluch R, Hofmann H. 2019. Framed!](https://amstat.tandfonline.com/doi/full/10.1080/10618600.2018.1562937#.XS41dS1L21s)
- [VanderPlas S, Hofmann H. 2015. Signs of the Sine Illusion](https://amstat.tandfonline.com/doi/abs/10.1080/10618600.2014.951547?scroll=top&needAccess=true&journalCode=ucgs20#.XS413i1L21s)

---
# Share and share alike

<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.