class: center, middle, inverse, title-slide .title[ # Visual perception and effective plot construction ] .subtitle[ ## SISBID 2024
https://github.com/dicook/SISBID
] .author[ ### Di Cook (
dicook@monash.edu
)
Heike Hofmann (
hhofmann4@unl.edu
)
Susan Vanderplas (
susan.vanderplas@unl.edu
) ] .date[ ### 08/14-16/2024 ]

---
background-image: url(images/who_wore_it_better.jpg)
background-size: 20%
background-position: 99% 50%

# Let's play a game: Which plot wears it better?

On the next slide we have made **two different plots** of 2012 TB incidence in the USA, based on two variables:

```
# A tibble: 12 × 3
   sex   age   count
   <chr> <chr> <dbl>
 1 m     1524    239
 2 m     2534    322
 3 m     3544    333
 4 m     4554    502
 5 m     5564    455
 6 m     65      529
 7 f     1524    161
 8 f     2534    262
 9 f     3544    169
10 f     4554    175
11 f     5564    148
12 f     65      243
```

- In arrangement A, separate plots are made for age, and sex is mapped to the x axis.
- Conversely, in arrangement B, separate plots are made for sex, and age is mapped to the x axis.

If you were to answer the question: .orange[At which age(s) are the counts for males and females relatively the same?] Which plot makes this easier?

---
🔮 👽 👼 **TWO MINUTE CHALLENGE**

<img src="index_files/figure-html/focus on one year gender side-by-side bars of males/females-1.png" width="720" style="display: block; margin: auto;" />
<img src="index_files/figure-html/focus on one year age side-by-side bars of age group-1.png" width="720" style="display: block; margin: auto;" />

We've got two different arrangements of the same information. .orange[At which age(s) are the counts for males and females relatively the same?] Which plot makes this easier?

What do we learn from each that is different? What's the focus of each? What's easy, and what's harder?

---
🔮 👽 👼 **TWO MINUTE CHALLENGE**
Try to write out a question that would be easier to answer from arrangement B.
???

- Arrangement A makes it easier to directly compare male and female counts, separately for each age group. Generally, male counts are higher than female counts. There is a big difference between the counts in the 45-54 age group, while the counts are closest in the 25-34 age group.
- Arrangement B makes it easier to directly compare counts across age groups, separately for females and males. For females, incidence drops in the middle years. For males, counts generally increase with age.

<br>

<img src="index_files/figure-html/unnamed-chunk-3-1.png" width="720" style="display: block; margin: auto;" />
<img src="index_files/figure-html/unnamed-chunk-4-1.png" width="720" style="display: block; margin: auto;" />

---

On the next slide we have made **two different plots** of TB incidence in the USA, based on three variables:

```
# A tibble: 10 × 4
    year sex   age   count
   <dbl> <chr> <chr> <dbl>
 1  1997 m     1524    330
 2  1997 m     2534    701
 3  1997 m     3544   1127
 4  1997 m     4554    979
 5  1997 m     5564    679
 6  1997 m     65      944
 7  1997 f     1524    269
 8  1997 f     2534    449
 9  1997 f     3544    447
10  1997 f     4554    254
```

- In plot type A, a line plot of counts is drawn separately by age and sex, and year is mapped to the x axis.
- Conversely, in plot type B, the counts are drawn as a bar chart, separately by age and sex, and year is mapped to the x axis.

If you were to answer the question: .orange[Is the trend for females generally decreasing over time?] Which plot makes this easier?

---

<img src="index_files/figure-html/use a line plot instead of bar-1.png" width="720" style="display: block; margin: auto;" />
<img src="index_files/figure-html/colour and axes fixes-1.png" width="720" style="display: block; margin: auto;" />

🔮 👽 👼 **TWO MINUTE CHALLENGE**

Which type of plot makes it easier to answer: .orange[Is the trend for females generally decreasing over time?]
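---

# Arrangements A and B in code

The two arrangements of the 2012 counts compared earlier differ only in which variable is facetted and which is mapped to the x axis. Here is a minimal ggplot2 sketch, assuming the counts are typed in from the table on the first slide. The data frame name `tb_2012` and the use of `geom_col()` are our own choices for illustration, not necessarily the code behind the slides:

```r
library(ggplot2)

# 2012 USA TB counts, typed in from the table on the first slide
tb_2012 <- data.frame(
  sex   = rep(c("m", "f"), each = 6),
  age   = rep(c("1524", "2534", "3544", "4554", "5564", "65"), times = 2),
  count = c(239, 322, 333, 502, 455, 529,
            161, 262, 169, 175, 148, 243)
)

# Arrangement A: one panel per age group, sex on the x axis,
# so males and females sit side by side within each age group
arrangement_a <- ggplot(tb_2012, aes(x = sex, y = count, fill = sex)) +
  geom_col() +
  facet_wrap(~age, ncol = 6)

# Arrangement B: one panel per sex, age group on the x axis,
# so the age groups sit side by side within each sex
arrangement_b <- ggplot(tb_2012, aes(x = age, y = count, fill = age)) +
  geom_col() +
  facet_wrap(~sex, ncol = 2)

print(arrangement_a)
print(arrangement_b)
```

Swapping the facet variable and the x variable changes which comparison the viewer can make directly.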
---
🔮 👽 👼 **TWO MINUTE CHALLENGE**

What are the pros and cons of each way of displaying the same information? Should specific limits be set on the axes?
Should the limits of the y axes in plot type A have included 0 (zero)?
<br> <img src="index_files/figure-html/unnamed-chunk-8-1.png" width="720" style="display: block; margin: auto;" /> <img src="index_files/figure-html/unnamed-chunk-9-1.png" width="720" style="display: block; margin: auto;" />
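---

# Including zero on the y axis

In ggplot2 the y range defaults to the range of the data, and `expand_limits(y = 0)` is one way to force zero into the axis. For bars, whose length encodes the value, a zero baseline is needed; for lines, which encode the trend through position, starting above zero can be a reasonable choice. This sketch uses the 2012 age profile from the first slide, since the full year-by-year counts are not reproduced here; the object names are illustrative:

```r
library(ggplot2)

# 2012 USA TB counts, typed in from the table on the first slide
tb_2012 <- data.frame(
  sex   = rep(c("m", "f"), each = 6),
  age   = rep(c("1524", "2534", "3544", "4554", "5564", "65"), times = 2),
  count = c(239, 322, 333, 502, 455, 529,
            161, 262, 169, 175, 148, 243)
)

# Line plot: the y axis covers only the range of the counts
p_data_range <- ggplot(tb_2012,
                       aes(x = age, y = count, colour = sex, group = sex)) +
  geom_line() +
  geom_point()

# Same plot with the y axis extended down to zero
p_from_zero <- p_data_range + expand_limits(y = 0)

print(p_data_range)
print(p_from_zero)
```

Comparing the two versions shows how much the apparent size of the differences depends on the axis limits.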
---
🔮 👽 👼 **TWO MINUTE CHALLENGE**

Plot A computes the proportion of cases by sex and displays it as a line plot. Plot B uses a 100% stacked bar chart of females and males.

.orange[Is there an age effect in the proportion of incidence by gender? Is there also a temporal trend in the proportions?]
<img src="index_files/figure-html/use a line plot for proportions-1.png" width="720" style="display: block; margin: auto;" />
<img src="index_files/figure-html/compare proportions of males/females-1.png" width="720" style="display: block; margin: auto;" />

---

# Perceptual principles

- Hierarchy of mappings
- Pre-attentive: some elements are noticed before you even realise it.
- Color palettes: qualitative, sequential, diverging.
- Proximity: place elements for primary comparison close together.
- Change blindness: when focus is interrupted, differences may not be noticed.

---

# Hierarchy of mappings

.pull-left[

1. Position - common scale (BEST)
2. Position - nonaligned scale
3. Length, direction, angle
4. Area
5. Volume, curvature
6. Shading, color (WORST)

(Cleveland and McGill, 1984; Heer and Bostock, 2010)

🔮 👽 👼 **TWO MINUTE CHALLENGE**

Come up with a plot type for each of the mappings.
] .pull-right[ <img src="images/list_of_plots.png" width="90%"> ]

---

# Color palettes

.left-column[

``` r
library(RColorBrewer)
display.brewer.all()
```

Sequential, diverging, qualitative: the [Color Brewer web site](http://colorbrewer2.org/#type=sequential&scheme=BuGn&n=3) annotates each palette with attributes such as whether it is colorblind safe, print friendly or photocopy safe.

]

.right-column[
<img src="index_files/figure-html/unnamed-chunk-15-1.png" width="864" style="display: block; margin: auto;" />
]

---

# Sequential

``` r
library(ggplot2)
# random sample of 1000 diamonds (differs between runs unless a seed is set)
dsamp <- diamonds[sample(nrow(diamonds), 1000), ]
(d <- ggplot(dsamp, aes(carat, price)) + geom_point(aes(colour = clarity)))
```

<img src="index_files/figure-html/mapping numbers to rainbow sequential scale-1.png" width="60%" style="display: block; margin: auto;" />

Map an ordered or quantitative variable to a color scheme that emphasizes one end, either high or low. Here `clarity` is an ordered factor, so ggplot2 uses its default viridis scale: a rainbow-like palette whose lightness changes monotonically, so the ordering still reads correctly in grey scale.

---

# Sequential

``` r
d + scale_colour_brewer()
```

<img src="index_files/figure-html/mapping numbers to sequential scale-1.png" width="60%" style="display: block; margin: auto;" />

The default Brewer palette is the sequential Blues. The eye is drawn to the darkest blues.

---

# Diverging

``` r
d + scale_colour_brewer(palette="PRGn")
```

<img src="index_files/figure-html/mapping numbers to diverging scale-1.png" width="60%" style="display: block; margin: auto;" />

Map an ordered or quantitative variable to a color scheme that emphasizes both ends, high AND low, and de-emphasizes the middle.

---

# Qualitative

``` r
d + scale_colour_brewer(palette="Set1")
```

<img src="index_files/figure-html/mapping numbers to qualitative palette-1.png" width="60%" style="display: block; margin: auto;" />

Map a categorical variable to the most differentiated set of colors. With too many categories, the differences between colours become hard to perceive.

---

class: inverse middle

🔮 👽 👼 **TWO MINUTE CHALLENGE**

Of the previous three colour schemes on the same data, which would be the most appropriate? Why do you think so?
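---

# Looking up palette attributes in R

The palette attributes annotated on the Color Brewer web site are also available inside R. A short sketch, assuming the RColorBrewer package (which provides `display.brewer.all()`) is installed: its `brewer.pal.info` table records each palette's maximum number of colours, its type (`seq`, `div` or `qual`) and whether it is flagged as colour-blind safe. The object names `qual_safe` and `blues8` are illustrative:

```r
library(RColorBrewer)

# One row per palette: maxcolors, category and colorblind flag
head(brewer.pal.info)

# Qualitative palettes flagged as colour-blind safe
qual_safe <- subset(brewer.pal.info, category == "qual" & colorblind)
rownames(qual_safe)

# Hex codes for an 8-colour sequential palette,
# e.g. one colour per level of diamond clarity
blues8 <- brewer.pal(8, "Blues")
blues8

# Diverging palettes that can supply at least 11 colours
rownames(subset(brewer.pal.info, category == "div" & maxcolors >= 11))
```

A palette chosen this way can be passed by name to `scale_colour_brewer(palette = ...)`, as on the previous slides.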
---

# Color blind-proofing

``` r
library(scales)     # hue_pal()
library(dichromat)  # dichromat()
# default ggplot2 hue colours, one per clarity level
clrs <- hue_pal()(nlevels(dsamp$clarity))
d + scale_colour_manual("", values = clrs) +
  theme(legend.position = "none")
# simulate how the same colours look to a viewer with deuteranopia
clrs_cb <- dichromat(clrs, type = "deutan")
d + scale_colour_manual("", values = clrs_cb) +
  theme(legend.position = "none")
```

The online checking tool [coblis](https://www.color-blindness.com/coblis-color-blindness-simulator/) lets you upload an image and re-maps its colors to simulate different color vision deficiencies.

The package `colorblind` has color blind friendly palettes (Susan VanderPlas: but the colours are awful, to my eye).

---

.pull-left[
Original colours

<img src="index_files/figure-html/show the default colour scheme-1.png" width="100%" style="display: block; margin: auto;" />
]
.pull-right[
Color blind view

<img src="index_files/figure-html/show the dichromat adjusted colors-1.png" width="100%" style="display: block; margin: auto;" />
]

---

# Pre-attentive

Can you find the odd one out?

<img src="index_files/figure-html/is shape preattentive-1.png" width="720" style="display: block; margin: auto;" />

---

Is it easier now?

<img src="index_files/figure-html/is color preattentive-1.png" width="720" style="display: block; margin: auto;" />

---

# Proximity

Place elements that you want to compare close to each other. If there are multiple comparisons to make, you need to decide which one is most important.

<img src="index_files/figure-html/a line plot on sex-1.png" width="720" style="display: block; margin: auto;" />
<img src="index_files/figure-html/a line plot on age-1.png" width="432" style="display: block; margin: auto;" />

---

# Mapping and proximity

.left-column[
The same proximity is used, but with different geoms. Is one better than the other for determining the relative ratios of males to females by age?
] .right-column[
<img src="index_files/figure-html/side-by-side bars of males/females-1.png" width="720" style="display: block; margin: auto;" />
<img src="index_files/figure-html/piecharts of males/females-1.png" width="720" style="display: block; margin: auto;" />
]

---

# Mapping and proximity

.left-column[
The same proximity is used, but with different geoms. Is one better than the other for determining the relative ratios of ages by sex?
]
.right-column[
<img src="index_files/figure-html/side-by-side bars of age-1.png" width="720" style="display: block; margin: auto;" />
<img src="index_files/figure-html/piecharts of age-1.png" width="720" style="display: block; margin: auto;" />
]

---

# Change blindness

``` r
ggplot(dsamp, aes(x = carat, y = price, colour = clarity)) +
  geom_point() +
  geom_smooth(se = FALSE) +
  scale_color_brewer(palette = "Set1") +
  facet_wrap(~clarity, ncol = 4)
```

<img src="index_files/figure-html/facetting plots can result in change blindness-1.png" width="50%" style="display: block; margin: auto;" />

Which has the steeper slope, VS1 or VS2?

---

Making comparisons across plots requires the eye to jump from one focal point to another, so differences may go unnoticed. Overlaying the groups in a single plot makes the comparison easier.

``` r
ggplot(dsamp, aes(x = carat, y = price, colour = clarity)) +
  geom_point() +
  geom_smooth(se = FALSE) +
  scale_color_brewer(palette = "Set1")
```

<img src="index_files/figure-html/averlaying makes comparisons easier-1.png" width="70%" style="display: block; margin: auto;" />

---

# Core principles

- Make a plot of your **data**! The hierarchy matters primarily if the structure is weak, or if differences between groups are small.
- Knowing how to use proximity is an extremely valuable skill, and it is not well utilised.
- Using colour well is a very valuable skill, and there are many bad habits: overusing colour, using too many colours, or precariously mapping a continuous variable to colour to add another dimension.

<br>

- Show the data! There are many examples where the statistics are plotted, but the magic comes when you plot the data. Plot the statistics if the volume of data is overwhelming, to tighten the message, but still plot the data for yourself, to keep track of the variability.
- One plot is never enough! Plot the data in different ways; it will help you digest the relationships between variables and gain a better understanding of the data.

---

class: inverse middle

# Your turn

This builds on the exercise from the previous session.

- Using your choice of country, for example, Australia, make a set of plots to explore the TB incidence among males relative to females over different age groups for 2012.
- Choose your best plot to answer this question: .orange[Is there a higher prevalence of TB among younger women in 2012?]
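---

# Computing proportions by sex

The proportions slide (plot A) shows a proportion of cases by sex as a line. As a possible starting point for the exercise, here is a sketch of that computation with dplyr, assuming the 2012 USA counts typed in from the first slide; substitute the counts for your chosen country. The choice of the female share and the object names are ours, for illustration only:

```r
library(dplyr)
library(ggplot2)

# 2012 USA TB counts, typed in from the table on the first slide
tb_2012 <- data.frame(
  sex   = rep(c("m", "f"), each = 6),
  age   = rep(c("1524", "2534", "3544", "4554", "5564", "65"), times = 2),
  count = c(239, 322, 333, 502, 455, 529,
            161, 262, 169, 175, 148, 243)
)

# Share of each age group's cases that are female
prop_female <- tb_2012 %>%
  group_by(age) %>%
  summarise(female_prop = sum(count[sex == "f"]) / sum(count))

print(prop_female)

# Proportions on a common scale with 0 and 1 included, plus a
# dashed reference line where male and female counts would be equal
p_prop <- ggplot(prop_female, aes(x = age, y = female_prop, group = 1)) +
  geom_line() +
  geom_point() +
  geom_hline(yintercept = 0.5, linetype = "dashed") +
  expand_limits(y = c(0, 1))

print(p_prop)
```

Plotting the share rather than the raw counts puts all age groups on the same 0 to 1 scale, so differences in the total number of cases do not dominate the comparison.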
---

# Resources

- [Claus O. Wilke, Fundamentals of Data Visualization](https://clauswilke.com/dataviz/)
- [Naomi Robbins, Creating More Effective Graphs](http://www.nbr-graphs.com)
- [Cleveland WS, McGill R. 1984. Graphical perception: Theory, experimentation, ...](https://www.tandfonline.com/doi/abs/10.1080/01621459.1984.10478080)
- [Heer J, Bostock M. 2010. Crowdsourcing graphical perception](http://vis.stanford.edu/files/2010-MTurk-CHI.pdf)
- [Antony Unwin, Graphical Data Analysis with R](https://www.crcpress.com/Graphical-Data-Analysis-with-R/Unwin/9781498715232)
- [Wagemans J et al. 2012. A Century of Gestalt Psychology in Visual Perception: I. Perceptual Grouping and Figure-Ground Organization. Psychological Bulletin 138:1172–1217](http://dx.doi.org/10.1037/a0029333)
- [Wagemans J, Feldman J, Gepshtein S, Kimchi R, Pomerantz JR, et al. 2012. A Century of Gestalt psychology in Visual Perception: II. Conceptual and Theoretical Foundations. Psychological Bulletin 138:1218–1252](https://doi.org/10.1037/a0029334)
- [Wickham H. 2013. Graphical criticism](https://vita.had.co.nz/papers/stat-graph-hist.pdf)
- [VanderPlas S, Goluch R, Hofmann H. 2019. Framed!](https://amstat.tandfonline.com/doi/full/10.1080/10618600.2018.1562937#.XS41dS1L21s)
- [VanderPlas S, Hofmann H. 2015. Signs of the Sine Illusion](https://amstat.tandfonline.com/doi/abs/10.1080/10618600.2014.951547?scroll=top&needAccess=true&journalCode=ucgs20#.XS413i1L21s)

---

# Share and share alike

<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.