Training course: Plotting Data for Communication and Exploration

Dianne Cook
Monash University
Produced for e61, September 23, 2024

About me

👋🏼 Hi!

Thanks for having me come and teach about data visualisation today.

  • Professor of Statistics at Monash University, in the Monash Business School.
  • PhD in Statistics from Rutgers University in New Jersey.
  • My undergraduate degree was in Mathematics and Statistics from the University of New England, Armidale.
  • I moved to Monash University in 2015, after spending more than 20 years in the USA.
  • More than 100 publications on topics related to data visualisation.

Please introduce yourself to me some time today. Let me know what your background is and what you primarily work on.



Please feel free to stop me 🛑 and ask questions 🙋🏽 , or add comments, any time today.

Outline

  1. We’ll start with some clarification of the difference between data visualisation for communication and exploration. (30 mins)
  2. Then we'll spend about 2/3 of the time on content for communication: data management, plot specification, design principles and cognitive perception, assessing the effectiveness of a plot, and uncertainty. (4 hours)
  3. The last 1/3 will be primarily on exploration, including exploring missing values, interactive graphics, and checking whether patterns are real or spurious. (2 hours)

In reality, there is a substantial overlap in methodology between the two activities.

What is the difference between communication and exploration?

Exploration

Learn as much as possible about the data, as fast as possible, without missing anything.

Allow oneself to be surprised.

First think about what you might expect to see, and then you can evaluate whether what you see is surprising.

Communication

Do one thing well!

What is the main message to be communicated? The primary purpose for the plot design is to make this easy to see.

Is there a second, or a third message? This can be factored into the design secondarily.

Example: Exploration (1/2)

One of the ugliest plots ever, but one of the most useful.

The training set is different from the test set. Getting the best predictive accuracy on the test set requires training the model with this difference taken into account.

The team won the competition!

Example: Exploration (2/2)

  • Look at the data from many sides.
  • Fix problems, re-plot.
  • Drill down to see finer detail.
  • Check your expectations or surprising patterns, using inferential methods.

Example: Communication

Primary message: incidence for young adults is increasing

This needs to be the first pattern that a viewer sees.



Secondary patterns: more incidence among men, older ages.

Plus, nicer axes, nicer labels, title, good colour, symbols, annotations

End of session 0

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.