EDA = visualise + manipulate + model

Hadley Wickham, Chief Scientist, RStudio

Visualisation alone is not enough to solve most data analysis challenges. The data may be too big or too messy to show in a single plot. In this talk, I'll outline my current thinking about how the synthesis of visualisation, modelling, and data manipulation allows you to effectively explore and understand large and complex datasets. This work is embedded in R, so I'll not only talk about the ideas but also show concrete code for working with large sets of models. You'll see how you can combine the dplyr and purrr packages to fit many models, then use tidyr and broom to convert the results to tidy data, which can be visualised with ggplot2.
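
As a rough illustration of the workflow described above, here is a minimal sketch that fits one model per group and tidies the results for plotting. It is not code from the talk: the built-in mtcars dataset, the `mpg ~ wt` formula, and the column names `fit` and `coefs` are all assumptions chosen for the example, and it uses the current tidyr nesting interface, which may differ from what was shown at the time.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)
library(ggplot2)

# Assumed example data: mtcars, with one linear model per cylinder count.
models <- mtcars %>%
  group_by(cyl) %>%
  nest() %>%                                      # one row per group, data in a list-column
  mutate(
    fit   = map(data, ~ lm(mpg ~ wt, data = .x)), # purrr: fit a model to each group
    coefs = map(fit, tidy)                        # broom: coefficients as a data frame
  )

# tidyr: expand the list-column of coefficient tables into one tidy data frame.
coefs <- models %>%
  select(cyl, coefs) %>%
  unnest(coefs) %>%
  ungroup()

# ggplot2: compare the fitted slope across groups, with +/- one standard error.
ggplot(filter(coefs, term == "wt"), aes(factor(cyl), estimate)) +
  geom_pointrange(aes(ymin = estimate - std.error, ymax = estimate + std.error)) +
  labs(x = "Cylinders", y = "Slope of mpg on wt")
```

Keeping the models in a list-column next to the data they were fitted to means each fit stays aligned with its group, so further summaries (for example, `glance` or `augment` from broom) can be added as extra columns in the same way.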

video link
Slides