class: center, middle, inverse, title-slide

.title[
# Touring multivariate data
]
.subtitle[
## SISBID 2024
https://github.com/dicook/SISBID
]
.author[
### Di Cook (dicook@monash.edu)
Heike Hofmann (hhofmann4@unl.edu)
Susan Vanderplas (susan.vanderplas@unl.edu)
]
.date[
### 08/14-16/2024
]

---
# Pairwise plots

.pull-left[
<img src="index_files/figure-html/scatterplot matrix-1.png" width="432" style="display: block; margin: auto;" />
]
.pull-right[
<br><br><br>
What don't you see? Unless you have tours, you'll never know 🫣
]

---
class: inverse middle center

# Our first tour
What patterns do you see?
---
.pull-left[
``` r
library(tourr)
# Run the tour
animate_xy(penguins[,2:5], col=penguins$species, axes="off", fps=15)
```
]
.pull-right[
<img src="penguins2d.gif" width="100%">
]

---
class: inverse middle

# What did you see?

- clusters ✅

--

- outliers ✅

--

- linear dependence ✅

--

- elliptical clusters with slightly different shapes ✅

--

- separated elliptical clusters with slightly different shapes ✅

--

---
# Which shows better separation?

.pull-left[
<img src="penguins2d.gif" width="80%">
]
.pull-right[
<img src="index_files/figure-html/unnamed-chunk-5-1.png" width="100%" style="display: block; margin: auto;" />
]

---
# What is a tour?

.pull-left[
A grand tour is, by definition, a movie of low-dimensional projections constructed so that it comes arbitrarily close to showing all possible low-dimensional projections. In other words, a grand tour is a space-filling curve in the manifold of low-dimensional projections of high-dimensional data spaces.

<img src="images/hands.png" width="80%">
]
.pull-right[
`\({\mathbf x}_i \in \mathcal{R}^p\)`, the `\(i^{th}\)` data vector.

`\(F\)` is a `\(p\times d\)` orthonormal basis, `\(F'F=I_d\)`, where `\(d\)` is the projection dimension.

The projection of `\({\mathbf x}_i\)` onto `\(F\)` is `\({\mathbf y}_i=F'{\mathbf x}_i\)`.

The tour is indexed by time, `\(F(t)\)`, where `\(t\in [a, z]\)`. The starting and target frames are denoted `\(F_a = F(a)\)` and `\(F_z=F(z)\)`.

The animation of the projected data is given by the path `\({\mathbf y}_i(t)=F'(t){\mathbf x}_i\)`.
]

---
# Geodesic interpolation between planes

.pull-left[
The tour is indexed by time, `\(F(t)\)`, where `\(t\in [a, z]\)`. The starting and target frames are denoted `\(F_a = F(a)\)` and `\(F_z=F(z)\)`.

The animation of the projected data is given by the path `\({\mathbf y}_i(t)=F'(t){\mathbf x}_i\)`.
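Here is a minimal base-R sketch of these definitions, using hypothetical random data and frames. `random_basis()` draws a `\(p\times d\)` orthonormal frame from the QR decomposition of a Gaussian matrix. `geodesic_frame()` moves from `\(F_a\)` toward the plane of `\(F_z\)` by rotating through the principal angles between the two planes. Both helpers are illustrative names, not tourr functions. `frac` in `\([0, 1]\)` stands in for `\(t\in[a,z]\)`.

```r
set.seed(1)
p <- 4  # number of variables
d <- 2  # projection dimension

# Random p x d orthonormal basis (F'F = I_d) via the QR decomposition
random_basis <- function(p, d) qr.Q(qr(matrix(rnorm(p * d), p, d)))

# Geodesic interpolation from frame Fa toward the plane spanned by Fz,
# using principal angles from the SVD of Fa'Fz; frac in [0, 1] plays the role of t
geodesic_frame <- function(Fa, Fz, frac) {
  s <- svd(crossprod(Fa, Fz))        # Fa'Fz = U diag(cos(theta)) V'
  theta <- acos(pmin(s$d, 1))        # principal angles between the two planes
  k <- length(theta)
  Ga <- Fa %*% s$u                   # principal directions in the starting plane
  Gz <- Fz %*% s$v                   # matching directions in the target plane
  # Unit vectors pointing from each Ga direction toward its Gz partner,
  # orthogonal to the starting plane (set to zero when the angle is zero)
  Gperp <- Gz - Ga %*% diag(cos(theta), nrow = k)
  Gperp <- Gperp %*% diag(ifelse(sin(theta) > 1e-10, 1 / sin(theta), 0), nrow = k)
  # Rotate each principal direction through the fraction frac of its angle
  Gt <- Ga %*% diag(cos(frac * theta), nrow = k) +
    Gperp %*% diag(sin(frac * theta), nrow = k)
  Gt %*% t(s$u)                      # undo the within-plane rotation, so F(0) = Fa
}

Fa <- random_basis(p, d)  # starting frame
Fz <- random_basis(p, d)  # target frame

# Hypothetical data: n = 50 observations x_i in R^p, stored as rows of X
X <- matrix(rnorm(50 * p), ncol = p)

# With rows as observations, y_i = F(t)' x_i becomes Y = X F(t): one animation frame
Y_half <- X %*% geodesic_frame(Fa, Fz, 0.5)
```

At `frac = 0` the frame equals `\(F_a\)`. At `frac = 1` it spans the same plane as `\(F_z\)`, possibly rotated within that plane. That is all a projection needs.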
]
.pull-right[
<img src="images/geodesic.png" width="120%">
]

---
class: inverse middle center

# Reading axes - interpretation

Length and direction of axes relative to the pattern of interest

---
<img src="images/reading_axes.001.png" width="100%">

---
<img src="images/reading_axes.002.png" width="100%">

---
# Reading axes - interpretation

<iframe src="penguins.html" width="800" height="500" scrolling="yes" seamless="seamless" frameBorder="0"> </iframe>

---
.pull-left[
``` r
ggplot(penguins, aes(x=fl, y=bd, colour=species, shape=species)) +
  geom_point(alpha=0.7, size=2) +
  scale_colour_discrete_divergingx(palette = "Zissou 1") +
  theme(aspect.ratio=1, legend.position="bottom")
```
<img src="index_files/figure-html/runthis13-1.png" width="90%" style="display: block; margin: auto;" />

Gentoo is separated from the others by a contrast of fl and bd.
]
.pull-right[
``` r
ggplot(penguins, aes(x=bl, y=bm, colour=species, shape=species)) +
  geom_point(alpha=0.7, size=2) +
  scale_colour_discrete_divergingx(palette = "Zissou 1") +
  theme(aspect.ratio=1, legend.position="bottom")
```
<img src="index_files/figure-html/runthis14-1.png" width="90%" style="display: block; margin: auto;" />

Chinstrap is separated from the others by a contrast of bl and bm.
]

---
class: inverse middle left

There may be multiple, different combinations of variables that reveal similar structure. ☹️

The tour can help to discover these, too.

---
# Other tour types

- .orange[guided]: follows the optimisation path for a projection pursuit index.
- .orange[little]: interpolates between all variables.
- .orange[local]: rocks back and forth from a given projection, so shows all possible projections within a radius.
- .orange[dependence]: two independent 1D tours.
- .orange[frozen]: fixes some variable coefficients while the others vary freely.
- .orange[manual]: controls the coefficient of one variable, to examine the sensitivity of the structure to this variable.
- .orange[slice]: uses a section instead of a projection.
- .orange[sage]: transforms a 2D projection to avoid data piling.

---
class: inverse middle center

# guided tour

new target bases are chosen using a projection pursuit index function

---
`$$\mathop{\text{maximize}}_{F} g(F'x) ~~~\text{ subject to } F \text{ being orthonormal}$$`

.font_small[
- `holes`: This is an inverse Gaussian filter, which is optimised when there is not much data in the center of the projection, i.e. a "hole" or donut shape in 2D.
- `central mass`: The opposite of holes: optimised by high density in the centre of the projection, often with "outliers" on the edges.
- `LDA`/`PDA`: Indices based on linear discriminant dimension reduction (PDA is the penalised version), optimised by projections where the labelled classes are most separated.
]

---
.pull-left[
Grand

<img src="penguins2d.gif" width="80%">

.small[
Might accidentally see best separation
]
]
.pull-right[
Guided, using LDA index

<img src="penguins2d_guided.gif" width="80%">

.small[
Moves to the best separation
]
]

---
class: inverse middle center

# manual tour

control the coefficient of one variable, reduce it to zero, increase it to 1, maintaining orthonormality

---
# Manual tour

.pull-left[
- start from best projection, given by projection pursuit
- bl contribution controlled
- if bl is removed from the projection, Adelie and Chinstrap are mixed
- bl is important for Adelie
]
.pull-right[
<img src="penguins_manual_bl.gif" width="90%">
]

---
# Manual tour

.pull-left[
- start from best projection, given by projection pursuit
- fl contribution controlled
- clusters are less separated when fl is fully contributing
- fl is important, in small amounts, for Gentoo
]
.pull-right[
<img src="penguins_manual_fl.gif" width="90%">
]

---
# Local tour

.pull-left[
Rocks from and to a given projection, in order to observe the neighbourhood
]
.pull-right[
<img src="penguins2d_local.gif" width="90%">
]

---
class: inverse middle

# Your turn

Using the sample code from the tourr package, check how many clusters there are in the example data.
``` r
library(tourr)
data(flea)
?animate_xy
animate_xy(flea[, 1:6])
```
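---
# Sketch: guided search with a holes index

Here is a minimal base-R sketch of the guided tour ideas, built on stated assumptions. `holes_index()` follows the description of the holes index on the guided tour slide: an inverse Gaussian filter that is large when the centre of the projection is empty. Its normalisation by `1 - exp(-d/2)` is an assumption. `guided_search()` is a hypothetical, crude random hill climb over orthonormal bases, not the optimiser tourr uses. The ring-plus-noise data are made up.

```r
set.seed(2024)

# Hypothetical example data: a ring in variables 1-2, Gaussian noise in 3-4
n <- 200
ang <- seq(0, 2 * pi, length.out = n + 1)[-1]
X <- scale(cbind(cos(ang), sin(ang), rnorm(n), rnorm(n)))

# Holes-type index: large when few projected points lie near the centre.
# Assumed form: (1 - mean(exp(-||y||^2 / 2))) / (1 - exp(-d / 2)),
# which is at most 1 when the projected data have identity covariance.
holes_index <- function(Y) {
  d <- ncol(Y)
  (1 - mean(exp(-0.5 * rowSums(Y^2)))) / (1 - exp(-d / 2))
}

# Random p x d orthonormal basis from the QR decomposition of a Gaussian matrix
random_basis <- function(p, d) qr.Q(qr(matrix(rnorm(p * d), p, d)))

# Crude stochastic hill climb: a simplified stand-in for a guided tour optimiser
guided_search <- function(X, d = 2, index = holes_index, n_try = 500, step = 0.3) {
  p <- ncol(X)
  best <- random_basis(p, d)
  best_val <- index(X %*% best)
  for (i in seq_len(n_try)) {
    # Perturb the current basis, then re-orthonormalise it with QR
    cand <- qr.Q(qr(best + step * matrix(rnorm(p * d), p, d)))
    val <- index(X %*% cand)
    # Keep the candidate only if it increases the index
    if (val > best_val) {
      best <- cand
      best_val <- val
    }
  }
  list(basis = best, index = best_val)
}

res <- guided_search(X)
round(res$basis, 2)  # typically most weight falls on rows 1-2, where the ring is
```

The hill climb only ever accepts improvements, so it can stall at a local optimum. In tourr itself, a comparable exploration is a call along the lines of `animate_xy(flea[, 1:6], guided_tour(holes()), sphere = TRUE)`.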
---
# Saving and sharing: Animated gif

.pull-left[
``` r
render_gif(
  penguins[,2:5],
  grand_tour(),
  display_xy(col=penguins$species, axes="bottomleft"),
  gif_file="penguins2d.gif",
  frames=100,
  width=300,
  height=300)
```
]
.pull-right[
<img src="penguins2d.gif" width="80%">
]

---
# Saving and sharing: Single frame

.pull-left[
``` r
load(here::here("data/p_tour_path.rda"))
penguins_pcti <- interpolate(penguins_pct, 0.2)
f27 <- matrix(penguins_pcti[,,27], ncol=2)
p27 <- render_proj(penguins[,2:5], f27,
                   obs_labels=penguins$species)
```

Draw it with ggplot, and possibly pass to plotly.
]
.pull-right[
]

---
# Resources

- [Cook and Laa (2024)](https://dicook.github.io/mulgar_book/)
- Emerson et al (2013) The Generalized Pairs Plot, Journal of Computational and Graphical Statistics, 22:1, 79-91
- [Natalia da Silva](http://natydasilva.com/): [PPForest](https://cran.r-project.org/web/packages/PPforest/index.html) and [shiny app](https://natydasilva.shinyapps.io/shinyV03/)
- Wickham et al (2011) [tourr: An R Package for Exploring Multivariate Data with Projections](https://www.jstatsoft.org/article/view/v040i02/v40i02.pdf) and the R package [tourr](https://cran.r-project.org/web/packages/tourr/index.html)
- Schloerke et al (2016) [Escape from Boxland](https://journal.r-project.org/archive/2016/RJ-2016-044/index.html), [the web site zoo](http://schloerke.com/geozoo/) and the R package [geozoo](https://cran.r-project.org/web/packages/geozoo/index.html)
- Spyrison and Cook (2020) spinifex: Manual Tours, Manual Control of Dynamic Projections of Numeric Multivariate Data. https://CRAN.R-project.org/package=spinifex
- Stuart Lee [liminal](https://github.com/sa-lee/liminal): new tools for linked brushing between tours and PCA/tSNE/PDS views

---
# Share and share alike

<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.