Session 1
Professor Di Cook
Monash University
@visnut@aus.social
tourr: Implements geodesic interpolation and basis generation functions that allow you to create new tour methods from R.
geozoo: Geometric objects defined in ‘geozoo’ can be simulated or displayed in the R package ‘tourr’.
spinifex: Implements manual control, where the contribution of a selected variable can be adjusted between -1 to 1, to examine the sensitivity of structure in the data to that variable.
liminal: A special purpose package to compare the views of the data provided by non-linear dimension with tSNE and a tour.
slides1.R file from https://github.com/dicook/MDAG_2022My version of R, and installed packages is:
#> R version 4.2.1 (2022-06-23)
#> Platform: x86_64-apple-darwin17.0 (64-bit)
#> Running under: macOS Big Sur ... 10.16
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
#>
#> locale:
#> [1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] showtext_0.9-5 showtextdb_3.0 sysfonts_0.8.8 ggthemes_4.2.4 forcats_0.5.2 stringr_1.4.1 dplyr_1.0.10
#> [8] purrr_0.3.5 readr_2.1.3 tidyr_1.2.1 tibble_3.1.8 ggplot2_3.4.0 tidyverse_1.3.2 knitr_1.40
#>
#> loaded via a namespace (and not attached):
#> [1] tidyselect_1.2.0 xfun_0.34 haven_2.5.1 gargle_1.2.1 colorspace_2.0-3
#> [6] vctrs_0.5.1 generics_0.1.3 htmltools_0.5.3 yaml_2.3.6 utf8_1.2.2
#> [11] rlang_1.0.6 pillar_1.8.1 withr_2.5.0 glue_1.6.2 DBI_1.1.3
#> [16] dbplyr_2.2.1 modelr_0.1.9 readxl_1.4.1 lifecycle_1.0.3 munsell_0.5.0
#> [21] gtable_0.3.1 cellranger_1.1.0 rvest_1.0.3 evaluate_0.18 tzdb_0.3.0
#> [26] fastmap_1.1.0 curl_4.3.3 fansi_1.0.3 broom_1.0.0 backports_1.4.1
#> [31] scales_1.2.1 googlesheets4_1.0.1 jsonlite_1.8.3 fs_1.5.2 hms_1.1.2
#> [36] digest_0.6.30 stringi_1.7.8 grid_4.2.1 cli_3.4.1 tools_4.2.1
#> [41] magrittr_2.0.3 crayon_1.5.2 pkgconfig_2.0.3 ellipsis_0.3.2 xml2_1.3.3
#> [46] reprex_2.0.2 googledrive_2.0.0 lubridate_1.9.0 timechange_0.1.1 assertthat_0.2.1
#> [51] rmarkdown_2.18 httr_1.4.4 rstudioapi_0.14 R6_2.5.1 compiler_4.2.1
library(tidyverse)
library(palmerpenguins)
penguins <- penguins %>% filter(!is.na(bill_length_mm))
penguins_sub <- penguins[,c(1, 3:6)] %>%
mutate(across(where(is.numeric), ~ scale(.)[,1])) %>%
rename(bl = bill_length_mm,
bd = bill_depth_mm,
fl = flipper_length_mm,
bm = body_mass_g)
summary(penguins_sub)#> species bl bd fl bm
#> Adelie :151 Min. :-2.17 Min. :-2.05 Min. :-2.06 Min. :-1.87
#> Chinstrap: 68 1st Qu.:-0.86 1st Qu.:-0.79 1st Qu.:-0.78 1st Qu.:-0.81
#> Gentoo :123 Median : 0.10 Median : 0.08 Median :-0.28 Median :-0.19
#> Mean : 0.00 Mean : 0.00 Mean : 0.00 Mean : 0.00
#> 3rd Qu.: 0.84 3rd Qu.: 0.78 3rd Qu.: 0.86 3rd Qu.: 0.68
#> Max. : 2.87 Max. : 2.20 Max. : 2.14 Max. : 2.62
See Allison Horst’s web site for more details.
|
|
|
|
| Adélie 1 | Gentoo 2 | Chinstrap 3 |
Movement of points, generally in a grand tour, can provide additional information about structure in high-dimensions.

Length and direction of axes relative to the pattern of interest


Length and direction of axes relative to the pattern of interest


A grand tour is by definition a movie of low-dimensional projections constructed in such a way that it comes arbitrarily close to showing all possible low-dimensional projections; in other words, a grand tour is a space-filling curve in the manifold of low-dimensional projections of high-dimensional data spaces.
\({\mathbf x}_i \in \mathcal{R}^p\), \(i^{th}\) data vector
\(F\) is a \(p\times d\) orthonormal basis, \(F'F=I_d\), where \(d\) is the projection dimension.
The projection of \({\mathbf x_i}\) onto \(F\) is \({\mathbf y}_i={\mathbf x}_iF\).
Tour is indexed by time, \(F(t)\), where \(t\in [a, z]\). Starting and target frame denoted as \(F_a = F(a), F_z=F(t)\).
The animation of the projected data is given by a path \({\mathbf y}_i(t)={\mathbf x}_iF(t)\).
Tour is indexed by time, \(F(t)\), where \(t\in [a, z]\). Starting and target frame denoted as \(F_a = F(a), F_z=F(t)\).
The animation of the projected data is given by a path \({\mathbf y}_i(t)={\mathbf x}_iF(t)\).

Let’s take a look at some common high-d shapes with a grand tour
Hollow

Solid

Hollow

Solid

Torus

Mobius

A grand tour is like a random walk (with interpolation) through the space of all possible projections: sphere for 1D, torus for 2D.1


spinifex package.New target bases are chosen using a projection pursuit index function
\[\mathop{\text{maximize}}_{F} g(xF) ~~~\text{ subject to } F \text{ being orthonormal}\]
holes: This is an inverse Gaussian filter, which is optimised when there is not much data in the center of the projection, i.e. a “hole” or donut shape in 2D.central mass: The opposite of holes, high density in the centre of the projection, and often “outliers” on the edges.LDA/PDA: An index based on the linear discriminant dimension reduction (and penalised), optimised by projections where the named classes are most separated.Grand

Might accidentally see best separation
Guided, using LDA index

Moves to the best separation
bd contribution controlledbd is removed from projection, Gentoo separation disappearsbd is important for distinguishing Gentoo
bl contribution controlledbl is important for distinguishing Adelie from Chinstrap
Rocks from and to a given projection, in order to observe the neighbourhood

Solid 4D sphere

Hollow 4D sphere

4D Torus

4D Hollow Cube

Compute PCA, reduce dimension, show original variable axes in the reduced space.

1D projections displayed as a density

Density contours overlaid on a scatterplot

More on these in Session 2
function (center = TRUE, axes = "center",
half_range = NULL, col = "black", pch = 20,
cex = 1, edges = NULL, edges.col = "black",
...)
{
labels <- NULL
if (!areColors(col))
col <- mapColors(col)
init <- function(data) {
half_range <<- compute_half_range(
half_range, data, center)
labels <<- abbreviate(colnames(data), 3)
}
if (!is.null(edges)) {
if (!is.matrix(edges) && ncol(edges) == 2) {
stop("Edges matrix needs two columns,
from and to, only.")
}
}
render_frame <- function() {
par(pty = "s", mar = rep(0.1, 4))
blank_plot(xlim = c(-1, 1),
ylim = c(-1, 1))
}
render_transition <- function() {
rect(-1, -1, 1, 1, col = "#FFFFFFE6",
border = NA)
}
render_data <- function(data, proj,
geodesic) {
draw_tour_axes(proj, labels, limits = 1,
axes, ...)
x <- data %*% proj
if (center)
x <- center(x)
x <- x/half_range
points(x, col = col, pch = pch,
cex = cex)
if (!is.null(edges)) {
segments(x[edges[, 1], 1],
x[edges[, 1], 2],
x[edges[, 2], 1],
x[edges[, 2], 2],
col = edges.col)
}
}
list(init = init,
render_frame = render_frame,
render_transition = render_transition,
render_data = render_data,
render_target = nul)
}
To create a new display, a function containing these functions:
render_data: draw the data using base, input at least data, projinit: initialise the drawingrender_frame: clears display to restart drawingrender_transition: allows for different drawing when a target is reachedMethod 1, using gifski and tourr::render_gif(). See lots of code chunks!
render_gif(penguins_sub[,2:5], grand_tour(),
display_xy(col=col, axes="bottomleft"),
"penguins2d.gif", apf=1/5,
frames=100, width=300, height=300)
Method 2, using plotly (see create_plotly_animation.R):
frame pointing to your indexggplotlyhtmltools::save_html()Use the spinifex function play_tour_path() to do this easily.
We can learn a little more about the data if have a tour in the toolbox. It can help us to understand
Slides produced using quarto.
Slides available from https://github.com/dicook/MDAG_2022.
Viewable at https://dicook.github.io/MDAG/slides.html)).