17  Neural networks and deep learning

Neural networks (NNs) can be considered nested additive (or even ensemble) models, where explanatory variables are combined and transformed through an activation function such as a logistic. These transformed combinations are added recursively to yield class predictions. NNs are usually regarded as black box models, but there is a growing demand for interpretability. Although interpretation is possible, it can be daunting to work through a complex model constructed to tackle a difficult classification task. Nevertheless, this is the motivation for the visualisation of NN models explained in this chapter.

In the simplest form, we might write the equation for a NN as

\[ \hat{y} = f(x) = a_0+\sum_{h=1}^{s} w_{0h}\phi(a_h+\sum_{i=1}^{p} w_{ih}x_i) \] where \(s\) indicates the number of nodes in the hidden (middle) layer, and \(\phi\) is a choice of activation function. In a simple situation where \(p=3\), \(s=2\), and the output layer is linear, the model could be written as:

\[ \begin{aligned} \hat{y} = a_0+ & w_{01}\phi(a_1+w_{11}x_1+w_{21}x_2+w_{31}x_3) +\\ & w_{02}\phi(a_2+w_{12}x_1+w_{22}x_2+w_{32}x_3) \end{aligned} \] which is a combination of two (linear) models, each of which could be examined for their role in making predictions.
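To make the arithmetic concrete, the following sketch evaluates this two-node model for a single observation in R, with a logistic activation. All numbers here are hypothetical, chosen only for illustration.

Code
# Evaluate a p=3, s=2 model by hand, with hypothetical weights
phi <- function(z) 1 / (1 + exp(-z))   # logistic activation
a  <- c(0.1, -0.2, 0.3)                # a_0, a_1, a_2
W  <- matrix(c(0.5, -1.0, 0.8,         # w_11, w_21, w_31
               0.2,  0.4, -0.6),       # w_12, w_22, w_32
             ncol = 2)
w0 <- c(1.5, -0.7)                     # w_01, w_02
x  <- c(0.3, 1.2, -0.5)                # one observation

yhat <- a[1] + sum(w0 * phi(a[2:3] + t(W) %*% x))
yhat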

In practice, a model may have many nodes, several hidden layers, a variety of activation functions, and regularisation modifications. One should keep in mind that the principle of parsimony is important when applying NNs, because it is tempting to make an overly complex, and thus over-parameterised, construction. Fitting NNs is still problematic. One would hope that fitting produces a stable result, so that whatever the starting seed, the same parameter estimates are returned. However, this is not the case, and different, sometimes radically different, results are routinely obtained after each attempted fit (Wickham et al., 2015).

For these examples we use the software keras (Allaire & Chollet, 2023) following the installation and tutorial details at https://tensorflow.rstudio.com/tutorials/. Because it is an interface to Python, it can be tricky to install. If this is a problem, the example code should be straightforward to convert to use nnet (Venables & Ripley, 2002) or neuralnet (Fritsch et al., 2019). We will use the penguins data to illustrate the fitting, because it makes it easier to understand the procedures and the fit. However, using a NN here is like using a jackhammer instead of a trowel to plant a seedling: more complicated than necessary to build a good classification model for this data.
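For example, a roughly comparable two-node model could be fitted with nnet, as in the sketch below. This assumes the training data p_train constructed in Section 17.2, and note that nnet uses a logistic activation in the hidden layer rather than ReLU.

Code
# Sketch: a comparable two-node model using nnet instead of keras,
# assuming p_train as created in Section 17.2
library(nnet)
set.seed(211)
p_nnet_fit <- nnet(species ~ ., data = p_train, size = 2)
table(p_train$species, predict(p_nnet_fit, type = "class"))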

17.1 Setting up the model

A first step is to decide how many nodes the NN architecture should have, and which activation function should be used. To make these decisions, ideally you already have some knowledge of the shapes of the class clusters. For the penguins classification, we have seen that the data contains three elliptically shaped clusters of roughly the same size. This suggests that two nodes in the hidden layer should be sufficient to separate the three clusters (Figure 17.1). Because the shapes of the clusters are convex, a piecewise linear activation (“relu”) will also be sufficient. The model specification is as follows:

Code
library(keras)
tensorflow::set_random_seed(211)

# Define model
p_nn_model <- keras_model_sequential()
p_nn_model |> 
  layer_dense(units = 2, activation = 'relu', 
              input_shape = 4) |> 
  layer_dense(units = 3, activation = 'softmax')
p_nn_model |> summary()

loss_fn <- loss_sparse_categorical_crossentropy(
  from_logits = TRUE)

p_nn_model |> compile(
  optimizer = "adam",
  loss      = loss_fn,
  metrics   = c('accuracy')
)

Note that tensorflow::set_random_seed(211) sets the seed for the model fitting so that we can obtain the same result to discuss later. It needs to be set before the model is defined in the code. The model will also be saved in order to diagnose and make predictions.

Figure 17.1: Network architecture for the model on the penguins data. The round nodes indicate original or transformed variables, and each arrow connecting these is represented as one of the weights \(w_{ih}\) in the definition. The boxes indicate the additive constant entering the nodes, and the corresponding arrows represent the terms \(a_h\).

17.2 Checking the training/test split

Splitting the data into training and test sets is an essential way to protect against overfitting for most classifiers, but especially so for copiously parameterised NNs. The model specified for the penguins data, with only two nodes, is unlikely to be overfitted, but it is nevertheless good practice to use a training set for building and a test set for evaluation.

Figure 17.2 shows the tour being used to examine the split into training and test samples for the penguins data. Using random sampling, particularly stratified by group, should result in the two sets being very similar, as can be seen here. It does happen that several observations in the test set are on the extremes of their class cluster, so it could be that the model makes errors in the neighbourhoods of these points.

Code
# Split the data into training and testing
library(ggthemes)
library(dplyr)
library(tidyr)
library(rsample)
library(ggbeeswarm)
library(tidymodels)
library(tourr)

load("data/penguins_sub.rda") # from mulgar book

set.seed(821)
p_split <- penguins_sub |> 
  select(bl:species) |>
  initial_split(prop = 2/3, 
                strata=species)
p_train <- training(p_split)
p_test <- testing(p_split)

# Check training and test split
p_split_check <- bind_rows(
  bind_cols(p_train, type = "train"), 
  bind_cols(p_test, type = "test")) |>
  mutate(type = factor(type))
Code to run tours
animate_xy(p_split_check[,1:4], 
           col=p_split_check$species,
           pch=p_split_check$type, 
           shapeset=c(16,1))
animate_xy(p_split_check[,1:4], 
           guided_tour(lda_pp(p_split_check$species)),
           col=p_split_check$species,
           pch=p_split_check$type, 
           shapeset=c(16,1))
render_gif(p_split_check[,1:4],
           grand_tour(),
           display_xy( 
             col=p_split_check$species, 
             pch=p_split_check$type, 
             shapeset=c(16,1),
             cex=1.5,
             axes="bottomleft"), 
           gif_file="gifs/p_split.gif",
           frames=500,
           loop=FALSE
)
render_gif(p_split_check[,1:4],
           guided_tour(lda_pp(p_split_check$species)),
           display_xy( 
             col=p_split_check$species, 
             pch=p_split_check$type, 
             shapeset=c(16,1),
             cex=1.5,
             axes="bottomleft"), 
           gif_file="gifs/p_split_guided.gif",
           frames=500,
           loop=FALSE
)
Animation showing a grand tour of the penguins data, with colour indicating species (red=Gentoo, blue=Adelie, yellow=Chinstrap) and shape indicating training (open circles) vs test (closed circles). The distribution of points in each group is reasonably similar regardless of projection.
(a) Grand tour
Animation showing a guided tour of the penguins data, with colour indicating species (red=Gentoo, blue=Adelie, yellow=Chinstrap) and shape indicating training (open circles) vs test (closed circles). The distribution of points in each group is reasonably similar regardless of projection.
(b) Guided tour
Figure 17.2: Evaluating the training/test split, where we expect that the two samples should roughly match, as they do here. There are a few observations in the test set that are on the outer edges of the clusters, which will likely result in the model making errors in these regions.

17.3 Fit the model

The data needs to be specially formatted for the model fitted using keras. The explanatory variables need to be provided as a matrix, and the categorical response needs to be separate, and specified as a numeric variable, beginning with 0.

Code
# Data needs to be matrix, and response needs to be numeric
p_train_x <- p_train |>
  select(bl:bm) |>
  as.matrix()
p_train_y <- p_train |> pull(species) |> as.numeric() 
p_train_y <- p_train_y-1 # Needs to be 0, 1, 2
p_test_x <- p_test |>
  select(bl:bm) |>
  as.matrix()
p_test_y <- p_test |> pull(species) |> as.numeric() 
p_test_y <- p_test_y-1 # Needs to be 0, 1, 2

The specified model is reasonably simple: four input variables, two nodes in the hidden layer, and three nodes in the output layer giving the class probabilities. This corresponds to 5+5+3+3+3=19 parameters: each hidden node has four weights plus a constant, and each output node has two weights plus a constant.

Model: "sequential"
________________________________________________________________________________
 Layer (type)                       Output Shape                    Param #     
================================================================================
 dense_1 (Dense)                    (None, 2)                       10          
 dense (Dense)                      (None, 3)                       9           
================================================================================
Total params: 19 (76.00 Byte)
Trainable params: 19 (76.00 Byte)
Non-trainable params: 0 (0.00 Byte)
________________________________________________________________________________
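As a quick check on this count, each dense layer contributes (inputs + 1) × outputs parameters (weights plus a constant per node), which can be tallied directly:

Code
# Parameters in a dense layer: (inputs + 1) * outputs
n_dense <- function(n_in, n_out) (n_in + 1) * n_out
n_dense(4, 2) + n_dense(2, 3)  # 10 + 9 = 19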
Code
# Fit model
p_nn_fit <- p_nn_model |> keras::fit(
  x = p_train_x, 
  y = p_train_y,
  epochs = 200,
  verbose = 0
)

Because we set the random number seed, we will get the same fit each time the code provided here is run. However, if the model is re-fit without setting the seed, you will see that there is a surprising amount of variability in the fits. Setting epochs = 200 usually helps to get a good fit. One might expect keras to be reasonably stable, and not to produce the huge array of different fits observed in Wickham et al. (2015) using nnet. That this variability can happen even with the simple model used here reinforces the notion that fitting NN models is fiddly, and great care needs to be taken to validate and diagnose the fit.

Fitting NN models is fiddly, and very different fitted models can result from restarts, parameter choices, and architecture.
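One way to see this for yourself is to re-fit the same architecture with a handful of arbitrary seeds and compare the test accuracy. A minimal sketch, re-using the loss function and data objects defined in this chapter (each re-fit takes a little time to run):

Code
# Re-fit the same architecture with several seeds and compare accuracy
fit_once <- function(seed) {
  tensorflow::set_random_seed(seed)
  m <- keras_model_sequential() |>
    layer_dense(units = 2, activation = 'relu',
                input_shape = 4) |>
    layer_dense(units = 3, activation = 'softmax')
  m |> compile(optimizer = "adam", loss = loss_fn,
               metrics = c('accuracy'))
  m |> keras::fit(x = p_train_x, y = p_train_y,
                  epochs = 200, verbose = 0)
  unlist(m |> evaluate(p_test_x, p_test_y, verbose = 0))["accuracy"]
}
sapply(c(211, 502, 813), fit_once)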

Code
library(keras)
library(ggplot2)
library(colorspace)

# load fitted model
p_nn_model <- load_model_tf("data/penguins_cnn")

The fitted model that we have chosen as the final one has reasonably small loss and high accuracy. Loss and accuracy across epochs can be plotted to show how the fit progressed, but we don’t show them here, because they are generally not very interesting.
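If you do want to look at them, the history object returned by fit() has a plot method, so the curves can be drawn with:

Code
# Loss and accuracy by epoch for the fit above
plot(p_nn_fit)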

Code
p_nn_model |> evaluate(p_test_x, p_test_y, verbose = 0)
     loss  accuracy 
0.2563850 0.9553571 

The model object can be saved for later use with:

Code
save_model_tf(p_nn_model, "data/penguins_cnn")

17.4 Extracting model components

View the individual node models to understand how they combine to produce the overall model.

Because nodes in the hidden layers of NNs are themselves (relatively simple regression) models, it can be interesting to examine them to understand how the model is making its predictions. Although it is rarely made easy, most software will allow the coefficients for the models at these nodes to be extracted. With the penguins NN model there are two nodes, so we can extract the coefficients and plot the resulting two linear combinations to examine the separation between classes.

Code
# Extract hidden layer model weights
p_nn_wgts <- keras::get_weights(p_nn_model, trainable=TRUE)
p_nn_wgts 
[[1]]
           [,1]        [,2]
[1,]  0.6216676  1.33304155
[2,]  0.1851478 -0.01596385
[3,] -0.1680396 -0.30432791
[4,] -0.8867414 -0.36627045

[[2]]
[1]  0.12708087 -0.09466381

[[3]]
           [,1]     [,2]       [,3]
[1,] -0.1646167 1.527644 -1.9215064
[2,] -0.7547278 1.555889  0.3210194

[[4]]
[1]  0.4554813 -0.9371488  0.3577386

The linear coefficients for the first node in the model are 0.62, 0.19, -0.17, -0.89, and for the second node are 1.33, -0.02, -0.3, -0.37. We can use these like we used the linear discriminants in LDA, to make a 2D view of the data where the model is separating the three species. The constants 0.13, -0.09 are not important for this; they are only useful for drawing the location of the boundaries between classes produced by the model.

These two sets of model coefficients provide linear combinations of the original variables. Together, they define a plane onto which the data is projected to view the classification produced by the model. Ideally, though, this plane should be defined using an orthonormal basis, otherwise the shape of the data distribution might be warped. So we orthonormalise this matrix before computing the data projection.
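As a quick check of how far the raw weight vectors are from orthogonal, and hence how much the projection would be warped without this step, we can compute the angle between them:

Code
# Angle (degrees) between the two raw weight vectors
w1 <- p_nn_wgts[[1]][, 1]
w2 <- p_nn_wgts[[1]][, 2]
acos(sum(w1 * w2) / (sqrt(sum(w1^2)) * sqrt(sum(w2^2)))) * 180 / pi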

Code
# Orthonormalise the weights to make 2D projection
p_nn_wgts_on <- tourr::orthonormalise(p_nn_wgts[[1]])
p_nn_wgts_on
           [,1]       [,2]
[1,]  0.5593355  0.7969849
[2,]  0.1665838 -0.2145664
[3,] -0.1511909 -0.1541475
[4,] -0.7978314  0.5431528

Figure 17.3 shows the data projected onto the plane determined by the linear combinations from the two nodes in the hidden layer. Training and test sets are indicated by empty and solid circles, respectively. The three species are clearly different, but there is some overlap or confusion for a few penguins. The most interesting thing to learn is that there is no big gap between the Gentoo and the other species, which we know exists in the data. The model has not found this gap, and thus is likely to erroneously confuse some Gentoo penguins, particularly with Adelie.

What we have shown here is a process to use the models at the nodes of the hidden layer to produce a reduced dimensional space where the classes are best separated, at least as determined by the model. The process will work in higher dimensions also.
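A hedged sketch of the same idea for a larger hidden layer is below, assuming a hypothetical fitted model p_nn_big with s hidden nodes (with s no larger than the number of input variables, so that all weight columns can be orthonormalised). The resulting node space can then be explored with a tour rather than a single scatterplot.

Code
# Sketch: examine the node space of a (hypothetical) larger hidden layer
big_wgts <- keras::get_weights(p_nn_big)[[1]]      # p x s weight matrix
big_wgts_on <- tourr::orthonormalise(big_wgts)     # orthonormal basis
node_space <- as.matrix(p_train[,1:4]) %*% big_wgts_on
animate_xy(node_space, col = p_train$species)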

When there are more nodes in the hidden layer than original variables, the space is effectively extended to achieve classifications that need more complicated non-linear boundaries; the extra nodes describe the non-linearity. Wickham et al. (2015) provides a good illustration of this in 2D. The process of examining each of the node models can be useful for understanding this non-linear separation, also in high dimensions.

Code
# Hidden layer
p_train_m <- p_train |>
  mutate(nn1 = as.matrix(p_train[,1:4]) %*%
           matrix(p_nn_wgts_on[,1], ncol=1),
         nn2 = as.matrix(p_train[,1:4]) %*%
           matrix(p_nn_wgts_on[,2], ncol=1))

# Now add the test points on.
p_test_m <- p_test |>
  mutate(nn1 = as.matrix(p_test[,1:4]) %*%
           matrix(p_nn_wgts_on[,1], ncol=1),
         nn2 = as.matrix(p_test[,1:4]) %*%
           matrix(p_nn_wgts_on[,2], ncol=1))
p_train_m <- p_train_m |>
  mutate(set = "train")
p_test_m <- p_test_m |>
  mutate(set = "test")
p_all_m <- bind_rows(p_train_m, p_test_m)
ggplot(p_all_m, aes(x=nn1, y=nn2, 
                     colour=species, shape=set)) + 
  geom_point() +
  scale_colour_discrete_divergingx(palette="Zissou 1") +
  scale_shape_manual(values=c(16, 1)) +
  theme_minimal() +
  theme(aspect.ratio=1)
A scatter plot of a projection showing clusters of three penguin species—Adélie (blue), Chinstrap (yellow), and Gentoo (red)—based on multiple variables. Data points are marked as either training data (open circles) or test data (filled circles). Both have x-axis 'nn1' with labels -2, -1, 0, 1, 2 and 3 and y-axis 'nn2' with labels -2, -1, 0, 1 and 2. The clusters are  different but not quite separated. There is little difference between training and test sets. A legend in the top-right corner indicates species colors, while another legend at the bottom-right differentiates between training (open circles) and test (filled circles) data.
Figure 17.3: Plot of the data in the linear combinations from the two nodes in the hidden layer. The three species are clearly different, although with some overlap between all three. A main issue to notice is that there isn’t a big gap between Gentoo and the other species, which we know exists in 4D based on our data exploration done in other chapters. This suggests that this fitted model is sub-optimal.

This particular fit is not good, based on what we know about the data. There should be a fit where a two node hidden layer produces a 2D representation where the difference between classes is clearer.

17.5 Examining predictive probabilities

When predictive probabilities are returned by a model, as they are by this NN, we can use a ternary diagram for three-class problems, or a higher-dimensional simplex when there are more classes, to examine the strength of the classification. This is done in the same way as for the votes matrix from a random forest in Section 15.2.1.

Code
# Predict training and test set
p_train_pred <- p_nn_model |> 
  predict(p_train_x, verbose = 0)
p_train_pred_cat <- levels(p_train$species)[
  apply(p_train_pred, 1,
        which.max)]
p_train_pred_cat <- factor(
  p_train_pred_cat,
  levels=levels(p_train$species))
table(p_train$species, p_train_pred_cat)
           p_train_pred_cat
            Adelie Chinstrap Gentoo
  Adelie        92         4      1
  Chinstrap      0        45      0
  Gentoo         1         0     78
Code
p_test_pred <- p_nn_model |> 
  predict(p_test_x, verbose = 0)
p_test_pred_cat <- levels(p_test$species)[
  apply(p_test_pred, 1, 
        which.max)]
p_test_pred_cat <- factor(
  p_test_pred_cat,
  levels=levels(p_test$species))
table(p_test$species, p_test_pred_cat)
           p_test_pred_cat
            Adelie Chinstrap Gentoo
  Adelie        45         3      1
  Chinstrap      0        23      0
  Gentoo         1         0     39

Code
# Set up the data to make the ternary diagram
# Join data sets
colnames(p_train_pred) <- c("Adelie", "Chinstrap", "Gentoo")
colnames(p_test_pred) <- c("Adelie", "Chinstrap", "Gentoo")
p_train_pred <- as_tibble(p_train_pred)
p_train_m <- p_train_m |>
  mutate(pspecies = p_train_pred_cat) |>
  bind_cols(p_train_pred) |>
  mutate(set = "train")
p_test_pred <- as_tibble(p_test_pred)
p_test_m <- p_test_m |>
  mutate(pspecies = p_test_pred_cat) |>
  bind_cols(p_test_pred) |>
  mutate(set = "test")
p_all_m <- bind_rows(p_train_m, p_test_m)

# Add simplex to make ternary
library(geozoo)
proj <- t(geozoo::f_helmert(3)[-1,])
p_nn_v_p <- as.matrix(p_all_m[,c("Adelie", "Chinstrap", "Gentoo")]) %*% proj
colnames(p_nn_v_p) <- c("x1", "x2")
p_nn_v_p <- p_nn_v_p |>
  as.data.frame() |>
  mutate(species = p_all_m$species,
         set = p_all_m$set)

simp <- geozoo::simplex(p=2)
sp <- data.frame(cbind(simp$points), simp$points[c(2,3,1),])
colnames(sp) <- c("x1", "x2", "x3", "x4")
sp$species = sort(unique(penguins_sub$species))
Code
# Plot it
ggplot() +
  geom_segment(data=sp, aes(x=x1, y=x2, xend=x3, yend=x4)) +
  geom_text(data=sp, aes(x=x1, y=x2, label=species),
            nudge_x=c(-0.1, 0.15, 0),
            nudge_y=c(0.05, 0.05, -0.05)) +
  geom_point(data=p_nn_v_p, aes(x=x1, y=x2, 
                                colour=species,
                                shape=set), 
             size=2, alpha=0.5) +
  scale_color_discrete_divergingx(palette="Zissou 1") +
  scale_shape_manual(values=c(19, 1)) +
  theme_map() +
  theme(aspect.ratio=1, legend.position = "right")

A scatterplot where all points fall inside a 2D triangle, marked by black lines. The triangle is upside down and is equilateral. Each vertex is labelled: Chinstrap (top left), Adelie (top right), Gentoo (bottom left). Each point corresponds to a penguin, and is coloured by species. The legend at right shows the colour labelling as blue (Adelie), yellow (Chinstrap), red (Gentoo), and shape labels as open circles (training) and filled circles (test). Chinstrap points are a tight cluster at their vertex. Adelie and Gentoo points are not centred at their vertex but sit part way along the edge towards another vertex. This indicates the confusion between these two species. There are some yellow points mixed with blue points but not with red.

Ternary diagram for the three groups of the predictive probabilities of both training and test sets. From what we already know about the penguins data this fit is not good. Both Chinstrap and Gentoo penguins are confused with Adelie, or at risk of it. Gentoo is very well-separated from the other two species when several variables are used, and this fitted model is blind to it. One useful finding is that there is little difference between training and test sets, so the model has not been over-fitted.

If the training and test sets look similar when plotted in the model space then the model is not suffering from over-fitting.

17.6 Application to a large dataset

To see how these methods apply in the setting where we have a large number of variables, observations and classes, we will look at a neural network that predicts the category for the fashion MNIST data. The code for designing and fitting the model follows the tutorial available at https://tensorflow.rstudio.com/tutorials/keras/classification, and you can find additional information there. Below we only replicate the steps needed to build the model from scratch. We also note that a similar investigation was presented in Li et al. (2020), with a focus on investigating the model at different epochs during training.

The first step is to download and prepare the data. Here we scale the observations to range between zero and one, and we define the label names.

Code
library(keras)

# download the data
fashion_mnist <- dataset_fashion_mnist()

# split into input variables and response
c(train_images, train_labels) %<-% fashion_mnist$train
c(test_images, test_labels) %<-% fashion_mnist$test

# for interpretation we also define the category names
class_names = c('T-shirt/top',
                'Trouser',
                'Pullover',
                'Dress',
                'Coat',
                'Sandal',
                'Shirt',
                'Sneaker',
                'Bag',
                'Ankle boot')

# rescaling to the range (0,1)
train_images <- train_images / 255
test_images <- test_images / 255

In the next step we define the neural network and train the model. Note that because we have many observations, even a very simple structure returns a good model. And because this example is well-known, we do not need to tune the model or check the validation accuracy.

Code
# defining the model
model_fashion_mnist <- keras_model_sequential()
model_fashion_mnist |>
  # flatten the image data into a long vector
  layer_flatten(input_shape = c(28, 28)) |>
  # hidden layer with 128 units
  layer_dense(units = 128, activation = 'relu') |>
  # output layer for 10 categories
  layer_dense(units = 10, activation = 'softmax')

model_fashion_mnist |> compile(
  optimizer = 'adam',
  loss = 'sparse_categorical_crossentropy',
  metrics = c('accuracy')
)

# fitting the model, if we did not know the model yet we
# would add a validation split to diagnose the training
model_fashion_mnist |> fit(train_images,
              train_labels,
              epochs = 5)
save_model_tf(model_fashion_mnist, "data/fashion_nn")
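
If the architecture were new to us, we would monitor a validation split during training to diagnose over- or under-fitting. Shown as a sketch only; in practice this would replace the fit call above.

Code
# Sketch: hold out 20% of the training data to monitor during fitting
model_fashion_mnist |> fit(train_images, train_labels,
                           epochs = 5,
                           validation_split = 0.2)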

We have defined a flat neural network with a single hidden layer of 128 nodes. To investigate the model we can start by comparing the activations to the original input data distribution. Since both the input space and the space of activations are large, and they have different dimensionality, we first use principal component analysis. This simplifies the analysis, and in general we do not need the original pixel or hidden node information for the interpretation here. The comparison uses the test subset of the data.

Code
# get the fitted model
model_fashion_mnist <- load_model_tf("data/fashion_nn")
# observed response labels in the test set
test_tags <- factor(class_names[test_labels + 1],
                    levels = class_names)

# calculate activation for the hidden layer, this can be done
# within the keras framework
activations_model_fashion <- keras_model(
  inputs = model_fashion_mnist$input,
  outputs = model_fashion_mnist$layers[[2]]$output
)
activations_fashion <- predict(
  activations_model_fashion,
  test_images, verbose = 0)

# PCA for activations
activations_pca <- prcomp(activations_fashion)
activations_pc <- as.data.frame(activations_pca$x)

# PCA on the original data
# we first need to flatten the image input
test_images_flat <- test_images
dim(test_images_flat) <- c(nrow(test_images_flat), 784)
images_pca <- prcomp(as.data.frame(test_images_flat))
images_pc <- as.data.frame(images_pca$x)
Code
p2 <- ggplot(activations_pc,
       aes(PC1, PC2, color = test_tags)) +
  geom_point(size = 0.1) +
  ggtitle("Activations") +
  scale_color_discrete_qualitative(palette = "Dynamic") +
  theme_bw() +
  theme(legend.position = "none", aspect.ratio = 1)

p1 <- ggplot(images_pc,
       aes(PC1, PC2, color = test_tags)) +
  geom_point(size = 0.1) +
  ggtitle("Input space") +
  scale_color_discrete_qualitative(palette = "Dynamic") +
  theme_bw() +
  theme(legend.position = "none", aspect.ratio = 1)

legend_labels <- cowplot::get_legend(
  p1 + 
    guides(color = guide_legend(nrow = 1)) +
    theme(legend.position = "bottom",
          legend.title = element_blank()) +
    guides(color = guide_legend(override.aes = list(size = 1))) 
)
# combine the two panels with the shared legend
cowplot::plot_grid(cowplot::plot_grid(p1, p2), legend_labels,
                   rel_heights = c(1, .3), nrow = 2)

A side-by-side comparison of two scatter plots, each showing a PC1 vs PC2 of data points color-coded by category. The left plot, titled 'Input space', represents the raw input data projected into two principal components. Data points are densely packed and overlapping, with no clear separation between categories. The right plot, titled 'Activations', shows the same data after being transformed by a neural network (e.g., through hidden layer activations). In this plot, the points are more clearly separated into distinct groups, with elongated and curved cluster structures suggesting that the network has learned to organize the data in a more class-discriminative way. Axes in both plots are labeled PC1 (x-axis) and PC2 (y-axis), and the same color scheme is used across both plots to represent consistent categories.

First two PCs of the original data (left) and the activations (right). The activations provide a useful transformation of the 784 variables into a reduced dimension that reveals class clusters: some cluster differences can already be seen well in the first two PCs.

Looking only at the first two principal components we can already see clear differences resulting from the transformation in the hidden layer. The observations are more evenly spread in the input space, while in the activations space they are grouped along specific directions. In particular, the category “Bag” appears to be the most different from all other classes, and the non-linear transformation in the activations space clearly separates it from the shoe categories, while in the input space there is some overlap in the linear projection. To better identify differences between the other groups we will use the tour on the first five principal components.

Code to run tours
animate_xy(images_pc[,1:5], col = test_tags,
        cex=0.2, palette = "Dynamic")
animate_xy(activations_pc[,1:5], col = test_tags,
        cex=0.2, palette = "Dynamic")

render_gif(images_pc[,1:5],
           grand_tour(),
           display_xy( 
             col=test_tags, 
             cex=0.2,
             palette = "Dynamic",
             axes="bottomleft"), 
           gif_file="gifs/fashion_images_gt.gif",
           frames=500,
           loop=FALSE
)
render_gif(activations_pc[,1:5],
           grand_tour(),
           display_xy( 
             col=test_tags, 
             cex=0.2,
             palette = "Dynamic",
             axes="bottomleft"), 
           gif_file="gifs/fashion_activations_gt.gif",
           frames=500,
           loop=FALSE
)

As with the first two principal components, we get a much more spread out distribution in the original space. Nevertheless we can see differences between the classes, and that some groups vary along specific directions in that space. Overall the activations space shows tighter clusters, as expected after including the ReLU activation function, but the picture is not as neat as the first two principal components would suggest. While certain groups appear very compact even in this larger subspace, others vary quite a bit within part of the space. For example, we can clearly see the “Bag” observations as different from all other images, but also notice that there is large variation within this class along certain directions.

An animation showing 2D scatterplots of the first five PCs of the fashion image data. Each point represents an individual image, colored by clothing category: Ankle boot (pink), Pullover (red), Trouser (yellow), Shirt (green), Coat (teal), Sandal (light blue), Sneaker (purple), Dress (peach), Bag (orange), and T-shirt/top (magenta). Clusters are visible although muddy. In the bottom-left corner, is the axis display showing the contribution of the first five principal components in each projection. A legend on the top-right maps each color to its corresponding fashion category.
(a) Input space
An animation showing 2D scatterplots of the first five PCs of the neural network activation data. Each point represents an individual image, colored by clothing category: Ankle boot (pink), Pullover (red), Trouser (yellow), Shirt (green), Coat (teal), Sandal (light blue), Sneaker (purple), Dress (peach), Bag (orange), and T-shirt/top (magenta). Clusters are visible and some are quite distinct. In the bottom-left corner, is the axis display showing the contribution of the first five principal components in each projection. A legend on the top-right maps each color to its corresponding fashion category.
(b) Activations
Figure 17.4: Comparison of the test observations in the first five principal components of the input space (left) and in the hidden layer activations (right). The activation function results in more clearly defined grouping of the different classes.
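
One way to back up the observation about within-class spread, such as the large variation of the “Bag” class, is to compute the spread of each class along the first few activation PCs. A small check along these lines:

Code
# Standard deviation of each class along the first five activation PCs
library(dplyr)
activations_pc |>
  mutate(class = test_tags) |>
  group_by(class) |>
  summarise(across(PC1:PC5, sd))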

Finally we will investigate the model performance through the misclassifications and the uncertainty between classes. We start with the error matrix for the test observations. To compute the error matrix we use the numeric labels; the ordering is as defined above for the label names.

Code
fashion_test_pred <- predict(model_fashion_mnist,
                             test_images, verbose = 0)
fashion_test_pred_cat <- levels(test_tags)[
  apply(fashion_test_pred, 1,
        which.max)]
predicted <- factor(
  fashion_test_pred_cat,
  levels=levels(test_tags)) |>
  as.numeric() - 1
observed <- as.numeric(test_tags) -1
table(observed, predicted)
        predicted
observed    0    1    2    3    4    5    6    7    8    9
       0  128    0   84  719   10    0   51    7    1    0
       1    0   42   15  941    0    0    0    2    0    0
       2    6    0  824   38  121    0    3    8    0    0
       3    1    0   14  976    8    0    1    0    0    0
       4    1    0  205  181  605    0    0    8    0    0
       5    1    0    0    2    0   77    0  902    1   17
       6   24    0  231  282  405    0   44   12    2    0
       7    0    0    0    0    0    0    0 1000    0    0
       8   17    1   78  102   49    0    5  730   16    2
       9    0    0    0    2    0    0    0  947    0   51

Here the labels are used as 0 - T-shirt/top, 1 - Trouser, 2 - Pullover, 3 - Dress, 4 - Coat, 5 - Sandal, 6 - Shirt, 7 - Sneaker, 8 - Bag, 9 - Ankle boot. From this we see that the model mainly confuses certain categories with each other, and within expected groups (e.g. different types of shoes can be confused with each other, or different types of shirts). We can further investigate this by visualizing the full probability matrix for the test observations, to see which categories the model is uncertain about.

Code to visualize probabilities
# getting the probabilities from the output layer
fashion_test_pred <- predict(model_fashion_mnist,
                             test_images, verbose = 0)

# this is the same code as was used in the RF chapter
proj <- t(geozoo::f_helmert(10)[-1,])
f_nn_v_p <- as.matrix(fashion_test_pred) %*% proj
colnames(f_nn_v_p) <- c("x1", "x2", "x3", "x4", "x5", "x6", "x7", "x8", "x9")

f_nn_v_p <- f_nn_v_p |>
  as.data.frame() |>
  mutate(class = test_tags)

simp <- geozoo::simplex(p=9)
sp <- data.frame(simp$points)
colnames(sp) <- c("x1", "x2", "x3", "x4", "x5", "x6", "x7", "x8", "x9")
sp$class = ""
f_nn_v_p_s <- bind_rows(sp, f_nn_v_p) |>
  mutate(class = ifelse(class %in% c("T-shirt/top",
                                     "Pullover",
                                     "Shirt",
                                     "Coat"), class, "Other")) |>
  mutate(class = factor(class, levels=c("Other", "T-shirt/top",
                                        "Pullover",
                                        "Shirt",
                                        "Coat"))) 
labels <- c("0" , "1", "2", "3", "4", "5", "6", "7", "8", "9",
                rep("", 10000))

animate_xy(f_nn_v_p_s[,1:9], col = f_nn_v_p_s$class, 
           axes = "off", cex=0.2,
           edges = as.matrix(simp$edges),
           edges.width = 0.05,
           obs_labels = labels, 
           palette = "Viridis")

render_gif(f_nn_v_p_s[,1:9],
           grand_tour(),
           display_xy( 
             col=f_nn_v_p_s$class, 
             cex=0.2,
             palette = "Viridis",
             axes="off",
             obs_labels = labels,
             edges = as.matrix(simp$edges),
             edges.width = 0.05), 
           gif_file="gifs/fashion_confusion_gt.gif",
           frames=500,
           loop=FALSE
)
An animation of 2D scatterplots showing data in a 10D simplex with edges in light grey. Each point represents a fashion item, colored by category, and shows a tetrahedron shape containing vertices 0, 2, 3, 4, 6. Labeled vertices numbered 0 through 9 correspond to fashion item categories: T-shirt/top (blue), Pullover (green), Shirt (yellow), Coat (light green), and Other (purple). Dense connections exist between closely related categories, such as Pullover, Shirt, and Coat, suggesting misclassification potential. The 'Other' category (purple) is more dispersed across the plot, linking pairs of nodes. A legend in the top-right corner identifies the color-coded categories.
Figure 17.5: A tour of the class probabilities for the fashion MNIST test observations, focusing on a subset of items. Often observations get confused between two of the classes; this appears as points falling along one of the edges, for example some Shirts look more like T-shirts/tops, while others get confused with Coats. We can also notice that a subset of three other classes, not mapped to colours, is very separate from this group.

The tour of the class probabilities shows that the model is often confused between two classes; this appears as points falling along one edge of the simplex. In particular, for the highlighted categories we can notice some interesting patterns where pairs of classes get confused with each other. We also see some three-way confusions: these are observations that fall on a triangular face defined by three corners of the simplex, for example between Pullover, Shirt and Coat.

For this data, using explainers like SHAP is not so interesting, since the individual pixel contributions to a prediction are typically not of interest. With image classification, a next step might be to further investigate which part of the image is important for a prediction, and this can be visualised as a heat map placed over the original image. This is especially interesting in the case of difficult or misclassified images. It is, however, beyond the scope of this book.

There are subsets of confusion, as can be seen in both the confusion matrix and the votes visualisation. T-shirt/top, Shirt, Pullover, Coat and Dress are mutually confused, and some pairs are commonly confused, e.g. Sandal and Sneaker. In the votes matrix visualisation this can be seen in the concentration of points in a tetrahedron and the concentrations along some edges.

Exercises

  1. The problem with the NN model fitted to the penguins data is that the Gentoo are poorly classified, when they should be perfectly predictable due to the big gap between class clusters. Re-fit the NN (with two nodes in the hidden layer) to the penguins data, to find a better model that perfectly predicts Gentoo penguins. (You might find the easiest way to get a better model is changing the random seed and re-fitting, as illustrated in Wickham et al. (2015), Figures 21-24.) Support this by plotting the model (using the hidden layer), and the predictive probabilities as a ternary plot. Do the SHAP values also support that bd plays a stronger role in your best model? (bd is the main variable for distinguishing Gentoos from the other species, particularly when used with fl or bl.)
  2. For the fashion MNIST data we have seen that certain categories are more likely to be confused with each other. Select a subset of the data including only the categories Ankle boot, Sneaker and Sandal and see if you can reproduce the analysis of the penguins data in this chapter with this subset.
  3. Can you fit a neural network that can predict the class in the fake tree data? Because the data is noisy and we do not have that many observations, it can be easy to overfit the data. Once you find a setting that works, think about what aspects of the model might be interesting for visualization. What comparisons with a random forest model could be of interest?
  4. The sketches data could also be considered a classic image classification problem, and we have seen that we can get a reasonable accuracy with a random forest model. Because we only have a smaller number of observations (compared to the fashion MNIST data) when fitting a neural network we need to be very careful not to overfit the training data. Try fitting a flat neural network (similar to what we did for the fashion MNIST data) and check the test accuracy of the model.
  5. Challenge: try to design a more accurate neural network for the sketches data. Here you can investigate using a convolutional neural network in combination with data augmentation. In addition, using batch normalization should improve the model performance.