4  Principal component analysis

Reducing dimensionality using principal component analysis (PCA) dates back to Pearson (1901) and Hotelling (1933), and Jolliffe & Cadima (2016) provides a current overview. The goal is to find a smaller set of variables, \(q (< p)\), that contain as much of the information in the original variables as possible. The new variables, known as principal components (PCs), are linear combinations of the original variables, and can be used to represent the data in a lower-dimensional space.

The process is essentially an optimisation procedure, although PCA has an analytical solution. It solves the problem of

\[ \max_{a_k} ~\text{Var} (Xa_k), \quad \text{subject to } a_k^\top a_k = 1 \text{ and } a_k^\top a_j = 0 \text{ for } j < k, \] where \(X\) is the \(n \times p\) data matrix and \(a_k (k=1, ..., p)\) is a 1D projection vector, called an eigenvector; the corresponding \(\text{Var} (Xa_k)\) is called an eigenvalue. So PCA is a sequential process that finds the direction in the high-dimensional space (given by the first eigenvector) where the data is most varied, then the next most varied direction orthogonal to it, and so on. The eigenvectors define the combinations of the original variables, and the eigenvalues give the amount of variance explained by each of the new variables.
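A quick way to see the connection between this optimisation and the eigen-decomposition is to compare prcomp() with eigen() applied to the covariance matrix; a minimal sketch on simulated data:

set.seed(1)
X <- matrix(rnorm(300), ncol = 3)
e <- eigen(cov(X))
p <- prcomp(X)
e$values    # eigenvalues: the variances of the principal components
p$sdev^2    # the same values, reported by prcomp as squared standard deviations
e$vectors   # eigenvectors: match p$rotation up to sign flips
p$rotation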

PCA is very broadly useful for summarising linear association by using combinations of variables that are highly correlated. However, high correlation can also occur when there are outliers, or clustering. PCA is commonly used to detect these patterns also.

With visualisation we want to assess whether it is appropriate to use PCA to summarise the linear association, and to detect other patterns that might affect the PCA results, such as outliers, clustering or non-linear dependence.

PCA is not very effective when the distribution of the variables is highly skewed, so it can be helpful to transform variables to make them more symmetrically distributed before conducting PCA. It is also possible to summarise different types of structure by generalising the optimisation criteria to any function of projected data, \(f(XA)\), which is called projection pursuit (PP). PP has a long history (Kruskal (1964a), Friedman & Tukey (1974), Diaconis & Freedman (1984a), Jones & Sibson (1987), Huber (1985)), and there are regularly new developments (e.g. E.-K. Lee & Cook (2009), Perisic & Posse (2005), Y. D. Lee et al. (2013), Loperfido (2018), Bickel et al. (2018), C. Zhang et al. (2023)).
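For example, the guided tour in the tourr package optimises a projection pursuit index, such as the holes index, rather than variance; a minimal sketch using the flea data that ships with tourr:

library(tourr)
# guided tour: optimise the holes() projection pursuit index over 2D projections
animate_xy(flea[, 1:6], guided_tour(holes()))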

4.1 Determining how many dimensions

We would start by examining the data using a grand tour. The goal is to check whether there might be potential issues for PCA, such as skewness, outliers or clustering, or even non-linear dependencies.

We’ll start by showing PCA on the simulated data from Chapter 3. The scree plots in Figure 4.1 support the data being 2D, 3D and 5D, respectively.

library(dplyr)
library(ggplot2)
library(mulgar)
data(plane)
data(box)
library(geozoo)
# simulate points uniformly in a solid 5D cube, then standardise each variable
cube5d <- data.frame(cube.solid.random(p=5, n=300)$points)
colnames(cube5d) <- paste0("x", 1:5)
cube5d <- data.frame(apply(cube5d, 2, 
                           function(x) (x-mean(x))/sd(x)))
p_pca <- prcomp(plane)
b_pca <- prcomp(box)
c_pca <- prcomp(cube5d)
p_scree <- ggscree(p_pca, q = 5) + theme_minimal()

b_scree <- ggscree(b_pca, q = 5) + theme_minimal()
c_scree <- ggscree(c_pca, q = 5) + theme_minimal()
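The three scree plots can then be arranged side-by-side to produce something like Figure 4.1; a minimal sketch using the patchwork package (an assumption about how the figure was assembled):

library(patchwork)
p_scree + b_scree + c_scree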
Figure 4.1: Scree plots for the three simulated data sets shown in Figure 3.2. The 2D in 5D is clearly recognised by PCA to be 2D because the variance drops substantially between 2-3 principal components. The 3D in 5D is possibly 3D because the variance drops from 3-4 principal components. The fully 5D data has no drop in variance, and all values are close to the typical value one would observe if the data was fully 5D.

The next step is to look at the coefficients for the selected number of PCs. Table 4.1 shows the coefficients for the first two PCs of the plane data. All five variables contribute, with x1, x2, x3 contributing more to PC1, and x4, x5 contributing more to PC2. Table 4.2 shows the coefficients for the first three PCs of the box data. Variables x1, x2, x3 contribute strongly to PC1, PC2 has contributions from all variables except x3, and variables x4 and x5 contribute strongly to PC3.

Code to print PC coefficients
library(gt)
p_pca$rotation[,1:2] %>%
  as_tibble(rownames="Variable") %>% 
  gt() %>%
  fmt_number(columns = c(PC1, PC2),
             decimals = 2)
Table 4.1: Coefficients for the first two PCs for the plane data.
Variable PC1 PC2
x1 0.58 −0.06
x2 −0.55 0.21
x3 0.47 −0.41
x4 0.25 0.64
x5 −0.29 −0.62
Code to print PC coefficients
b_pca$rotation[,1:3] %>%
  as_tibble(rownames="Variable") %>% 
  gt() %>%
  fmt_number(columns = c(PC1, PC2, PC3),
             decimals = 2)
Table 4.2: Coefficients for the first three PCs for the box data.
Variable PC1 PC2 PC3
x1 −0.51 0.46 0.11
x2 0.51 0.46 0.00
x3 −0.65 −0.09 0.23
x4 −0.22 0.36 −0.87
x5 0.02 0.66 0.43

In each of these simulated data sets, all five variables contributed to the dimension reduction. If we added two purely noise variables to the plane data, as done in Chapter 3, the scree plot (Figure 4.2) would indicate that the data is now 4D, and we would get a different interpretation of the coefficients from the PCA (Table 4.3). PC1 and PC2 are approximately the same as before, with the main variables being (x1, x2, x3) and (x4, x5) respectively, while PC3 and PC4 are composed almost entirely of x6 and x7.

set.seed(5143)
plane_noise <- plane
plane_noise$x6 <- rnorm(100)
plane_noise$x7 <- rnorm(100)
plane_noise <- data.frame(apply(plane_noise, 2, function(x) (x-mean(x))/sd(x)))

pn_pca <- prcomp(plane_noise)
ggscree(pn_pca, q = 7) + theme_minimal()
Figure 4.2: Additional noise variables expands the data to 4D.
Code to print PC coefficients
pn_pca$rotation[,1:4] %>%
  as_tibble(rownames="Variable") %>% 
  gt() %>%
  fmt_number(columns = c(PC1, PC2, PC3, PC4),
             decimals = 2)
Table 4.3: Coefficients for the first four PCs for the plane data with two added noise variables.
Variable PC1 PC2 PC3 PC4
x1 0.58 0.04 0.01 0.00
x2 −0.55 −0.18 −0.03 0.07
x3 0.47 0.37 0.05 −0.20
x4 0.24 −0.62 −0.06 0.17
x5 −0.28 0.60 0.07 −0.14
x6 0.05 0.29 −0.58 0.76
x7 −0.02 −0.08 −0.81 −0.58

4.1.1 Example: pisa

The pisa data contains simulated math, reading and science scores, 30 variables in total. PCA is used here to examine the association between them. We might expect the data to be 3D, but what we see suggests it is primarily 1D. This means that a student who scores well in math will also tend to score well in reading and science.

data(pisa)
pisa_std <- pisa %>%
  filter(CNT == "Australia") %>%
  select(-CNT) %>%
  mutate_all(mulgar:::scale2)
pisa_pca <- prcomp(pisa_std)
pisa_scree <- ggscree(pisa_pca, q = 15) + theme_minimal()

The scree plot in Figure 4.3 shows a big drop from one to two PCs in the amount of variance explained. A grand tour on the 30 variables can be run using animate_xy():

animate_xy(pisa_std, half_range=1)

or rendered as an animated gif using render_gif():

render_gif(pisa_std, 
           grand_tour(), 
           display_xy(half_range=0.9),
           gif_file="gifs/pisa_gt.gif",
           frames=500,
           width=400,
           height=400,
           loop=FALSE)

and we can see that the data is elliptical in most projections, sometimes shrinking to a small circle. This pattern strongly indicates that there is one primary direction of variation in the data, with only small variation in any direction away from it. Shrinking to the small circle is analogous to how a pencil, cigar or water bottle in 3D looks when viewed from one end.

(a) Grand tour of the pisa data, showing strong linear dependence in many projections.
Figure 4.3: Scree plot and tour of the pisa data, with 30 variables being the plausible scores for Australian students. The scree plot suggests that the data is 1D.
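The dominance of the first PC can also be checked numerically, since the summary of a prcomp object reports the proportion of variance explained by each component:

# proportion of variance explained by the leading PCs
summary(pisa_pca)$importance[, 1:3]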

The coefficients of the first PC (first eigenvector) are roughly equal in magnitude (as shown below), which tells us that all variables contribute roughly equally. Interestingly, they are all negative, which is not actually meaningful: with different software they could easily have all been positive. The sign of the coefficients can be reversed, as long as all of them are reversed, which is the same as an arrow pointing one way being flipped to point the other way.

Code to print PC coefficients
round(pisa_pca$rotation[,1], 2)
 PV1MATH  PV2MATH  PV3MATH  PV4MATH  PV5MATH  PV6MATH  PV7MATH  PV8MATH 
   -0.18    -0.18    -0.18    -0.18    -0.18    -0.18    -0.18    -0.18 
 PV9MATH PV10MATH  PV1READ  PV2READ  PV3READ  PV4READ  PV5READ  PV6READ 
   -0.18    -0.18    -0.19    -0.18    -0.19    -0.19    -0.19    -0.19 
 PV7READ  PV8READ  PV9READ PV10READ  PV1SCIE  PV2SCIE  PV3SCIE  PV4SCIE 
   -0.19    -0.19    -0.19    -0.19    -0.18    -0.18    -0.19    -0.18 
 PV5SCIE  PV6SCIE  PV7SCIE  PV8SCIE  PV9SCIE PV10SCIE 
   -0.19    -0.18    -0.19    -0.18    -0.19    -0.18 
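If an all-positive orientation is preferred for readability, the sign can be flipped by negating the rotation and the scores together; a minimal sketch, working on a copy so the original fit is untouched:

pisa_pca_flip <- pisa_pca
pisa_pca_flip$rotation[, 1] <- -pisa_pca_flip$rotation[, 1]
pisa_pca_flip$x[, 1] <- -pisa_pca_flip$x[, 1]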

The tour verifies that the pisa data is primarily 1D, indicating that a student who scores well in math, probably scores well in reading and science, too. More interestingly, the regular shape of the data strongly indicates that it is “synthetic”, simulated rather than observed.

4.1.2 Example: aflw

This data has player statistics for all the matches in the 2021 AFLW season. We would be interested to know which variables contain similar information, and thus might be combined into single variables. We would expect many of the statistics to group into a few small sets, such as offensive and defensive skills. We might also expect that some of the statistics are skewed: most players have low values and just a handful of players are stellar. It is also possible that there are some extreme values. These are interesting features, but they will distract from the main purpose of grouping the statistics. Thus the tour is used to check for potential problems with the data prior to conducting PCA.

library(tourr)
data(aflw)
aflw_std <- aflw %>%
  mutate_if(is.numeric, function(x) (x-
      mean(x, na.rm=TRUE))/
      sd(x, na.rm=TRUE))

We look at all of the 29 player statistics in a grand tour in Figure 4.4.

Code to generate tour
animate_xy(aflw_std[,7:35], half_range=0.9)
render_gif(aflw_std[,7:35], 
           grand_tour(), 
           display_xy(half_range=0.9),
           gif_file="gifs/aflw_gt.gif",
           frames=500,
           loop=FALSE)
Figure 4.4: Grand tour of the AFLW player statistics, showing linear dependence and some outliers.

No major surprises! There is a small amount of skewness, and there are no major outliers. Skewness indicates that most players have reasonably similar skills (bunching of points), except for some key players (the moderate outliers). The skewness could be reduced by applying a log or square root transformation to some variables prior to running the PCA. However, we elect not to do this because the moderate outliers are of interest. These correspond to talented players that we’d like to explore further with the analysis.
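Had we decided to transform, it might look like the sketch below; the choice of goals and behinds is purely illustrative, and in practice the variables would be picked after inspecting their univariate distributions:

# log1p() handles zero counts; applied here only to illustrative variables
aflw_sym <- aflw %>%
  mutate(across(c(goals, behinds), log1p))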

Below we have the conventional summary of the PCA, a scree plot showing the reduction in variance to be explained as each additional PC is considered. It is also conventional to look at a table summarising the proportions of variance explained by the PCs, but with almost 30 variables it is easier to decide on the number of PCs needed from the scree plot.

Code to make screeplot
aflw_pca <- prcomp(aflw_std[,7:35], 
               scale = FALSE, 
               retx=TRUE)

ggscree(aflw_pca, q = 29) + theme_minimal()
Figure 4.5: Scree plot showing decay in variance of PCs.

From the scree plot in Figure 4.5, we see sharp drops from one to two PCs and from two to three, and then smaller drops. After four PCs the variance drops again at six PCs and then gradually decays. We will choose four PCs to examine more closely. These explain 67.2% of the variance.
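The 67.2% figure can be read from the cumulative proportion row of the prcomp summary:

# cumulative proportion of variance explained by the first four PCs
summary(aflw_pca)$importance["Cumulative Proportion", 1:4]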

Code to print PC coefficients
library(gt)
aflw_pca$rotation[,1:4] %>%
  as_tibble(rownames="Variable") %>% 
  arrange(desc(PC1), desc(PC2), desc(PC3)) %>%
  gt() %>%
  fmt_number(columns = c(PC1, PC2, PC3, PC4),
             decimals = 2)
Table 4.4: Coefficients for the first four PCs.
Variable PC1 PC2 PC3 PC4
disposals 0.31 −0.05 −0.03 0.07
possessions 0.31 −0.03 −0.07 0.09
kicks 0.29 −0.04 0.09 −0.12
metres 0.28 −0.03 0.10 −0.15
contested 0.28 0.01 −0.12 0.23
uncontested 0.28 −0.06 −0.01 −0.05
turnovers 0.27 −0.01 −0.01 −0.29
clearances 0.23 0.00 −0.29 0.19
clangers 0.23 −0.02 −0.06 −0.33
handballs 0.23 −0.04 −0.19 0.31
frees_for 0.21 0.02 −0.13 0.18
marks 0.21 0.03 0.32 0.02
tackles 0.20 0.01 −0.28 0.09
time_pct 0.16 −0.04 0.35 −0.02
intercepts 0.13 −0.28 0.24 0.03
rebounds_in50 0.13 −0.28 0.24 −0.06
frees_against 0.13 0.03 −0.16 −0.23
assists 0.09 0.23 0.00 0.05
bounces 0.09 0.03 0.02 −0.28
behinds 0.09 0.32 0.08 −0.02
shots 0.08 0.38 0.12 −0.03
tackles_in50 0.07 0.27 −0.18 0.03
marks_in50 0.06 0.34 0.18 0.04
contested_marks 0.05 0.16 0.34 0.15
goals 0.04 0.37 0.16 0.03
accuracy 0.04 0.34 0.10 0.06
one_pct 0.03 −0.21 0.33 0.08
disposal 0.02 −0.13 0.20 0.50
hitouts −0.04 0.00 −0.03 0.32

When there are as many variables as this, it can be hard to digest the combinations of variables most contributing to each PC. Rearranging the table by sorting on a selected PC can help. Table 4.4 has been sorted according to the PC 1 coefficients.

PC 1 is primarily composed of disposals, possessions, kicks, metres, uncontested, contested, …. Actually almost all variables contribute positively, albeit in different amounts! It is quite common in PCA for the first PC to be a combination of all variables, although the contributions are often closer to equal than here, and it tells us that there is one main direction of variation in the data. For PC 1 in the aflw data, PCA is telling us that the primary variation is through a combination of skills that maps to basic football playing ability, where some skills (e.g. disposals, possessions, kicks, …) matter more than others.

Thus the second PC might be the more interesting. PC 2 is primarily a combination of shots, goals, marks_in50, accuracy and behinds, contrasted against rebounds_in50 and intercepts. The positive coefficients correspond to offensive skills and the negative coefficients to defensive skills. This PC is a reasonable measure of the offensive versus defensive skills of a player.

We would continue to interpret each PC by examining large coefficients, to help decide how many PCs are a suitable summary of the information in the data. Briefly, PC 3 is a measure of the worth of a player, because time_pct has a large coefficient, so players who are on the field longer will contribute strongly to this new variable. It also has large (and opposite in sign) contributions from clearances, tackles and contested_marks. PC 4 appears to be related to aggressive play, with clangers, turnovers, bounces and frees_against featuring. So all four PCs have useful information. (Note that if we had continued to examine large coefficients on PC 5 we would find that all variables already have reasonably large coefficients on PCs 1-4, which supports restricting attention to the first four.)
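The same sorting trick used for Table 4.4 helps when reading the later PCs; for example, re-ordering the coefficients by the magnitude of the PC 2 loadings puts the offensive and defensive skills at the two ends:

# re-sort coefficients by the size of the PC2 loading
aflw_pca$rotation[, 1:4] %>%
  as_tibble(rownames = "Variable") %>%
  arrange(desc(abs(PC2)))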

Ideally, when we tour the four PCs, we’d like to be able to stop and identify players. This involves creating a pre-computed animation, with additional mouse-over. This is only feasible with a small number of observations, like the aflw data, because all of the animation frames are constructed in a single object and passed to plotly. This object gets large very quickly!

Code to make tour animation
library(plotly)
library(htmlwidgets)
set.seed(20)
b <- basis_random(4, 2)
aflw_pct <- tourr::save_history(aflw_pca$x[,1:4], 
                    tour_path = grand_tour(),
                    start = b,
                    max_bases = 5)
# To reconstruct projected data plots, later
save(aflw_pct, file="data/aflw_pct.rda") 
aflw_pcti <- interpolate(aflw_pct, 0.1)
aflw_anim <- render_anim(aflw_pca$x[,1:4],
                         frames=aflw_pcti, 
             obs_labels=paste0(aflw$surname,
                               aflw$given_name))

aflw_gp <- ggplot() +
     geom_path(data=aflw_anim$circle, 
               aes(x=c1, y=c2,
                   frame=frame), linewidth=0.1) +
     geom_segment(data=aflw_anim$axes, 
                  aes(x=x1, y=y1, 
                      xend=x2, yend=y2, 
                      frame=frame), 
                  linewidth=0.1) +
     geom_text(data=aflw_anim$axes, 
               aes(x=x2, y=y2, 
                   frame=frame, 
                   label=axis_labels), 
               size=5) +
     geom_point(data=aflw_anim$frames, 
                aes(x=P1, y=P2, 
                    frame=frame, 
                    label=obs_labels), 
                alpha=0.8) +
     xlim(-1,1) + ylim(-1,1) +
     coord_equal() +
     theme_bw() +
     theme(axis.text=element_blank(),
         axis.title=element_blank(),
         axis.ticks=element_blank(),
         panel.grid=element_blank())
aflw_pctour <- ggplotly(aflw_gp,
                        width=500,
                        height=550) %>%
       animation_button(label="Go") %>%
       animation_slider(len=0.8, x=0.5,
                        xanchor="center") %>%
       animation_opts(easing="linear", transition = 0)

htmlwidgets::saveWidget(aflw_pctour,
          file="html/aflw_pca.html",
          selfcontained = TRUE)
Figure 4.6: Animation of four PCs of the aflw data with interactive labelling.

From Figure 4.6, the shape of the data in the four PCs is similar to that of all the variables: bunching of points in the centre with a lot of moderate outliers.

Code to generate interactive plot of frame 18
library(plotly)
load("data/aflw_pct.rda")
aflw_pcti <- interpolate(aflw_pct, 0.1)
f18 <- matrix(aflw_pcti[,,18], ncol=2)
p18 <- render_proj(aflw_pca$x[,1:4], f18, 
                   obs_labels=paste0(aflw$surname,
                               aflw$given_name))
pg18 <- ggplot() +
  geom_path(data=p18$circle, aes(x=c1, y=c2)) +
  geom_segment(data=p18$axes, aes(x=x1, y=y1, xend=x2, yend=y2)) +
  geom_text(data=p18$axes, aes(x=x2, y=y2, label=rownames(p18$axes))) +
  geom_point(data=p18$data_prj, aes(x=P1, y=P2, label=obs_labels)) +
  xlim(-1,1) + ylim(-1, 1) +
  #ggtitle("Frame 18") +
  theme_bw() +
  theme(
    axis.text=element_blank(),
    axis.title=element_blank(),
    axis.ticks=element_blank(),
    panel.grid=element_blank())
ggplotly(pg18, width=500, height=500)
Figure 4.7: Frame 18 re-plotted so that players can be identified on mouse-over.

For any particular frame, like frame 18 re-plotted in Figure 4.7, we can investigate further. Here there is a branching pattern, where the branch points in the direction of PC 1. Mousing over the players at the tip of this branch we find players like Alyce Parker, Brittany Bonnici, Dana Hooker and Kiara Bowers. If you look up the bios of these players you’ll find they all have generally strong player descriptions like “elite disposals”, “powerful left foot”, “hard-running midfielder”, “best and fairest”.

In the direction of PC 2, you’ll find players like Lauren Ahrens and Stacey Livingstone, who are star defenders. Players at this end of PC 2 have high scores on intercepts and rebounds_in50.

Another interesting frame for inspecting PC 2 is frame 59. One end of PC 2 has players with high goal-scoring skills, and the other has players with good defending skills. So mousing over the other end of PC 2 finds players like Gemma Houghton and Katie Brennan, who are known for their goal scoring. The branching pattern is an interesting one, because it tells us there is some combination of skills that is lacking among all players; primarily it appears to be the distinction between defensive skills and general playing skills. It’s not quite as simple as this, though, because the branching is only visible when PC 1 and PC 2 are examined together with PC 3.

PCA is useful for getting a sense of the variation in a high-dimensional data set. Interpreting the principal components is often useful, but it can be discombobulating. For the aflw data it would be better to treat PCA as a guide to the main directions of variation, and to follow it with a more direct engineering of variables into interesting player characteristics. For example, calculate offensive skill as an equal combination of goals, accuracy, shots and behinds. A set of new variables specifically computed to measure particular skills would make explaining an analysis easier.
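A sketch of what this might look like, where the equal weighting and the particular variables combined are assumptions chosen for illustration:

# illustrative composite skill variables, built from the standardised statistics
aflw_skills <- aflw_std %>%
  mutate(offense = (goals + accuracy + shots + behinds) / 4,
         defense = (intercepts + rebounds_in50 + one_pct) / 3)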

The tour verifies that PCA on the aflw data is complicated and doesn’t capture all of the variation. However, it does provide useful insights. It detected outstanding players, and indicated the different skills sets of top goal scorers and top defensive players.

4.2 Examining the PCA model in the data space

When you choose a smaller number of PCs \((k)\) than the number of original variables, this is essentially producing a model for the data. The model is the lower dimensional \(k\)-D space. It is analogous to a linear regression model, except that the residuals from the model are \((p-k)\)-D.
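One way to see the analogy is to reconstruct the data from the first \(k\) PCs and compute the residuals; a minimal sketch using the plane data and the p_pca fit from above:

k <- 2
# reconstruct from the first k PCs, adding the column means back
fitted_k <- sweep(p_pca$x[, 1:k] %*% t(p_pca$rotation[, 1:k]),
                  2, p_pca$center, "+")
resid_k <- plane - fitted_k   # residuals live in the remaining (p - k) dimensions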

It is common to show the model, that is, the data projected into the \(k\)-D model space. When \(k=2\) this is called a “biplot”. For the plane and plane_noise data the biplots are shown in Figure 4.8. This is useful for checking which variables contribute most to the new principal component variables, and also for checking for any problems that might have affected the fit, such as outliers, clusters or non-linearity. Interestingly, biplots are typically only made in 2D, even if the data should be summarised by more than two PCs. Occasionally you will also see the biplot made for PC \(j\) vs PC \(k\). With the pca_tour() function in the tourr package you can view a \(k\)-D biplot. This displays the \(k\) PCs, with the axes showing the original variables and thus their contributions to the PCs.

library(ggfortify)
library(patchwork)
plane_pca <- prcomp(plane)
pl1 <- autoplot(plane_pca, loadings = TRUE, 
         loadings.label = TRUE) + 
  ggtitle("(a)") +
  theme_minimal() + 
  theme(aspect.ratio=1)
plane_noise_pca <- prcomp(plane_noise)
pl2 <- autoplot(plane_noise_pca, loadings = TRUE, 
         loadings.label = TRUE) + 
  ggtitle("(b)") +
  theme_minimal() + 
  theme(aspect.ratio=1)
pl1 + pl2
Figure 4.8: Biplots of the plane (a) and plane + noise (b) data. All five variables contribute strongly to the two principal components in (a): PC1 is primarily x1, x2 and x3 and PC2 is primarily x4 and x5. In (b) the same five variables contribute in almost the same way, with variables x6 and x7 contributing very little. This matches how the data was constructed: these two variables are purely noise.

It can be useful to examine this model using the tour. The model is simply a plane in high dimensions, and viewing it this way shows the model in the data space. The reason to do this is to check how well the model fits the data. The plane corresponding to the model should be oriented along the main directions of the points, and the spread of points around the plane should be small. We should also be able to see if the model has missed any strong non-linear relationship, or outliers and clusters.

The function pca_model() from the mulgar package can be used to represent the model as a \(k\)-D wire-frame plane. Figure 4.9 shows the models for the plane and box data, 2D and 3D respectively.

We look at the model in the data space to check how well the model fits the data. If it fits well, the points will cluster tightly around the model representation, with little spread in other directions.

plane_m <- pca_model(plane_pca)
plane_m_d <- rbind(plane_m$points, plane)
animate_xy(plane_m_d, edges=plane_m$edges,
           axes="bottomleft",
           edges.col="#E7950F",
           edges.width=3)
render_gif(plane_m_d, 
           grand_tour(), 
           display_xy(half_range=0.9,
                      edges=plane_m$edges, 
                      edges.col="#E7950F",
                      edges.width=3),
           gif_file="gifs/plane_model.gif",
           frames=500,
           width=400,
           height=400,
           loop=FALSE)
box_pca <- prcomp(box)
box_m <- pca_model(box_pca, d=3)
box_m_d <- rbind(box_m$points, box)
animate_xy(box_m_d, edges=box_m$edges, 
           axes="bottomleft", edges.col="#E7950F", edges.width=3)
render_gif(box_m_d, 
           grand_tour(), 
           display_xy(half_range=0.9,
                      edges=box_m$edges, 
                      edges.col="#E7950F",
                      edges.width=3),
           gif_file="gifs/box_model.gif",
           frames=500,
           width=400,
           height=400,
           loop=FALSE)
(a) Model for the 2D in 5D data.
(b) Model for the 3D in 5D data.
Figure 4.9: PCA model overlaid on the data for the 2D in 5D, and 3D in 5D simulated data.

4.2.1 Example: pisa

The model for the pisa data is a 1D vector, shown in Figure 4.10.

pisa_model <- pca_model(pisa_pca, d=1, s=2)

pisa_all <- rbind(pisa_model$points, pisa_std)
animate_xy(pisa_all, edges=pisa_model$edges,
           edges.col="#E7950F", edges.width=3)
render_gif(pisa_all, 
           grand_tour(), 
           display_xy(half_range=0.9,
                      edges=pisa_model$edges, 
                      edges.col="#E7950F", 
                      edges.width=5),
           gif_file="gifs/pisa_model.gif",
           frames=500,
           width=400,
           height=400,
           loop=FALSE)
Figure 4.10: PCA model of the pisa data. The 1D model captures the primary variation in the data and there is a small amount of spread in all directions away from the model.

The pisa data fits fairly closely to the 1D PCA model. The variance of points away from the model is symmetric and relatively small. This suggests the 1D model is a reasonable summary of the test scores.

4.2.2 Example: aflw

It is less useful to examine the PCA model for the aflw data, because the main patterns that were of interest were the exceptional players. However, we will do it anyway! Figure 4.11 shows the 4D PCA model overlain on the data. Even though the distribution of points is not as symmetric and balanced as the other examples, we can see that the cube structure mirrors the variation. We can see that the relationships between variables are not strictly linear, because the spread extends unevenly away from the box.

aflw_model <- pca_model(aflw_pca, d=4, s=1)

aflw_all <- rbind(aflw_model$points, aflw_std[,7:35])
animate_xy(aflw_all, edges=aflw_model$edges,
           edges.col="#E7950F", 
           edges.width=3, 
           half_range=0.8, 
           axes="off")
render_gif(aflw_all, 
           grand_tour(), 
           display_xy(half_range=0.8,
                      edges=aflw_model$edges, 
                      edges.col="#E7950F", 
                      edges.width=3, 
                      axes="off"),
           gif_file="gifs/aflw_model.gif",
           frames=500,
           width=400,
           height=400,
           loop=FALSE)
Figure 4.11: PCA model of the aflw data. The linear model is not ideal for this data, which has other patterns like outliers, and some branching. However, the model roughly captures the linear associations, and leaves unequal variation in different directions.

From the tour we see that the 4D model leaves substantial variation unexplained. It is also not symmetric, and there is some larger variation away from the model in some combinations of variables than others.

4.3 When relationships are not linear

4.3.1 Example: outliers

Figure 4.12 shows the scree plot for the planar data with noise and outliers. It is very similar to the scree plot for the data without the outliers (Figure 4.2). However, what we see from Figure 4.13 is that PCA loses the outliers. The animation in (a) shows the full data, where the two outliers, marked by colour and labelled 1 and 2, are clearly unusual in some projections. When we examine the tour of the first four PCs (as suggested by the scree plot) the outliers no longer appear unusual; they are almost contained in the point cloud. The reason becomes clear when all the PCs are plotted: the outliers are clearly visible only in PC5, PC6 and PC7.

plane_n_o_pca <- prcomp(plane_noise_outliers)
ggscree(plane_n_o_pca, q = 7) + theme_minimal()
Figure 4.12: Scree plot of the planar data with noise and outliers. It is almost the same as for the data without the outliers.
Code
clrs <- hcl.colors(12, "Zissou 1")
p_col <- c(rep("black", 100), clrs[11], clrs[11])
p_obs_labels <- c(rep("", 100), "1", "2")

animate_xy(plane_n_o_pca$x[,1:4],
           col=p_col,
           obs_labels=p_obs_labels)
animate_xy(plane_noise_outliers,
           col=p_col,
           obs_labels=p_obs_labels)
render_gif(plane_noise_outliers, 
           grand_tour(), 
           display_xy(half_range=0.8,
                      col=p_col,
             obs_labels=p_obs_labels),
           gif_file="gifs/plane_n_o_clr.gif",
           frames=500,
           width=200,
           height=200,
           loop=FALSE)
render_gif(plane_n_o_pca$x[,1:4], 
           grand_tour(), 
           display_xy(half_range=0.8,
                      col=p_col,
             obs_labels=p_obs_labels),
           gif_file="gifs/plane_n_o_pca.gif",
           frames=500,
           width=200,
           height=200,
           loop=FALSE)
(a) Outliers clearly visible.
(b) Outliers not clearly visible in PC1-4.
Figure 4.13: Examining the handling of outliers in the PCA of the planar data with noise variables and two outliers. PCA has lost these two extreme values.
Code to make scatterplot matrix
library(GGally)
ggscatmat(plane_n_o_pca$x) + theme_minimal()
Figure 4.14: From the scatterplot matrix we can see that the outliers are present in PC5, PC6 and PC7. That means by reducing the dimensionality to the first four PCs the model has missed some important characteristics in the data.
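One way to make use of this, beyond eyeballing the scatterplot matrix, is to rank observations by how far they sit from the 4D model in the discarded PCs; in this sketch, rows 101 and 102 should rank at the top:

# squared distance from the 4D model, measured in the discarded PCs
resid_dist <- rowSums(plane_n_o_pca$x[, 5:7]^2)
order(resid_dist, decreasing = TRUE)[1:5]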

4.3.2 Example: Non-linear associations

Figure 4.16 shows the tour of the full 5D data containing non-linear relationships, in comparison with a tour of the first three PCs, as recommended by the scree plot (Figure 4.15). The PCs capture a clear and very clean non-linear relationship, but it looks like they have missed some of the complexities of the relationships. The scatterplot matrix of all five PCs (Figure 4.17) shows that PC4 and PC5 contain interesting features: more non-linearity, and curiously, an outlier.

data(plane_nonlin)
plane_nonlin_pca <- prcomp(plane_nonlin)
ggscree(plane_nonlin_pca, q = 5) + theme_minimal()
Figure 4.15: Scree plot of the non-linear data suggests three PCs.
Code to generate tour
animate_xy(plane_nonlin_pca$x[,1:3])
render_gif(plane_nonlin_pca$x[,1:3], 
           grand_tour(), 
           display_xy(half_range=0.8),
           gif_file="gifs/plane_nonlin_pca.gif",
           frames=500,
           width=200,
           height=200)
(a) Non-linear relationship between several variables seen in a tour on all five variables.
(b) The first three principal components reveal a strong non-linear relationship.
Figure 4.16: Comparison of the full data and first three principal components. Some of the non-linearity is clearly visible in the reduced dimension space, but the full data has more complexities.
Code to make scatterplot matrix
ggscatmat(plane_nonlin_pca$x)
Figure 4.17: From the scatterplot matrix we can see that there is a non-linear relationship visible in PC1 and PC2, with perhaps a small contribution from PC3. However, when the data is reduced to three PCs, some of the non-linear relationships are missed, and, interestingly, there also appears to be an unusual observation.

One of the dangers of PCA is that interesting and curious details of the data may only emerge in the lowest-variance PCs, which are usually discarded. The tour, and examination of these smaller PCs, can help to discover them.

Exercises

  1. Make a scatterplot matrix of the first four PCs of the aflw data. Is the branch pattern visible in any pair?
  2. Construct five new variables to measure these skills: offense, defense, playing time, ball movement, errors. Using the tour, examine the relationships between these variables. Map out how a few players could be characterised based on these directions of skills.
  3. Symmetrise any aflw variables that have skewed distributions using a log or square root transformation. Then re-do the PCA. What do we learn that is different about associations between the skill variables?
  4. Examine the bushfires data using a grand tour on the numeric variables, ignoring the cause (class) variable. Note any issues such as outliers, or skewness that might affect PCA. How many principal components would be recommended by the scree plot? Examine this PCA model with the data, and explain how well it does or doesn’t fit.
  5. Use the pca_tour to examine the first five PCs of the bushfires data. How do all of the variables contribute to this reduced space?
  6. Reduce the dimension of the sketches data to 12 PCs. How much variation does this explain? Is there any obvious clustering in this lower dimensional space?

Project

Linear dimension reduction can optimise for other criteria, and here we will explore one example: the algorithm implemented in the dobin package finds a basis in which the first few directions are optimised for the detection of outliers in the data. We will examine how it performs for the plane_noise_outliers data (the example where the outliers were hidden in the first four principal components).

  1. Start by looking up the documentation of dobin::dobin. How many parameters does the method depend on?
  2. We first apply the function to the plane_noise_outliers data using default values for all parameters.
  3. Recall that the outliers were added in rows 101 and 102 of the data. Make scatterplots showing the projections onto the first, second and third components, using colour to highlight the outliers. Are they visible as outliers with three components?
  4. Adjust the frac parameter of the dobin function to frac = 0.99 and repeat the graphical evaluation from point 3. How does it compare to the previous solution?