

This talk is about visualisation to help in clustering high-dimensional data
Image: Sketchplanations
Tours of high-dimensional data are like examining the shadows (projections)
(and slices/sections to see through a shadow)
Data
Projection
Projected data

Data is 2D:
Projection is 1D:
Notice that the projection coefficients vary between -1 and 1. All possible values are shown during the tour.
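A minimal R sketch of a 1D shadow of 2D data, assuming simulated data and a projection vector parametrised by an angle. The names `x`, `project_1d` and `thetas` are illustrative and do not come from the talk.

```r
# Simulated 2D data: a stand-in, not the data used in the talk
set.seed(1)
x <- cbind(rnorm(100), rnorm(100))

# A 1D projection of 2D data is a unit vector a = (cos(theta), sin(theta)),
# so each coefficient stays within [-1, 1] as theta changes
project_1d <- function(x, theta) {
  a <- c(cos(theta), sin(theta))
  drop(x %*% a)  # one projected value per observation
}

# A few frames of the "tour": the shadow at a sequence of angles
thetas <- seq(0, pi, length.out = 5)
shadows <- sapply(thetas, function(theta) project_1d(x, theta))
dim(shadows)  # one row per observation, one column per angle
```

Plotting a histogram or density of each column of `shadows` in turn would give a crude, non-interpolated version of the animation.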

Watching the 1D shadows, we can see:
What does the 2D data look like? Can you sketch it?

⟵
The 2D data


Data is 3D:
Projection is 2D:
Notice that the projection coefficients vary between -1 and 1. All possible values are shown during the tour.
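A 2D projection of 3D data can be sketched the same way, assuming simulated data and an orthonormal basis built from a QR decomposition. The helper `random_basis` is hypothetical and written here only for illustration.

```r
# Simulated 3D data: a stand-in, not the data used in the talk
set.seed(2)
x <- matrix(rnorm(300), ncol = 3)

# Hypothetical helper: a p x d matrix with orthonormal columns,
# obtained from the QR decomposition of a random Gaussian matrix
random_basis <- function(p, d = 2) {
  qr.Q(qr(matrix(rnorm(p * d), nrow = p)))
}

A <- random_basis(3)  # 3 x 2 projection matrix
y <- x %*% A          # projected data, one 2D point per observation

crossprod(A)  # approximately the 2 x 2 identity: columns are orthonormal
range(A)      # every coefficient lies between -1 and 1
```

Because the columns of `A` have unit length, no coefficient can exceed 1 in absolute value, which is why the displayed values stay in that range.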
See:

Data is 4D:
Projection is 2D:
How many clusters do you see?
You can now tell everyone that you can SEE in 4D!
If you want to discover and mark the clusters you see, you can use the detourr package to spin and brush points. Here’s a live demo. Hopefully this works.
Algorithm:
Software:
Grand tour

Slice display

Guided tour

Local tour

PCA

NLDR: tSNE

Best model: four-cluster VEE

Three-cluster EEE
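The labels VEE and EEE match the covariance-model names used by the mclust package. Assuming mclust (or something similar) produced these fits, a comparison of the two models could be sketched as below. The built-in `iris` measurements are a stand-in for the talk's data, so the numbers will differ.

```r
library(mclust)

# Stand-in data: four numeric measurements from the built-in iris data
x <- iris[, 1:4]

# BIC for the two model/cluster-count combinations named on the slides
bic <- mclustBIC(x, G = 3:4, modelNames = c("EEE", "VEE"))
summary(bic)

# Fit one specific model: three clusters, EEE covariance structure
fit_eee <- Mclust(x, G = 3, modelNames = "EEE")
table(fit_eee$classification)  # cluster sizes
```

The `classification` vector can then be used as the colour variable in a tour to check the fitted clusters against the shape of the data.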

Convex hulls are often used to summarise clusters in 2D. It is possible to view these in high-d, too.
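For the 2D case, a per-cluster convex hull can be sketched with base R's `chull()`. The data and cluster labels below are simulated for illustration. Hulls in more than two dimensions need a dedicated routine (for example `convhulln()` in the geometry package), which is not shown here.

```r
# Simulated 2D data with two well-separated clusters (illustrative only)
set.seed(3)
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 4), ncol = 2))
cl <- rep(1:2, each = 50)

# For each cluster, find the row indices of the points on its convex hull
hulls <- lapply(split(seq_len(nrow(x)), cl), function(idx) {
  idx[chull(x[idx, , drop = FALSE])]
})

# Draw the points and outline each cluster with its hull
plot(x, col = cl, pch = 16, xlab = "P1", ylab = "P2")
for (h in hulls) polygon(x[h, ], border = "grey30")
```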


| cl_w | cl_mc = 1 | cl_mc = 2 | cl_mc = 3 |
|---|---|---|---|
| 1 | 149 | 8 | 0 |
| 2 | 0 | 0 | 119 |
| 3 | 0 | 57 | 0 |
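A confusion table like this can be built with `table()`. The sketch below assumes two stand-in clusterings of the built-in `iris` measurements, Ward's linkage and k-means. Neither the data nor the methods are claimed to be those behind the table above, and the names `cl_a` and `cl_b` are illustrative.

```r
# Stand-in data: standardised numeric columns of the built-in iris data
x <- scale(iris[, 1:4])

# Clustering A: hierarchical clustering with Ward's linkage, cut at 3 clusters
cl_a <- cutree(hclust(dist(x), method = "ward.D2"), k = 3)

# Clustering B: k-means with 3 centres
set.seed(4)
cl_b <- kmeans(x, centers = 3, nstart = 10)$cluster

# Cross-tabulate the two sets of labels
conf <- table(cl_a = cl_a, cl_b = cl_b)
conf
```

Cluster labels are arbitrary, so agreement shows up as one large count per row rather than as a filled diagonal.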
```r
library(crosstalk)
library(plotly)
library(viridis)
library(detourr)  # detour(), tour_aes(), tour_path(), show_scatter()
library(tourr)    # grand_tour()

# Share the data so that brushing in one widget highlights the other
p_cl_shared <- SharedData$new(penguins_cl)

# Grand tour of the variables bl to bm, coloured by the cluster label cl_w
detour_plot <- detour(p_cl_shared, tour_aes(
    projection = bl:bm,
    colour = cl_w)) |>
  tour_path(grand_tour(2),
    max_bases = 50, fps = 60) |>
  show_scatter(alpha = 0.7, axes = FALSE,
    width = "100%", height = "450px")

# Plot the two cluster labels against each other, as a brushable
# confusion matrix
conf_mat <- plot_ly(p_cl_shared,
    x = ~cl_mc_j,
    y = ~cl_w_j,
    color = ~cl_w,
    colors = viridis_pal(option = "D")(3),
    height = 450) |>
  highlight(on = "plotly_selected",
    off = "plotly_doubleclick") |>
  add_trace(type = "scatter",
    mode = "markers")

# Show both widgets side by side
bscols(
  detour_plot, conf_mat,
  widths = c(5, 6)
)
```

The tourr package provides the algorithms that generate tour paths, along with the means to create new tour types and new displays. However, its interactivity is poor, which is a big limitation.
detourr is an elegant solution, which could be developed further.
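To see what a tour path from tourr looks like as data, the bases of a short grand tour can be recorded with `save_history()`. This is a sketch that assumes the `flea` data shipped with tourr as a stand-in; the talk's own data is not used.

```r
library(tourr)

# Stand-in data: the six numeric columns of the flea data from tourr
x <- flea[, 1:6]

# Record a few target bases of a 2D grand tour
set.seed(5)
bases <- save_history(x, tour_path = grand_tour(d = 2), max_bases = 3)
dim(bases)  # variables x projection dimensions x number of bases

# Each slice is a projection matrix with orthonormal columns
A1 <- unclass(bases)[, , 1]
crossprod(A1)  # approximately the 2 x 2 identity

# In an interactive session, the tour itself could be animated with:
# animate_xy(x, tour_path = grand_tour())
```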

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.