This talk is about visualisation to help in clustering high-dimensional data
Image: Sketchplanations
Touring high-dimensional data is like examining its shadows (projections), and slices (sections) that let us see through a shadow.
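A slice keeps only the points that lie close to the current projection plane. The sketch below is a minimal illustration under our own assumptions. The plane passes through a centre point `c0`. A point counts as inside when its orthogonal distance to the plane is less than a half-thickness `h`. `slice_points()` is a hypothetical helper, not a function from tourr.

```r
# Hypothetical helper: flag points lying within distance h of the
# plane spanned by the orthonormal p x 2 basis A, passing through c0.
slice_points <- function(X, A, c0 = colMeans(X), h = 0.3) {
  Xc <- sweep(X, 2, c0)            # shift the data so the plane passes through the origin
  resid <- Xc - Xc %*% A %*% t(A)  # component orthogonal to the plane
  dist <- sqrt(rowSums(resid^2))   # orthogonal distance to the plane
  dist < h
}

set.seed(1)
X <- matrix(runif(3000, -1, 1), ncol = 3)  # simulated 3D cube of points
A <- cbind(c(1, 0, 0), c(0, 1, 0))         # plane spanned by axes 1 and 2
inside <- slice_points(X, A, c0 = c(0, 0, 0), h = 0.1)
```

With this plane, the distance is simply the absolute value of the third coordinate, so the slice is the thin layer of points with |x3| < 0.1.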
Diagram: data, projection and projected data.
Data is 2D.
Projection is 1D.
Notice that the projection coefficients change between -1 and 1: the tour shows all possible values.
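For 2D data, a 1D projection is a unit vector, which can be written as a = (cos θ, sin θ). Both entries therefore stay between -1 and 1. Here is a minimal base-R sketch on simulated data (not the data from the talk) that computes the 1D shadow for a sequence of angles. Angles from 0 to π are enough, because θ + π gives the mirror image of the shadow at θ.

```r
# A 1D projection of 2D data is a unit vector a = (cos(theta), sin(theta)).
set.seed(2)
X <- matrix(rnorm(200), ncol = 2)   # simulated 2D data, for illustration only
thetas <- seq(0, pi, length.out = 50)
shadows <- sapply(thetas, function(theta) {
  a <- c(cos(theta), sin(theta))    # projection vector, entries in [-1, 1]
  drop(X %*% a)                     # 1D projected values (the shadow)
})
# shadows has one row per point and one column per projection angle
```

Plotting each column as a density or dotplot, one after another, gives a simple 1D tour of the 2D data.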
What can we see by watching the 1D shadows?
What does the 2D data look like? Can you sketch it?
The 2D data
Data is 3D.
Projection is 2D.
Notice that the projection coefficients change between -1 and 1: the tour shows all possible values.
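In general, a 2D projection of p-dimensional data is a p × 2 matrix A with orthonormal columns, and the shadow is XA. Here is a minimal sketch that assumes one simple way to draw a random basis: the QR decomposition of a Gaussian matrix. `random_basis()` is a hypothetical helper, not tourr's own function.

```r
# Hypothetical helper: a random p x d basis with orthonormal columns,
# obtained by orthonormalising a matrix of Gaussian draws.
random_basis <- function(p, d = 2) {
  qr.Q(qr(matrix(rnorm(p * d), nrow = p)))
}

set.seed(3)
A <- random_basis(3)               # 3 x 2 basis for 3D data
X <- matrix(rnorm(300), ncol = 3)  # simulated 3D data
Y <- X %*% A                       # 100 x 2 projected data (the shadow)
round(crossprod(A), 10)            # approximately the 2 x 2 identity
```

Because the columns of A are unit vectors, every entry of A lies between -1 and 1.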
See:
Data is 4D.
Projection is 2D.
How many clusters do you see?
You can now tell everyone that you can SEE in 4D!
If you want to discover and mark the clusters you see, you can use the detourr package to spin the data and brush points. Here's a live demo; hopefully it works.
Algorithm:
Software:
Grand tour
Slice display
Guided tour
Local tour
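The grand tour moves smoothly between randomly chosen target planes. One standard way to do this is geodesic interpolation, based on the principal angles between the current and target planes. The sketch below is a simplified version of that idea, written for illustration. `geodesic_path()` is a hypothetical helper, and tourr's own implementation differs in its details.

```r
# Hypothetical helper: Fa, Fz are p x d matrices with orthonormal columns.
# Returns a function of t in [0, 1] giving an orthonormal basis of the
# intermediate plane on the geodesic from span(Fa) to span(Fz).
geodesic_path <- function(Fa, Fz) {
  sv <- svd(crossprod(Fa, Fz))                # t(Fa) %*% Fz = U diag(s) t(V)
  s <- pmin(sv$d, 1)                          # guard against rounding above 1
  lambda <- acos(s)                           # principal angles between planes
  Ga <- Fa %*% sv$u                           # aligned basis of the start plane
  Gz <- Fz %*% sv$v                           # aligned basis of the target plane
  W <- Gz - Ga %*% diag(s, nrow = length(s))  # part of Gz orthogonal to Ga
  norms <- sqrt(colSums(W^2))
  W <- sweep(W, 2, ifelse(norms > 1e-12, norms, 1), "/")
  function(t) {
    Ga %*% diag(cos(t * lambda), nrow = length(lambda)) +
      W %*% diag(sin(t * lambda), nrow = length(lambda))
  }
}

set.seed(4)
Fa <- qr.Q(qr(matrix(rnorm(10), nrow = 5)))  # random start plane in 5D
Fz <- qr.Q(qr(matrix(rnorm(10), nrow = 5)))  # random target plane in 5D
path <- geodesic_path(Fa, Fz)
frames <- lapply(seq(0, 1, length.out = 20), path)  # 20 steps along the path
```

At t = 0 the basis spans the start plane, and at t = 1 it spans the target plane. Each step rotates by the same fraction of the principal angles, which is what makes the motion look smooth.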
PCA
NLDR: tSNE
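PCA is itself a single linear projection. The first two principal component scores equal the centred data multiplied by a p × 2 matrix of loadings with orthonormal columns, so PCA gives one of the frames a tour can visit. A nonlinear method such as t-SNE has no such projection matrix. A quick base-R check on simulated data:

```r
set.seed(5)
X <- matrix(rnorm(400), ncol = 4)  # simulated 4D data
pca <- prcomp(X)                   # centred, unscaled PCA
A_pca <- pca$rotation[, 1:2]       # 4 x 2 orthonormal projection basis
Y <- scale(X, center = TRUE, scale = FALSE) %*% A_pca  # same as the PC1, PC2 scores
```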
Best model: four-cluster VEE
Three-cluster EEE
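Assuming these model names follow the mclust convention, each cluster's covariance matrix is decomposed into a volume λ_k, a diagonal shape matrix A_k and an orthogonal orientation matrix D_k. The three letters say whether volume, shape and orientation are Equal (E) or Variable (V) across clusters:

$$
\begin{aligned}
\Sigma_k &= \lambda_k D_k A_k D_k^\top && \text{general form}\\
\text{VEE:}\quad \Sigma_k &= \lambda_k D A D^\top && \text{variable volume, equal shape and orientation}\\
\text{EEE:}\quad \Sigma_k &= \lambda D A D^\top && \text{the same covariance for every cluster}
\end{aligned}
$$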
Convex hulls are often used to summarise clusters in 2D. It is possible to view these in high-d, too.
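A linear projection of a convex hull is the convex hull of the projected points. This means the outline of a cluster's hull in any 2D shadow can be found with base R's `chull()`. Here is a minimal sketch on a simulated cluster. Drawing the full high-d hull (its interior edges, not just the outline) would need a high-dimensional hull algorithm, for example `convhulln()` from the geometry package, which is not shown here.

```r
set.seed(6)
cl <- matrix(rnorm(150), ncol = 3)  # simulated 3D cluster, 50 points
A <- cbind(c(1, 0, 0), c(0, 1, 0))  # a 2D projection basis
P <- cl %*% A                       # projected cluster
hull <- chull(P[, 1], P[, 2])       # indices of the outline's vertices
```

Calling `polygon(P[hull, ])` on a scatterplot of `P` draws the outline. Recomputing it for each frame of a tour keeps the outline in sync with the rotation.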
| cl_w (rows) / cl_mc (columns) | 1 | 2 | 3 |
|---|---|---|---|
| 1 | 149 | 8 | 0 |
| 2 | 0 | 0 | 119 |
| 3 | 0 | 57 | 0 |
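Cluster labels are arbitrary, so agreement between two clusterings is best measured after matching labels. In the table, cl_w clusters 2 and 3 correspond to cl_mc clusters 3 and 2. Here is a small base-R sketch, using the counts from the table, that finds the label matching with the most agreement by checking all 3! permutations.

```r
# Counts from the table above: rows cl_w, columns cl_mc (filled by column)
tab <- matrix(c(149, 0, 0,
                8, 0, 57,
                0, 119, 0),
              nrow = 3,
              dimnames = list(cl_w = 1:3, cl_mc = 1:3))

# All 3! ways to match cl_w labels to cl_mc labels
perms <- rbind(c(1, 2, 3), c(1, 3, 2), c(2, 1, 3),
               c(2, 3, 1), c(3, 1, 2), c(3, 2, 1))
matched <- apply(perms, 1, function(p) sum(tab[cbind(1:3, p)]))
best <- perms[which.max(matched), ]  # cl_mc label matched to cl_w 1, 2, 3
agreement <- max(matched) / sum(tab) # proportion of penguins on the diagonal
```

Here the best matching is 1 → 1, 2 → 3, 3 → 2. With that matching, 325 of the 333 penguins agree (about 97.6%), and the 8 disagreements are all cl_w cluster 1 penguins placed in cl_mc cluster 2.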
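```r
library(detourr)    # detour(), tour_aes(), tour_path(), show_scatter()
library(tourr)      # grand_tour()
library(crosstalk)  # SharedData, bscols()
library(plotly)
library(viridis)

# Share the clustered penguins data so that brushing in one widget
# highlights the same penguins in the other
p_cl_shared <- SharedData$new(penguins_cl)

# Grand tour of the measurement columns bl through bm, coloured by cl_w
detour_plot <- detour(p_cl_shared, tour_aes(
    projection = bl:bm,
    colour = cl_w)) |>
  tour_path(grand_tour(2),
            max_bases = 50, fps = 60) |>
  show_scatter(alpha = 0.7, axes = FALSE,
               width = "100%", height = "450px")

# Confusion-table view: cl_mc_j against cl_w_j (the _j columns are
# assumed to be jittered cluster labels created earlier), linked by brushing
conf_mat <- plot_ly(p_cl_shared,
                    x = ~cl_mc_j,
                    y = ~cl_w_j,
                    color = ~cl_w,
                    colors = viridis_pal(option = "D")(3),
                    height = 450) |>
  highlight(on = "plotly_selected",
            off = "plotly_doubleclick") |>
  add_trace(type = "scatter",
            mode = "markers")

# Show the tour and the confusion table side by side
bscols(
  detour_plot, conf_mat,
  widths = c(5, 6)
)
```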
The tourr package provides the algorithms for generating tour paths, and for creating new tours and displays. However, its interactivity is poor, which is a big limitation.
The detourr package is an elegant solution, and it could be developed further.
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.