7  Spin-and-brush approach

Several examples of the spin-and-brush approach are documented in the literature, such as Cook et al. (1995a) and Wilhelm et al. (1999). The steps are:

  1. Run the (grand) tour.
  2. Stop when you see a separated cluster of points.
  3. Paint the cluster a chosen colour.
  4. Repeat steps 1-3 until the data is grouped and no other separated cluster is visible in any projection. You may need to re-paint points that appear to be grouped incorrectly in a different projection, or paint additional points that, after spinning, most likely belong to an existing group.

Spin-and-brush is useful for exploring clustering when the data is numeric and contains well-separated clusters. Patterns that adversely affect numerical techniques, such as nuisance variables or cases, or differences in variances or shapes between clusters, don’t pose any problems for spin-and-brush. It is also effective if the data has connected low-dimensional (1D or 2D) clusters in high dimensions.

It does not work well when there are no distinct clusters and the purpose of clustering is to partition the data into subsets. In that case, you could begin with a solution provided by a numerical clustering algorithm and use visual tools to evaluate it, with the goal of refining the results.
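As a rough sketch of that workflow, the code below computes a starting partition with k-means and then indicates how it could be inspected in a tour. The simulated data, the choice of k = 3, and the use of `tourr::animate_xy()` for the visual check are all assumptions made for illustration, not the analysis used in this chapter.

```r
# Simulated data: three overlapping groups in 4D
# (an assumption for illustration only, not data from this chapter)
set.seed(2024)
n <- 50
centres <- rbind(c(0, 0, 0, 0), c(2, 2, 0, 0), c(0, 2, 2, 0))
x <- do.call(rbind, lapply(1:3, function(i) {
  matrix(rnorm(n * 4), ncol = 4) + matrix(centres[i, ], n, 4, byrow = TRUE)
}))
colnames(x) <- paste0("x", 1:4)

# Numerical starting solution: k-means with k = 3 (k is a choice, not a given)
km <- kmeans(scale(x), centers = 3, nstart = 10)
cl <- factor(km$cluster)
table(cl)

# Visual evaluation: colour the points by cluster in a grand tour.
# Uncomment in an interactive session with tourr installed.
# tourr::animate_xy(x, tour_path = tourr::grand_tour(), col = cl)
```

Points that sit with the wrong colour group in several projections are candidates for re-assignment when refining the partition.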

With a complex problem where there are many clusters, one can work sequentially, removing each cluster after it is brushed to de-clutter the display and make further clusters easier to find.
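The removal step can be sketched with ordinary subsetting. The data frame `brushed` and its values are hypothetical; the only convention assumed is that unbrushed points carry the label "000000" and brushed points carry the hex code of the brush colour.

```r
# Hypothetical brushed data: two variables plus the colour label assigned
# while brushing ("000000" stands for points not yet brushed)
brushed <- data.frame(
  x1 = c(0.1, 0.2, 2.1, 2.2, 4.0, 4.1),
  x2 = c(1.0, 1.1, 3.0, 3.1, 0.5, 0.6),
  colour = c("3e9eb6", "3e9eb6", "000000", "000000", "000000", "000000")
)

# Set the brushed cluster aside and keep only the unbrushed points
found <- brushed[brushed$colour != "000000", ]
remaining <- brushed[brushed$colour == "000000", ]

nrow(found)
nrow(remaining)
# The tour would then be re-run on `remaining` to look for further clusters
```

Keeping `found` rather than discarding it means the cluster labels can be joined back onto the full data once all clusters have been brushed.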

Spin-and-brush is best achieved using a fully interactive graphics system, such as the one provided by the detourr package, where the results can be saved for further analysis. The code is short, and once the display is open all the controls are interactive.

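library(detourr)
# Print three palette colours, e.g. to pick as brush colours
grDevices::hcl.colors(3, palette = "Zissou 1")
# penguins_sub is assumed to be loaded, e.g. load("data/penguins_sub.rda")
detour(penguins_sub[, 1:4],
       tour_aes(projection = bl:bm)) |>
  tour_path(grand_tour(2), fps = 60,
            max_bases = 20) |>
  show_scatter(alpha = 0.7,
               axes = FALSE)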
Figure 7.1: Screenshots of the spin-and-brush approach using detourr on the penguins data. (a) One cluster painted: a projected view where one cluster can be distinguished and is brushed in blue. (b) Another cluster painted: a projection where a second cluster can be distinguished and is brushed in red.

Figure 7.1 shows the stages of spin-and-brush on the penguins data using detourr. The final results can be examined and used for later analysis. Because this data came with a class variable, the penguin species, it is interesting to see how close the spin-and-brush clustering approach came to recovering the species:

Code to make confusion matrix
library(readr)
load("data/penguins_sub.rda")
detourr_penguins <- read_csv("data/detourr_penguins.csv")
table(penguins_sub$species, detourr_penguins$colour)
           
            000000 3e9eb6 f5191c
  Adelie       143      0      3
  Chinstrap      6      0     62
  Gentoo         2    117      0

It’s quite close! All but two of the 119 Gentoo penguins were identified as a cluster (labelled “3e9eb6”, the chosen light blue hex colour), and all but three of the 146 Adelie penguins were identified as a cluster (labelled “000000”, the unbrushed black group). Most of the Chinstrap penguins, 62 of 68, were also recovered (labelled “f5191c”, the red hex colour).
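To put a single number on "quite close", the counts from the confusion matrix can be re-entered by hand and summarised as the proportion of penguins that fall in the colour group dominated by their own species. This is only a sketch: taking the row maximum as the matched group is an assumption that works here because each species is dominated by a different colour.

```r
# Counts copied from the confusion matrix printed above
tab <- matrix(
  c(143,   0,  3,
      6,   0, 62,
      2, 117,  0),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    species = c("Adelie", "Chinstrap", "Gentoo"),
    colour  = c("000000", "3e9eb6", "f5191c")
  )
)

# Match each species to the colour group holding most of its cases
best <- apply(tab, 1, max)
best

# Proportion of all 333 penguins that land in their species' matched group
agreement <- sum(best) / sum(tab)
round(agreement, 3)
```

With these counts, 322 of the 333 penguins agree with their species, which is about 96.7%.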

Exercises

  1. Use the spin-and-brush approach to identify the three clusters in the mulgar::clusters data set.
  2. Use the spin-and-brush approach to identify the six clusters in the mulgar::multicluster data set. (The code below using detourr could be useful.)
  3. Use spin-and-brush on the challenge data sets, c1-c7 from the mulgar package. How many clusters do you detect in each?
library(detourr)
library(mulgar) # provides the multicluster data

# Use a random starting basis because the first two variables make it too easy
strt <- tourr::basis_random(10, 2)
detour(multicluster,
       tour_aes(projection = -group)) |>
  tour_path(grand_tour(2), start = strt, fps = 60) |>
  show_scatter(alpha = 0.7, axes = FALSE)
  4. Use the spin-and-brush technique to identify the branches of the fake_trees data. The result should look something like this:
Figure 7.2: Example solution after spin-and-brush on fake trees data. The projection shows clusters extending in different directions, with point colours indicating the user-identified clusters.

You can use the download button to save the data with the colours. Tabulate the branches variable in the original data against the colour groups created by brushing, to see how closely you have recovered the original classes.