Visually exploring local explanations to understand complex machine learning models



Professor Di Cook
Econometrics and Business Statistics
Monash University

Outline

  • Example data: penguins
  • Non-linear model
  • Local explanations
  • Radial tour, focus on single observation

Palmer penguins

library(palmerpenguins)
library(dplyr)
# Remove the 11 observations (out of 344) with missing values
penguins <- penguins %>%
  na.omit()
# Keep only the variables of interest, and standardise
# them for easier interpretation
penguins_sub <- penguins[, c(1, 3:6)] %>% 
  mutate(
    across(where(is.numeric), ~ scale(.)[, 1])) %>%
  rename(bl = bill_length_mm,
         bd = bill_depth_mm,
         fl = flipper_length_mm,
         bm = body_mass_g)

Examine the penguins using a tour

Grand tour: randomly selecting target planes

[Animation: grand tour of the standardised penguin measurements, points coloured by species: Adelie, Chinstrap, Gentoo]


Guided tour: target planes chosen to best separate classes
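
The tour animations shown here are pre-rendered. As a rough sketch (not the code used for these slides), similar animations can be generated with the tourr package; the lda_pp index for the guided tour is an assumption.

library(tourr)
# Grand tour: a random sequence of target planes
animate_xy(penguins_sub[, 2:5], grand_tour(),
           col = penguins_sub$species)
# Guided tour: target planes chosen to optimise class separation
# (assuming the LDA projection pursuit index)
animate_xy(penguins_sub[, 2:5], guided_tour(lda_pp(penguins_sub$species)),
           col = penguins_sub$species)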

Tour projections are biplots

  • bd and bm distinguish Gentoo
  • bd and fl also distinguish Gentoo
  • bd and bl distinguish Chinstrap from Adelie

Fit a classification model

library(randomForest)
set.seed(2311)
penguins_rf_cl <- randomForest(species~., penguins_sub, 
                            ntree = 1000,
                            importance = TRUE)


#> 
#> Call:
#>  randomForest(formula = species ~ ., data = penguins_sub, ntree = 1000,      importance = TRUE) 
#>                Type of random forest: classification
#>                      Number of trees: 1000
#> No. of variables tried at each split: 2
#> 
#>         OOB estimate of  error rate: 2.4%
#> Confusion matrix:
#>           Adelie Chinstrap Gentoo class.error
#> Adelie       143         2      1      0.0205
#> Chinstrap      4        64      0      0.0588
#> Gentoo         0         1    118      0.0084

Variable importance (globally)

#>    Adelie Chinstrap Gentoo MeanDecreaseAccuracy MeanDecreaseGini
#> bl  0.453      0.38  0.088                0.306               86
#> bd  0.080      0.13  0.304                0.170               38
#> fl  0.131      0.13  0.345                0.206               71
#> bm  0.016      0.13  0.152                0.088               17

Globally, bl and fl are most important, bd to a lesser extent, and bm least of all. (Note: when using these values as projection coefficients, the signs need to be chosen by hand.)

# Normalise the overall importance (MeanDecreaseAccuracy)
# to give a unit projection vector
lin1 <- penguins_rf_cl$importance[, 4] /
           sqrt(sum(penguins_rf_cl$importance[, 4]^2))
# Flip the sign of one variable (bd)
lin1[2] <- -lin1[2]
# Project the data onto this direction
proj1 <- as.matrix(penguins_sub[, -1]) %*% as.matrix(lin1)
penguins_pred <- penguins_sub %>% 
  mutate(proj1 = proj1)

Variable importance (by class)

#>    Adelie Chinstrap Gentoo MeanDecreaseAccuracy MeanDecreaseGini
#> bl  0.453      0.38  0.088                0.306               86
#> bd  0.080      0.13  0.304                0.170               38
#> fl  0.131      0.13  0.345                0.206               71
#> bm  0.016      0.13  0.152                0.088               17

Note: when using these values as projection coefficients, one still needs to choose the signs by hand.
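
Analogous to the global projection above, a class-specific projection can be built from one column of the importance matrix. A minimal sketch for the Gentoo column (not taken from the slides; signs may still need flipping):

# Normalise the Gentoo-specific importance to a unit projection vector
lin2 <- penguins_rf_cl$importance[, "Gentoo"] /
           sqrt(sum(penguins_rf_cl$importance[, "Gentoo"]^2))
# Project the data onto this direction
proj2 <- as.matrix(penguins_sub[, -1]) %*% as.matrix(lin2)
penguins_pred <- penguins_pred %>% 
  mutate(proj2 = proj2)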

Radial tours: bd is very important

A radial tour changes the contribution of one variable, reducing it to zero and then returning it to its original value.

At right, the coefficient for bd is being changed.

When it is 0, the gap between Gentoo and the other species is smaller, implying that bd is very important.

but bl is less important

At right, the small contribution for bl is reduced to zero, which does not change the gap between Gentoo and the other species.

Thus bl can be removed from the projection, to make a simpler but equally effective combination of variables.
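
The radial tours on these slides are interactive. A rough sketch of generating one programmatically, assuming the spinifex package (function names and arguments may differ across versions, and the starting basis here is a placeholder):

library(spinifex)
dat <- as.matrix(penguins_sub[, 2:5])
bas <- basis_pca(dat)                           # placeholder starting basis
mt  <- manual_tour(basis = bas, manip_var = 2)  # vary the contribution of bd
ggt <- ggtour(mt, dat) +
  proto_default(aes_args = list(color = penguins_sub$species))
animate_gganimate(ggt)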

Local explanations

Explainable Artificial Intelligence (XAI) is an emerging field of research that provides methods for interpreting black box models.

A common approach is to use local explanations, which attempt to approximate the linear variable importance in the neighbourhood of each observation.

The fitted model may be highly nonlinear, so a single overall linear projection will not accurately represent the fit in all subspaces.

Compute SHAP values


Each observation has its own variable importance, computed using the SHAP method; see Molnar (2022), Interpretable Machine Learning.
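
One way to compute per-observation SHAP values in R, as a hedged sketch using the DALEX package for the probability of the Gentoo class (this is not necessarily the implementation behind these slides):

library(DALEX)
# Wrap the random forest so the explainer returns P(Gentoo)
rf_explainer <- explain(
  penguins_rf_cl,
  data = penguins_sub[, -1],
  y = as.numeric(penguins_sub$species == "Gentoo"),
  predict_function = function(m, d) predict(m, d, type = "prob")[, "Gentoo"],
  label = "RF: P(Gentoo)")
# SHAP values for a single observation
shap_1 <- predict_parts(rf_explainer,
                        new_observation = penguins_sub[1, -1],
                        type = "shap")
plot(shap_1)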

Comparing all local explanations

SHAP examines predictions while varying one variable and holding the others fixed at their means.

Observations with different local explanations from the rest of their group are likely the misclassified cases.

The parallel coordinate plot shows which variables contribute most to this.
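
A hedged sketch of such a parallel coordinate plot with GGally, assuming shap_df is a data frame holding each penguin's SHAP values for bl, bd, fl, bm along with its species (assembled, e.g., from per-observation explanations like the one above):

library(GGally)
# One line per penguin, coloured by species; scale = "globalminmax"
# keeps the SHAP values on a common scale
ggparcoord(shap_df, columns = 1:4, groupColumn = "species",
           alphaLines = 0.3, scale = "globalminmax")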

Using R package cheem

For a Gentoo penguin mistaken for a Chinstrap, the model used only a small contribution of fl, unlike the other Gentoos but more like the bulk of the Chinstrap penguins. That was a mistake!
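
The exploration on this slide uses the interactive cheem app. A minimal sketch of launching it (the run_app() function name is assumed from the package; the local explanations need to be precomputed, see the package documentation):

library(cheem)
# Launch the interactive shiny app for exploring local explanations
# with radial tours (assumed entry point)
run_app()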

What we learn

With fl it looks more like a Chinstrap.

With only bl and bd it looks like its own species.

Summary



  • Local explanations tell us how a prediction is constructed, for any observation.
  • User-controlled, interactive radial tours are useful to check the local explanations, and better understand a model fit in local neighbourhoods.

Further reading

This work is based on Nick Spyrison’s PhD research, developed after discussions with Przemyslaw Biecek.

Acknowledgements

Slides produced using quarto.

Slides available from https://github.com/dicook/ASC_2023.

Viewable at https://github.com/dicook/ASC_2023/slides.html.