13  Overview

Methods for supervised classification originated in the field of Statistics in the early twentieth century, under the moniker discriminant analysis (see, for example, Fisher (1936a)). The increase in data collection, and storage in databases, in the late twentieth century inspired a growing desire to extract knowledge from data, particularly to be able to accurately predict class labels. This has contributed to an explosion of research on new methods, especially algorithms that focus on accurate prediction of new data based on training samples.

In contrast to unsupervised classification, the class label (a categorical response variable) is known in the training sample. The training sample is used to build the prediction model, and also to estimate the accuracy, or conversely the error, of the model on future data. It is also important to understand and interpret the model, so that we know how predictions are made. High-dimensional visualisation can help with this, and with questions such as: which predictors contribute to the class differences, whether the boundary between classes is linear or non-linear, and which observations are likely to be misclassified.

Addressing these types of queries also motivates the emerging field called explainable artificial intelligence (XAI), which goes beyond predictive accuracy to more completely satisfy the desire to extract knowledge from data.
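As a concrete illustration of this workflow, the sketch below splits a labelled dataset into training and test portions, builds a classifier on the training sample, and estimates the accuracy (and error) on the held-out data. It assumes a hypothetical data frame `d` containing numeric predictors and a factor class label `cl`, and uses linear discriminant analysis purely as a placeholder; any classifier could be substituted.

```r
library(MASS)  # lda() is used here only as a placeholder classifier

# Hypothetical data frame `d`: numeric predictors plus a factor label `cl`
set.seed(2024)
n        <- nrow(d)
train_id <- sample(n, size = floor(0.7 * n))  # 70% training sample
train    <- d[train_id, ]
test     <- d[-train_id, ]

# Build the prediction model on the training sample
fit <- lda(cl ~ ., data = train)

# Estimate the accuracy, and its complement the error, on the held-out data
pred     <- predict(fit, newdata = test)$class
accuracy <- mean(pred == test$cl)
error    <- 1 - accuracy
```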

Although we focus on a categorical response, some of the techniques here can be modified or adapted for problems with a numeric, or continuous, response variable. With a categorical response and numerical predictors, we map colour to the response variable and use the tour to examine the relationship between the predictors and the different classes.
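For example, a grand tour with colour mapped to the class label might look like the following sketch, which assumes the tourr package and the same hypothetical data frame `d` as above, with the class label `cl` in the last column.

```r
library(tourr)

# Colour is mapped to the categorical response; the tour animates 2D
# projections of the numeric predictors so that class differences,
# and the variables driving them, can be seen.
animate_xy(d[, -ncol(d)],   # numeric predictors (class label assumed last)
           col = d$cl)      # colour points by class
```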

Figure 13.1: Examples of supervised classification patterns: (a) linearly separable, (b) linear but not completely separable, (c) non-linearly separable, (d) non-linear, but not completely separable.

Figure 13.1 shows some 2D examples where the two classes are (a) linearly separable, (b) not completely separable but linearly different, (c) non-linearly separable, and (d) not completely separable but with a non-linear difference. We can also see that in (a) only the horizontal variable would be important for the model, because the two classes are completely separable in this direction. Although the classes in (c) are separable, most models would have difficulty capturing the separation. It is for this reason that it is important to understand the boundary between classes produced by a fitted model. In each of (b), (c) and (d) it is likely that some observations would be misclassified. Identifying these cases, and inspecting where they fall in the data space, is important for understanding the model's future performance.
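One way to do this, continuing the hedged sketch from above, is to flag the held-out cases that the fitted model gets wrong and then highlight them while touring the predictors; here the plotting symbol distinguishes the misclassified observations.

```r
# Flag the test cases that the fitted model misclassifies
missed <- which(pred != test$cl)
test[missed, ]  # inspect the misclassified observations

# Highlight them in a tour of the predictors: colour by true class,
# and use a different plotting symbol for the misclassified cases
library(tourr)
animate_xy(test[, setdiff(names(test), "cl")],
           col = test$cl,
           pch = ifelse(seq_len(nrow(test)) %in% missed, 4, 16))
```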