Find a low-dimensional layout of points that approximates the distances between the points in the high-dimensional space. The purpose is to obtain a useful representation that reveals high-dimensional patterns, such as clusters.
Multidimensional scaling (MDS) is the original approach. It looks for the layout that minimises the stress:
\[
\mbox{Stress}_D(x_1, \dots, x_n) = \left(\sum_{i, j=1; i\neq j}^n (d_{ij} - d_k(i,j))^2\right)^{1/2}
\] where \(D\) is an \(n\times n\) matrix of distances \((d_{ij})\) between all pairs of points in the high-dimensional space, and \(d_k(i,j)\) is the distance between points \(x_i\) and \(x_j\) in the low-dimensional space.
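To make the stress formula concrete, here is a minimal sketch in plain Python. The function names (`euclidean`, `distance_matrix`, `stress`) and the toy data are illustrative assumptions, not part of any library, and Euclidean distance is assumed in both spaces.

```python
import math

def euclidean(p, q):
    # Euclidean distance between two points given as tuples
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def distance_matrix(points):
    # n x n matrix of pairwise distances (the matrix D in the formula)
    return [[euclidean(p, q) for q in points] for p in points]

def stress(D, layout):
    # Square root of the summed squared differences between the
    # high-dimensional distances d_ij and the layout distances d_k(i, j),
    # over all ordered pairs with i != j
    n = len(D)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                total += (D[i][j] - euclidean(layout[i], layout[j])) ** 2
    return math.sqrt(total)

# Toy data (assumed): corners of a unit square embedded in 3-D with z = 0
high = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0)]
D = distance_matrix(high)

exact_2d = [(0, 0), (1, 0), (0, 1), (1, 1)]  # preserves every distance
line_1d = [(0,), (1,), (2,), (3,)]           # distorts several distances

print(stress(D, exact_2d))
print(stress(D, line_1d))
```

The 2-D layout reproduces every pairwise distance, so its stress is zero, while the 1-D layout cannot and has positive stress. An MDS algorithm searches over layouts to make this quantity as small as possible.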
PCA is a special case of MDS: classical MDS applied to Euclidean distances gives the same layout as PCA. The result from PCA is a linear projection, whereas MDS in general can produce a non-linear transformation.
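The link between PCA and MDS can be checked numerically. The sketch below assumes classical MDS (double-centring the squared Euclidean distances and taking the top eigenvectors) and simulated data. It uses NumPy, and the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))   # simulated data: 20 points in 5-D
Xc = X - X.mean(axis=0)        # centre each variable

# PCA scores on the first two components, via the SVD of the centred data
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
pca_scores = U[:, :2] * s[:2]

# Classical MDS: double-centre the squared Euclidean distance matrix,
# then use the two leading eigenvectors scaled by sqrt(eigenvalue)
sq = np.sum((Xc[:, None, :] - Xc[None, :, :]) ** 2, axis=-1)
n = sq.shape[0]
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ sq @ J
evals, evecs = np.linalg.eigh(B)
order = np.argsort(evals)[::-1][:2]
mds_coords = evecs[:, order] * np.sqrt(evals[order])

# The two layouts should agree up to the sign of each axis
match = all(
    np.allclose(np.abs(pca_scores[:, k]), np.abs(mds_coords[:, k]), atol=1e-6)
    for k in range(2)
)
print(match)
```

The comparison uses absolute values because the sign of each component is arbitrary in both methods. With a non-Euclidean distance, or a stress-minimising (non-classical) MDS, the two layouts would generally differ.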
Many variations have been developed:
- t-distributed stochastic neighbour embedding (t-SNE): compares interpoint distances with a standard probability distribution (e.g. a \(t\)-distribution) to exaggerate local neighbourhood differences.
- uniform manifold approximation and projection (UMAP): compares the interpoint distances with what might be expected if the data were uniformly distributed in the high-dimensional space.
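The t-SNE idea of comparing distances through probability distributions can be sketched as follows. This is a simplified illustration, not the full algorithm: it assumes a single fixed Gaussian bandwidth `sigma` (real t-SNE tunes one bandwidth per point from a perplexity setting) and only evaluates the criterion for two given layouts rather than optimising it. All function names are hypothetical.

```python
import numpy as np

def sq_dists(Z):
    # Matrix of squared Euclidean distances between the rows of Z
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sum(diff ** 2, axis=-1)

def high_dim_affinities(X, sigma=1.0):
    # Gaussian similarities in the high-dimensional space, normalised per
    # point and then symmetrised (fixed sigma is a simplifying assumption)
    n = X.shape[0]
    W = np.exp(-sq_dists(X) / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    cond = W / W.sum(axis=1, keepdims=True)
    return (cond + cond.T) / (2 * n)

def low_dim_affinities(Y):
    # Heavy-tailed Student t (1 degree of freedom) similarities in the layout
    W = 1.0 / (1.0 + sq_dists(Y))
    np.fill_diagonal(W, 0.0)
    return W / W.sum()

def kl_divergence(P, Q):
    # Kullback-Leibler divergence, the criterion that t-SNE minimises
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / Q[mask])))

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 4))
X[:15, 0] += 3.0                   # simulated data: two clusters in 4-D

P = high_dim_affinities(X)
Q_proj = low_dim_affinities(X[:, :2])               # keeps the cluster split
Q_rand = low_dim_affinities(rng.normal(size=(30, 2)))  # unrelated layout

print(kl_divergence(P, Q_proj))
print(kl_divergence(P, Q_rand))
```

A layout that keeps neighbours together would be expected to give the smaller divergence. The heavy tails of the \(t\)-distribution are what allow moderately distant points to be pushed far apart in the layout, which exaggerates local neighbourhood structure.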
Non-linear dimension reduction (NLDR) can be useful, but it can also produce misleading representations.