Data

Sample data sets for examples.

aflw
AFLW player statistics
anomaly1 anomaly2 anomaly3 anomaly4 anomaly5
Data sets with anomalies
assoc1 assoc2 assoc3
Data sets with different types of association
box
3D plane in 5D
bushfires
Australian bushfires 2019-2020
c1 c2 c3 c4 c5 c6 c7
Cluster challenge data sets
clusters_nonlin
Four unusually shaped clusters in 4D
clusters
Three clusters in 5D
multicluster
Multiple clusters of different sizes, shapes and distances from each other
pisa
PISA scores
plane_nonlin
Non-linear relationship in 5D
plane
2D plane in 5D
simple_clusters
Two clusters in 2D
sketches_test
Images of sketches for testing
sketches_train
Images of sketches for training

Utility

Useful functions

calc_mv_dist()
Compute Mahalanobis distances between all pairs of observations
calc_norm()
Calculate the norm of a vector
convert_proj_tibble()
Convert a projection sequence into a tibble
gen_vc_ellipse()
Generate points on the surface of an ellipse
gen_xvar_ellipse()
Ellipse matching data center and variance
norm_vec()
Normalise a vector to have length 1
pooled_vc()
Compute pooled variance-covariance matrix
rmvn()
Generate a sample from a multivariate normal
ggslice()
Generate an axis-parallel slice display
ggslice_projection()
Generate slice display
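
As a rough illustration of the quantities behind a few of the utilities listed above (vector norm, normalising to length 1, and pairwise Mahalanobis distances), here is a minimal base-R sketch. It does not call the package functions and is not their implementation; the helper names `vec_norm`, `unit_vec` and `pairwise_mahal` are hypothetical, and the sample covariance is assumed as the scaling matrix.

```r
# Base-R sketch only: helper names are hypothetical, not package functions.

# Euclidean norm of a vector
vec_norm <- function(v) sqrt(sum(v^2))

# Rescale a vector to have length 1
unit_vec <- function(v) v / vec_norm(v)

# Pairwise Mahalanobis distances between the rows of x,
# assuming the sample variance-covariance matrix as the scale.
# Whitening with the Cholesky factor turns the problem into
# ordinary Euclidean distances.
pairwise_mahal <- function(x, S = cov(x)) {
  R <- chol(S)              # upper triangular, t(R) %*% R equals S
  z <- x %*% solve(R)       # whitened data
  as.matrix(dist(z))        # n x n matrix of distances
}

set.seed(1)
x <- matrix(rnorm(40), ncol = 4)   # 10 observations, 4 variables
d <- pairwise_mahal(x)

vec_norm(c(3, 4))
unit_vec(c(3, 4))
dim(d)
```

The whitening step is one of several equivalent ways to get these distances; applying `mahalanobis()` to each pair of rows would give the same values after taking square roots.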

Principal Component Analysis

Useful functions for PCA

ggscree()
Produce a simple scree plot
pca_model()
Create wire frame of PCA model

Clustering

Useful functions for cluster analysis

ggmcbic()
Produce an mclust summary plot with ggplot
hierfly()
Generate a dendrogram to be added to data
mc_ellipse()
Compute the ellipses of an mclust model
som_model()
Process the output from SOM to display the map and data