In this protocol the plot of the real data is embedded amongst a field of plots of data generated to be consistent with some null hypothesis. If the observe can pick the real data as different from the others, this lends weight to the statistical significance of the structure in the plot. The protocol is described in Buja, Cook, Hofmann, Lawrence, Lee, Swayne, Wickham (2009) Statistical inference for exploratory data analysis and model diagnostics, Phil. Trans. R. Soc. A, 367, 4361-4383.

lineup(method, true = NULL, n = 20, pos = sample(n, 1), samples = NULL)

Arguments

method

method for generating null data sets

true

true data set. If NULL, find_plot_data will attempt to extract it from the current ggplot2 plot.

n

total number of samples to generate (including true data)

pos

position of true data. Leave missing to pick position at random. Encryped position will be printed on the command line, decrypt to understand.

samples

samples generated under the null hypothesis. Only specify this if you don't want lineup to generate the data for you.

Details

Generate n - 1 null datasets and randomly position the true data. If you pick the real data as being noticeably different, then you have formally established that it is different to with p-value 1/n.

Examples

library(ggplot2)
ggplot(lineup(null_permute('mpg'), mtcars), aes(mpg, wt)) +
  geom_point() +
  facet_wrap(~ .sample)
#> decrypt("k4ZV xo3o Nm QPuN3NPm Rv")

ggplot(lineup(null_permute('cyl'), mtcars),
       aes(mpg, .sample, colour = factor(cyl))) +
       geom_point()
#> decrypt("k4ZV xo3o Nm QPuN3NPm v1")