Simple tools for complex problems: making molehills out of mountains

Zoé van Havre, CSIRO

Statistical analysis in medical, biological, and environmental research often aims to draw meaningful stories from complex and incomplete data, which measure effects and relationships whose drivers are rarely well understood. When data may be influenced by underlying factors, measured or unmeasured, many sources of variability can affect the response. Analyses which attempt to encapsulate all sources of correlation, measuring all possible effects between covariates in elaborate and impressive models, may also depend upon rarely fulfilled assumptions of independence or normality. Bayesian methods can provide an alternative approach in which the number of assumptions is reduced and the degree of certainty in a given result is available.

The current research focuses on the use of overfitted mixture models (specifically, Gaussian mixture models) as a tool for extracting data-driven, coherent stories from complex data, illustrated with a study in Alzheimer's Disease. Overfitted mixtures provide a straightforward, theoretically supported approach for the analysis of data which come from any number of underlying groups. They can be used not only to estimate the number of groups and their parameters, but also to identify noise, outliers, skewness, and other artefacts common to real data. When combined with careful consideration of a specific research question, the hierarchical nature of the Bayesian result provides a framework from which specific features of interest can be easily extracted.

Results from this research demonstrate how simple, low-assumption methods can sometimes provide clear answers (or at least clues) to complex questions. The research also introduces recently developed tools and computational solutions for the analysis of overfitted mixture models.

Slides [Source files]