Scaling log-linear analysis to datasets with thousands of variables

Geoff Webb, Faculty of Information Technology, Monash University

Association discovery is a fundamental data mining task. The primary statistical approach to discovering associations between variables is log-linear analysis, but classical approaches to it do not scale beyond about ten variables. By melding the state of the art in statistics, graphical modeling, and data mining research, we have developed efficient and effective algorithms for log-linear analysis. These algorithms analyze datasets with thousands of variables in seconds, providing a powerful, statistically sound method for creating compact models of complex, high-dimensional multivariate distributions.
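
For readers unfamiliar with log-linear analysis, the following minimal sketch shows the smallest case: testing the independence log-linear model on a two-way contingency table with the likelihood-ratio statistic G^2. It is a textbook illustration only, not the algorithms described in this abstract. The function names and the example counts are hypothetical. The second function counts the free cell probabilities of a saturated model over binary variables, which is one plausible reason classical approaches become impractical as the number of variables grows.

```python
import math


def g_squared_independence(table):
    """Return (G^2, degrees of freedom) for the independence model on a two-way table.

    `table` is a list of rows of observed counts. Expected counts under
    independence are row_total * column_total / n.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    g2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if observed > 0:  # a zero cell contributes nothing to the sum
                g2 += 2.0 * observed * math.log(observed / expected)

    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return g2, dof


def saturated_parameter_count(num_binary_variables):
    """Free cell probabilities in the saturated model over binary variables."""
    return 2 ** num_binary_variables - 1


# Hypothetical counts, chosen only to illustrate the calculation.
associated = [[20, 5], [5, 20]]
g2, dof = g_squared_independence(associated)
print(f"G^2 = {g2:.2f} on {dof} degree(s) of freedom")

# The saturated model grows exponentially with the number of variables.
for k in (2, 10, 20):
    print(k, "binary variables:", saturated_parameter_count(k), "free parameters")
```

A large G^2 relative to the chi-squared distribution with the stated degrees of freedom suggests that the independence model fits poorly, that is, that the two variables are associated.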