From genomes to clinical outcomes: empowering clinicians and public health officers with rapid and accessible analysis of microbial genomic data

Anders Gonçalves da Silva, Torsten Seemann, Dieter Bulach, Jason Kwong, Timothy Stinear, Benjamin P. Howden

Microbiological Diagnostic Unit, Public Health Laboratory and Doherty Applied Microbial Genomics, Department of Microbiology and Immunology, The Peter Doherty Institute for Infection and Immunity, The University of Melbourne

Bacterial genomic data have tremendous power to transform decision making in clinical and public health settings by side-stepping time-consuming laboratory assays and delivering answers faster. These data can help pinpoint the origin of outbreaks, link apparently unrelated clinical cases, describe antibiotic resistance profiles, and identify the virulence elements contributing to morbidity and mortality. This information has impact at multiple scales, from tailoring treatment for individual patients to focusing public health response and policy.

Currently, a major bottleneck in realising the full potential of genomic data in public health and the clinic is transforming the raw output of DNA sequencing instruments into information that front-line users, such as epidemiologists and clinicians, can readily digest. A crucial step in this process is generating consistent, reproducible, visually clear, and rapidly interpretable reports that capture the important aspects of the data and empower the end-user to act.

In this talk, we describe our solution to this problem: quandongr, an R package that combines dplyr, ggplot2, and knitr to convert the output of our Nullarbor genomics pipeline into reports ready for use by clinicians and public health officers. Our goal was to make the reports (1) customizable: to address the distinct requirements of different pathogens of interest and different data consumers; (2) extensible: so that novel analytical and graphical approaches can be easily added to the core report; (3) modular: so that reports can be easily assembled from different components; (4) reproducible: to ensure that different labs generate the same output from the same data; and (5) flexible: to ensure that reports remain consistent regardless of the number of isolates analysed. We see this as a step toward making genomic data analysis accessible to end-users (biologists, clinicians, and public health epidemiologists) by abstracting away the layers that lie between raw sequence data and decision making.