Bioinformatics

An Efficient Bayesian Inference Framework for Coalescent-Based Nonparametric Phylodynamics

Lan, S., Palacios, J. A., Karcher, M., Minin, V. N., Shahbaba, B..

Motivation: The field of phylodynamics focuses on the problem of reconstructing population size dynamics over time using current genetic samples taken from the population of interest. This technique has been extensively used in many areas of biology, but is particularly useful for studying the spread of quickly evolving infectious diseases agents, e.g., influenza virus. Phylodynamic inference uses a coalescent model that defines a probability density for the genealogy of randomly sampled individuals from the population. When we assume that such a genealogy is known, the coalescent model, equipped with a Gaussian process prior on population size trajectory, allows for nonparametric Bayesian estimation of population size dynamics. While this approach is quite powerful, large data sets collected during infectious disease surveillance challenge the state-of-the-art of Bayesian phylodynamics and demand inferential methods with relatively low computational cost.

Results: To satisfy this demand, we provide a computationally efficient Bayesian inference framework based on Hamiltonian Monte Carlo for coalescent process models. Moreover, we show that by splitting the Hamiltonian function we can further improve the efficiency of this approach. Using several simulated and real datasets, we show that our method provides accurate estimates of population size dynamics and is substantially faster than alternative methods based on elliptical slice sampler and Metropolis-adjusted Langevin algorithm.

Availability: The R code for all simulation studies and real data analysis conducted in this paper are publicly available at http://www.ics.uci.edu/~slan/lanzi/CODES.html and in the R package phylodyn available at https://github.com/mdkarcher/phylodyn.

Contact: Shiwei Lan – S.Lan@warwick.ac.uk, Babak Shahbaba – babaks@uci.edu