Near-perfect Phylogeny Construction from Genetic Variation Data
Russell Schwartz
Carnegie Mellon University
The recent accumulation of large amounts of genotype data from human populations has created an unprecedented opportunity for inferring the history of the human species, identifying the forces shaping our genome, and finding links between genotype and phenotype. Making full use of these data will, however, require substantial improvements in algorithms for large-scale phylogenetic inference. Fast algorithms are available for inferring maximally parsimonious mutational histories from genetic variation data under various restrictive assumptions. perfect model, which assumes that each genetic variant appears only once within a s history, has been particularly important to developing phylogenetic algorithms suitable for high-throughput analysis. But such restrictive models are not robust to the variability observed in real data sets. Heuristic methods that may give solutions far from optimal thus remain the standard in practice.
In this talk, we will cover recent developments in algorithms for optimal phylogenetic inference based on the search near- phylogenies, those that come close to satisfying the perfect phylogeny assumption. Allowing even small deviations from perfection allows the models to describe substantially more and larger data sets. We will examine a recent algorithm proving fixed parameter tractability for the problem of binary parsimony-based inference when the optimal phylogeny differs from a perfect phylogeny by a small number of additional mutations. We will then cover a variation of the method for the practically important case of unphased genotype data, in which contributions from homologous chromosomes are not separated in the input data. This problem can also be solved in polynomial time when the deviation from perfection is bounded. Finally, we describe several applications of these methods on a selection of real variation data.
This work was joint with Srinath Sridhar, Kedar Dhamdhere, Guy Blelloch, Eran Halperin, and R. Ravi.