Computational-Experimental Inference of Protein Family Constraints
Chris Bailey-Kellogg
Dartmouth
Evolutionary pressures on proteins to maintain structure and function have constrained their sequences over time and across species. Sequence, structure, and function are also constrained via common evolutionary history. Since proteins evolve by sequence alteration and selection based on function, determining the origin and nature of these constraints is key to understanding both the evolutionary history of proteins and the principles behind their current function. We are developing an integrated computational-experimental approach to infer constraints in protein families. It is based on probabilistic graphical models that rigorously and compactly integrate and represent prior information about a family, including residue conservation and correlation in a multiple sequence alignment, residue interactions and localizations in a structure, and functional annotations of family members. Such prior information often cannot fully characterize the constraints, e.g., because the sequence record does not contain sufficient instances of correlated residues or because the structure does not account for variability in residue interactions across members of the family. Thus, experimental approaches are also required, both as a discovery tool and for testing and refining the constraints initially identified computationally. Our approach brings together these two key aspects: constructing and analyzing probabilistic models from existing information, and planning, conducting, and interpreting chimeric protein generation experiments to evaluate and refine the models.
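
To make "residue conservation and correlation in a multiple sequence alignment" concrete, the following is a minimal sketch of two standard statistics that could serve as raw evidence for such a model: per-column Shannon entropy (low entropy suggests conservation) and mutual information between column pairs (high values suggest correlated residues). This is an illustration only, not the graphical-model method described above. The alignment `toy_msa`, the helper names, and the threshold value are all assumptions made up for the example.

```python
import math
from collections import Counter
from itertools import combinations


def column(msa, j):
    """Return the residues found at alignment position j."""
    return [seq[j] for seq in msa]


def entropy(symbols):
    """Shannon entropy (in bits) of the empirical distribution of symbols."""
    n = len(symbols)
    counts = Counter(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(msa, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    col_i, col_j = column(msa, i), column(msa, j)
    joint = list(zip(col_i, col_j))
    return entropy(col_i) + entropy(col_j) - entropy(joint)


def coupled_pairs(msa, threshold):
    """Column pairs whose mutual information meets a (hypothetical) threshold."""
    length = len(msa[0])
    return [
        (i, j)
        for i, j in combinations(range(length), 2)
        if mutual_information(msa, i, j) >= threshold
    ]


# Hypothetical toy alignment: column 1 is fully conserved,
# columns 0 and 2 co-vary, and column 3 varies independently.
toy_msa = ["ARKD", "ARKE", "GRLD", "GRLE"]

print(coupled_pairs(toy_msa, 0.5))  # [(0, 2)]
```

With only four sequences, estimates like these are unreliable, which mirrors the point above that the sequence record may not contain sufficient instances of correlated residues and that experiments are needed to test candidate constraints.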