Mass-Based Protein Phylogenetic Approach to Identify Epistasis.

2021 
A mass-based protein phylogeny method, known as phylonumerics, is described that builds phylogenetic-like trees using a purpose-built MassTree algorithm. These trees are constructed from sets of numerical mass map data for each protein, without the need for gene or protein sequences. Such trees have been shown to be highly congruent with conventional sequence-based trees and provide a reliable means to study the evolutionary history of organisms. The algorithm computes mutations from the differences in mass of peptide pairs across different mass sets and displays them at branch nodes across the tree. Because the trees represent a phylogeny of expressed proteins, all mutations are by definition non-synonymous. The algorithm outputs the frequency of these mutations, together with a mutation score computed as a sum of these frequencies weighted by their position relative to the root of the tree. It also outputs lists of pairs of mutations separated along interconnected branches of the tree. Pairs that co-occur, or that occur consecutively or near consecutively, and that are separated by a distance less than the average distance for all mutation pairs are putatively assigned as epistatic pairs. These pairs are examined further with a focus on non-conservative substitutions, given their importance in driving structural and functional change as well as protein and organismal evolution. The application of the method is demonstrated for the H3 hemagglutinin protein of type A human H3N2 strains of the influenza virus. The most frequent ancestral mutations within epistatic pairs occur within antigenic site domains, while the descendant mutations occur either at other antigenic sites or elsewhere in the protein. Both predominate at reported glycosylation sites.
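The selection rule for putative epistatic pairs can be illustrated with a minimal sketch. It is not the MassTree implementation: the `MutationPair` record, the `node_gap` field, the `max_node_gap` cutoff standing in for "near consecutively", and the mutation labels are all assumptions made for illustration. Only the comparison against the average distance over all pairs is taken from the description above.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class MutationPair:
    """Hypothetical record for one ancestral/descendant mutation pair."""
    ancestral: str    # placeholder label for the ancestral mutation
    descendant: str   # placeholder label for the descendant mutation
    node_gap: int     # 0 = same node (co-occurring), 1 = consecutive nodes
    distance: float   # distance separating the two mutations on the tree


def putative_epistatic_pairs(pairs, max_node_gap=2):
    """Keep pairs that are close in node order and closer than average.

    max_node_gap is an assumed cutoff for "near consecutively";
    the abstract does not give a numerical value.
    """
    if not pairs:
        return []
    average = mean(p.distance for p in pairs)
    return [
        p for p in pairs
        if p.node_gap <= max_node_gap and p.distance < average
    ]


# Made-up example values, not results from the study.
example_pairs = [
    MutationPair("mutA", "mutB", node_gap=0, distance=0.10),
    MutationPair("mutC", "mutD", node_gap=1, distance=0.20),
    MutationPair("mutE", "mutF", node_gap=1, distance=0.90),
    MutationPair("mutG", "mutH", node_gap=5, distance=0.05),
]

for pair in putative_epistatic_pairs(example_pairs):
    print(pair.ancestral, pair.descendant, pair.distance)
```

In this sketch a pair must pass both conditions, so a pair that is short in distance but many nodes apart is excluded, as is a consecutive pair whose distance exceeds the average.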
The results for this protein further support a "small steps" evolutionary model for the influenza virus where non-conservative mutations that involve the least structural change are favored over those involving substantive change, which may risk the virus's own extinction.
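The mass-difference step on which the method rests can also be sketched. The example below assumes that a single amino acid substitution is inferred when the mass difference between two peptides matches the difference between two residue masses within a tolerance. The function name `candidate_substitutions` and the tolerance value are hypothetical, the residue masses are standard monoisotopic values, and the real algorithm may apply further criteria that are not described here.

```python
# Standard monoisotopic residue masses in daltons.
# Leucine (L) and isoleucine (I) are isobaric and cannot be told apart by mass.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def candidate_substitutions(delta_mass, tol=0.005):
    """Return (from, to) residue pairs whose mass difference matches delta_mass.

    delta_mass is the descendant peptide mass minus the ancestral peptide mass.
    tol is an assumed absolute tolerance in daltons.
    """
    hits = []
    for src, src_mass in RESIDUE_MASS.items():
        for dst, dst_mass in RESIDUE_MASS.items():
            if src != dst and abs((dst_mass - src_mass) - delta_mass) <= tol:
                hits.append((src, dst))
    return hits


# A mass shift of about +30.0106 Da is consistent with more than one
# substitution, so mass alone can leave the assignment ambiguous.
for src, dst in candidate_substitutions(30.0106):
    print(f"{src} -> {dst}")
```

Several substitutions can share nearly the same mass shift, which is one reason a tolerance and additional context are needed before a mutation is assigned to a branch node.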