An Efficient Method to Quantify Structural Distributions in Heterogeneous cryo-EM Datasets

2021 
Abstract Cryogenic Electron Microscopy (cryo-EM) preserves the ensemble of protein conformations in solution and thus provides a promising way to characterize the conformational changes underlying protein function. However, it remains challenging for existing software to determine the distributions of multiple conformations from a heterogeneous cryo-EM dataset. We developed a new algorithm, Linear Combinations of Template Conformations (LCTC), to obtain the distributions of multiple conformations from cryo-EM datasets. LCTC assigns 2D images to template 3D structures obtained by Multi-body Refinement in RELION via a novel two-stage matching algorithm. Specifically, an initial rapid assignment of experimental 2D images to template 2D images is performed based on auto-correlation functions of image contours, which efficiently removes the majority of irrelevant 2D images. This is followed by pixel-by-pixel matching on the much smaller remaining set of 2D images, which accurately assigns the experimental 2D images to the template images. We validate the LCTC method by demonstrating that it accurately reproduces the distributions of three Thermus aquaticus (Taq) RNA polymerase (RNAP) structures with different degrees of clamp opening from a simulated cryo-EM dataset, for which the correct distributions are known. For this dataset, we also show that LCTC greatly outperforms the clustering-based Manifold Embedding and the Maximum Likelihood-based Multi-body Refinement algorithms in reproducing the structural distributions. Lastly, we successfully applied LCTC to reveal the populations of various clamp-opening conformations from an experimental Escherichia coli RNAP cryo-EM dataset. Source code is available at https://github.com/ghl1995/LCTC.
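The two-stage matching idea can be illustrated with a minimal sketch. This is not the LCTC implementation from the repository above: the function names (contour_autocorr, ncc, estimate_populations), the threshold-based contour extraction, the shortlist size keep, and the reading of the first stage as a per-image shortlist of template projections are all assumptions made for illustration. The abstract does not specify these details. The sketch also ignores in-plane rotation, CTF effects, and realistic noise levels.

```python
import numpy as np

def contour_autocorr(img, threshold=0.5):
    """Cheap, translation-invariant descriptor (assumed form of the
    'auto-correlation of image contours' mentioned in the abstract)."""
    # Binary mask of the particle; the relative threshold is an assumption.
    mask = (img > threshold * img.max()).astype(float)
    # Crude contour: mask pixels that have at least one 4-neighbour outside the mask.
    p = np.pad(mask, 1)
    interior = p[:-2, 1:-1] * p[2:, 1:-1] * p[1:-1, :-2] * p[1:-1, 2:]
    contour = mask * (1.0 - interior)
    # Circular autocorrelation via the power spectrum (Wiener-Khinchin theorem).
    ac = np.fft.ifft2(np.abs(np.fft.fft2(contour)) ** 2).real
    norm = np.linalg.norm(ac)
    return ac / norm if norm > 0 else ac

def ncc(a, b):
    """Pixel-by-pixel normalized cross-correlation of two equally sized images."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def estimate_populations(images, templates, template_labels, n_conformations, keep=5):
    """Assign each experimental image to one template conformation and
    return the fraction of images assigned to each conformation."""
    template_desc = [contour_autocorr(t) for t in templates]
    counts = np.zeros(n_conformations)
    for img in images:
        desc = contour_autocorr(img)
        # Stage 1: fast coarse score against every template projection.
        coarse = np.array([(desc * td).sum() for td in template_desc])
        shortlist = np.argsort(coarse)[::-1][:keep]
        # Stage 2: expensive pixel-wise comparison on the shortlist only.
        best = max(shortlist, key=lambda i: ncc(img, templates[i]))
        counts[template_labels[best]] += 1
    return counts / counts.sum()
```

Here templates would hold 2D projections of the template 3D structures and template_labels would map each projection back to its conformation, so the returned vector is an estimate of the conformational populations. The point of the design is that the coarse descriptor is computed once per image and compared cheaply, while the pixel-wise score is evaluated for only keep candidates per image.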