Structure-Based Protein Function Prediction using Graph Convolutional Networks

2020 
The large number of available sequences and the diversity of protein functions challenge current experimental and computational approaches to determining and predicting protein function. We present a deep learning Graph Convolutional Network (GCN) for predicting protein functions and concurrently identifying functionally important residues. This model is initially trained using experimentally determined structures from the Protein Data Bank (PDB) but has significant de-noising capability, with only a minor drop in performance observed when structure predictions are used. We take advantage of this denoising property to train the model on > 200,000 protein structures, including many homology-predicted structures, greatly expanding the reach and applications of the method. Our model learns general structure-function relationships by robustly predicting functions of proteins with ≤ 40% sequence identity to the training set. We show that our GCN architecture predicts functions more accurately than Convolutional Neural Networks trained on sequence data alone and previous competing methods. Using class activation mapping, we automatically identify structural regions at the residue-level that lead to each function prediction for every confidently predicted protein, advancing site-specific function prediction. We use our method to annotate PDB and SWISS-MODEL proteins, making several new confident function predictions spanning both fold and function classifications.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    66
    References
    20
    Citations
    NaN
    KQI
    []