PIPENN: Protein Interface Prediction with an Ensemble of Neural Nets

2021 
Motivation: Protein interactions play an essential role in many biological and cellular processes, such as protein-protein interaction (PPI) in signaling pathways, binding to DNA in transcription, and binding to small molecules in receptor activation or enzymatic activity. Experimental identification of protein binding interface residues is time-consuming, costly, and challenging. Several machine learning and other computational approaches exist that predict such interface residues. Here we explore whether Deep Learning (DL) can be used effectively for this prediction task, and which learning strategies and architectures may be most efficient. We introduce seven DL architectures that are applied to eleven independent test sets, focused on the residues involved in PPI interfaces and in binding RNA/DNA and small molecule ligands.

Results: We constructed a large data set dubbed BioDL, comprising protein-protein interaction data from the PDB and protein-ligand interactions (DNA, RNA and small molecules) from the BioLip database. Additionally, we reused our existing curated homo- and heteromeric PPI data sets. We performed several experiments to assess the impact of different data features, spatial forms, encoding schemes, network initializations, loss functions, regularization mechanisms, and activation functions on the performance of the predictors. Benchmarking the resulting DL models with an independent test set (ZK448) shows that no single DL architecture performs best on all instances, but that an ensemble of DL architectures consistently achieves peak prediction performance. Our PIPENN ensemble predictor outperforms current state-of-the-art sequence-based protein interface predictors on all interaction types, achieving AUCs of 0.718 (protein-protein), 0.823 (protein-nucleotide) and 0.842 (protein-small molecule), respectively.

Availability: Source code and data sets at https://github.com/ibivu/pipenn/

Contact: r.haydarlou@vu.nl
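As a rough illustration of the ensemble idea and the AUC metric mentioned in the abstract, the sketch below averages per-residue interface probabilities from several models and scores the result with ROC AUC. The abstract does not say how PIPENN combines its architectures, so plain averaging is an assumption made here for illustration only; the function names (`ensemble_average`, `roc_auc`) and all scores and labels are hypothetical toy values, not data from the paper.

```python
from statistics import mean


def ensemble_average(per_model_probs):
    """Average per-residue interface probabilities across models.

    Assumption: simple unweighted averaging. The actual PIPENN ensemble
    may combine its member networks differently.
    """
    n_residues = len(per_model_probs[0])
    if any(len(p) != n_residues for p in per_model_probs):
        raise ValueError("all models must score the same residues")
    return [mean(column) for column in zip(*per_model_probs)]


def roc_auc(labels, scores):
    """ROC AUC as the fraction of (interface, non-interface) residue pairs
    in which the interface residue gets the higher score; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example: 1 = interface residue, 0 = non-interface residue.
labels = [1, 0, 1, 0, 0, 1]
# Hypothetical per-residue probabilities from three models.
model_a = [0.9, 0.2, 0.4, 0.6, 0.1, 0.7]
model_b = [0.6, 0.3, 0.8, 0.2, 0.55, 0.5]
model_c = [0.8, 0.1, 0.6, 0.3, 0.2, 0.3]

ensemble = ensemble_average([model_a, model_b, model_c])

for name, scores in [("A", model_a), ("B", model_b), ("C", model_c),
                     ("ensemble", ensemble)]:
    print(f"{name}: AUC = {roc_auc(labels, scores):.3f}")
```

On these invented scores the averaged prediction ranks residues better than each individual model, but that is a property of the toy data and says nothing about the reported benchmark results.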