Protein-protein interaction decoy datasets for machine learning algorithm development

2021 
This is the most complete and diverse protein docking decoy set derived from Benchmark5 and the Scorers_set. We used three different rigid-body docking programs to generate the decoys for Benchmark5. We analyzed all docking decoys with more than 150 scoring functions from several sources (CCharPPI, FreeSASA, CIPS, CONSRANK). We provide both a balanced and an unbalanced version of the data. The balanced data is intended for training and testing machine learning algorithms; the unbalanced data is provided to simulate the real-world scenario. We also provide a set of rigid-body docking decoys from Interactome3D that spans 1,391 interactions. We obtained the labels for this set using a weakly supervised approach we call hAIkal. We used this data to augment the training data and improve machine learning classifiers.
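The record does not say how the balanced version was derived from the unbalanced one. As an illustration only, the sketch below shows one common approach, random undersampling of the majority class, which turns a skewed decoy set (few near-native poses, many incorrect ones) into a class-balanced subset. The record layout (dictionaries with a `label` key) and the function name `undersample` are hypothetical assumptions, not part of the dataset.

```python
import random
from collections import defaultdict


def undersample(decoys, label_key="label", seed=0):
    """Return a class-balanced subset by randomly undersampling every class
    down to the size of the smallest class.

    Assumption: each decoy is a dict carrying a class label under `label_key`
    (hypothetical layout, not the dataset's actual file format).
    """
    # Group decoys by their class label.
    by_label = defaultdict(list)
    for d in decoys:
        by_label[d[label_key]].append(d)

    # Every class is reduced to the size of the rarest class.
    n = min(len(group) for group in by_label.values())

    rng = random.Random(seed)  # fixed seed for a reproducible split
    balanced = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], n))
    rng.shuffle(balanced)  # avoid ordering the output by class
    return balanced


if __name__ == "__main__":
    # Toy data: 2 near-native (1) and 8 incorrect (0) decoys, mimicking the
    # skew of real docking runs where correct poses are rare.
    decoys = [{"id": i, "label": 1 if i < 2 else 0} for i in range(10)]
    subset = undersample(decoys)
    print(len(subset), sum(d["label"] for d in subset))  # 4 2
```

Undersampling discards majority-class examples, which is why an unbalanced version remains useful for evaluating a classifier under realistic class proportions.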