leADS: improved metabolic pathway inference based on active dataset subsampling

2021 
Metabolic pathways are composed of reaction sequences catalyzed by enzymes. The set of reactions within and between cells comprises a reactome. Pathways and reactomes can be predicted from organismal or multi-organismal genomes using rule-based or machine learning methods. While machine learning methods overcome issues of probability and scale associated with rule-based methods, several complications remain that can degrade performance, including inadequately labeled training data, missing feature information, and inherent imbalances in the distribution of pathways within a dataset. Here, we present leADS (multi-label learning based on active dataset subsampling), a machine learning method that uses subsampling to reduce the negative impact of training loss due to class imbalance. We demonstrate the performance of leADS on organismal and multi-organismal datasets in relation to other machine learning pathway prediction methods.
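To make the idea of subsampling against class imbalance concrete, the following is a minimal sketch, not the leADS algorithm itself. It assumes a binary multi-label matrix `Y` (one row per example, one column per pathway label) and uses a simple inverse-frequency rarity score as a stand-in selection criterion. The function names `label_counts`, `rarity_scores`, and `subsample_by_rarity` are hypothetical and introduced only for illustration.

```python
from typing import List, Sequence


def label_counts(Y: Sequence[Sequence[int]]) -> List[int]:
    """Count how many examples carry each label (column sums of Y)."""
    n_labels = len(Y[0])
    return [sum(row[j] for row in Y) for j in range(n_labels)]


def rarity_scores(Y: Sequence[Sequence[int]]) -> List[float]:
    """Score each example by the summed inverse frequency of its labels.

    Assumption for illustration only: rare labels make an example more
    valuable to keep. This is a stand-in criterion, not the one used by leADS.
    """
    counts = label_counts(Y)
    return [
        sum(1.0 / counts[j] for j, y in enumerate(row) if y)
        for row in Y
    ]


def subsample_by_rarity(Y: Sequence[Sequence[int]], k: int) -> List[int]:
    """Return the indices of the k highest-scoring examples, in index order.

    Ties are broken by the lower index so the result is deterministic.
    """
    scores = rarity_scores(Y)
    order = sorted(range(len(Y)), key=lambda i: (-scores[i], i))
    return sorted(order[:k])
```

On a toy matrix in which one label appears in every example and two labels appear once each, this selection keeps both rare-label examples, so the rare labels make up a larger share of the subsample than of the full dataset.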