Multi-Label Classification Based On Subcellular Region-Guided Feature Description For Protein Localisation.

2021 
In this paper, we present a multi-label classification pipeline and a novel feature descriptor for protein subcellular localisation. The challenge is to develop a computational model that can classify multi-site proteins from multi-label images in a highly imbalanced dataset with a long-tail distribution. To address this challenge, we design a Location-Sorted Random Projections feature descriptor that represents the image intensity and gradient of the protein of interest with reference to the correlated cellular region. The Multi-Label Synthetic Minority Over-sampling Technique (MLSMOTE) is optimised to generate synthetic features with labels to handle class imbalance. Our method achieves state-of-the-art performance on a large-scale public dataset and demonstrates excellent performance on the minority classes.
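To make the oversampling step concrete, the following is a minimal sketch of generic MLSMOTE-style generation of synthetic features with labels. It is not the optimised variant described above, whose details are not given here. The function name `mlsmote_sketch`, its parameters, and the majority-vote labelling rule are illustrative assumptions: a synthetic feature vector is interpolated between a minority sample and one of its nearest neighbours, and its label set is decided by a vote among the sample and those neighbours.

```python
import numpy as np

def mlsmote_sketch(X, Y, minority_label, k=3, n_new=5, seed=0):
    """Hypothetical helper: synthesise (feature, label) pairs for one minority label.

    X: (n_samples, n_features) feature matrix.
    Y: (n_samples, n_labels) binary label matrix.
    """
    rng = np.random.default_rng(seed)
    # Work only with samples that carry the minority label.
    idx = np.flatnonzero(Y[:, minority_label] == 1)
    if len(idx) < 2:
        raise ValueError("need at least two samples carrying the label")
    k = min(k, len(idx) - 1)
    Xm, Ym = X[idx], Y[idx]

    X_new, Y_new = [], []
    for _ in range(n_new):
        i = rng.integers(len(idx))            # pick a seed sample
        d = np.linalg.norm(Xm - Xm[i], axis=1)
        d[i] = np.inf                         # exclude the seed from its own neighbours
        nn = np.argsort(d)[:k]                # k nearest minority neighbours
        j = rng.choice(nn)                    # reference neighbour for interpolation
        gap = rng.random()                    # interpolation factor in [0, 1)
        X_new.append(Xm[i] + gap * (Xm[j] - Xm[i]))
        # Assumed labelling rule: keep a label if more than half of the
        # seed and its neighbours carry it.
        votes = Ym[i] + Ym[nn].sum(axis=0)
        Y_new.append((votes > (k + 1) / 2).astype(int))
    return np.array(X_new), np.array(Y_new)
```

Because each synthetic vector lies on the segment between two existing minority samples, it stays inside their feature range, and the voting rule lets co-occurring labels carry over to the new sample only when they are common in the neighbourhood.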