Three-Layer Dynamic Transfer Learning Language Model for E. coli Promoter Classification

2020 
Classifying functional genomic regions (such as promoters or enhancers) from sequence data alone is an important problem. Many data mining algorithms can be applied to predict promoter regions. Examples include decision-tree algorithms such as Classification And Regression Tree (CART); machine learning algorithms such as Simple Logistic, BayesNet, and Random Forest; and, most popular today, deep learning models such as Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs). However, because a large amount of genetic data is unlabeled, these supervised methods cannot make direct use of it. We therefore present a three-layer dynamic transfer learning language model (TLDTLL) for E. coli promoter classification. TLDTLL is an effective inductive transfer learning algorithm that pre-trains on large unlabeled genomic corpora. This is particularly advantageous for genomics data, which tends to contain large volumes of unlabeled sequences. Using only sequence data, TLDTLL improves on existing methods for classifying E. coli promoters.
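Pre-training a language model on unlabeled sequences is possible because the sequences supply their own targets: each token is predicted from the token before it. The sketch below shows how an unlabeled DNA sequence can become next-token training pairs. It assumes overlapping k-mer tokenization, a common choice for genomic language models but not necessarily the one TLDTLL uses. The names `kmer_tokenize`, `build_vocab`, and `lm_pairs`, and the `k=3` default, are hypothetical.

```python
from typing import Dict, List, Tuple


def kmer_tokenize(seq: str, k: int = 3, stride: int = 1) -> List[str]:
    """Split a DNA sequence into k-mers (hypothetical preprocessing step).

    With stride=1 the k-mers overlap; with stride=k they tile the sequence.
    """
    seq = seq.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def build_vocab(corpus: List[List[str]]) -> Dict[str, int]:
    """Assign an integer id to every k-mer seen in the unlabeled corpus.

    Id 0 is reserved for unknown k-mers (e.g. ones containing 'N').
    """
    vocab: Dict[str, int] = {"<unk>": 0}
    for tokens in corpus:
        for token in tokens:
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def lm_pairs(tokens: List[str], vocab: Dict[str, int]) -> List[Tuple[int, int]]:
    """Turn one tokenized sequence into (input id, next-token id) pairs.

    No labels are needed: the target is simply the following k-mer,
    which is what lets a language model learn from unlabeled genomes.
    """
    ids = [vocab.get(token, 0) for token in tokens]
    return list(zip(ids[:-1], ids[1:]))


if __name__ == "__main__":
    tokens = kmer_tokenize("TTGACA", k=3)
    vocab = build_vocab([tokens])
    print(tokens)                  # overlapping 3-mers
    print(lm_pairs(tokens, vocab)) # next-token training pairs
```

In a transfer learning pipeline of this kind, pairs like these would drive the language-model pre-training stage. The same vocabulary would then be reused when fine-tuning on the smaller labeled promoter dataset, so that the classifier inherits the pre-trained token representations.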