An improved method for identification of pre-miRNA in Drosophila

2020 
Identification of microRNAs (miRNAs) is important for studying the regulation of gene expression in many biological processes. In this study, we developed an improved method for identifying miRNA precursors (pre-miRNAs) in Drosophila. We used iLearn, PyFeat, and Pse-in-One to extract features; Max-Relevance-Max-Distance (MRMD2.0) and t-Distributed Stochastic Neighbour Embedding (t-SNE) to reduce the dimensionality of those features; and the random forest classifier in Weka to identify miRNAs. With this method, we found that the discriminative features for identifying pre-miRNAs were, in Drosophila melanogaster, the occurrences of G_GUG and C_AGU when the value of the feature vector was greater than 2, and, in Drosophila pseudoobscura, the 4-tuple nucleotide composition and the occurrence of 4-length neighbouring nucleic acids when the value of the feature vector was less than 0.02. These feature vectors captured all compositional information, that is, the frequency of bases. The classification accuracy was 95.7% and 93.6%, the precision was 95.8% and 93.6%, and the recall was 95.7% and 93.6% for Drosophila melanogaster and Drosophila pseudoobscura, respectively, all higher than the values reported in previous studies.
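The 4-tuple nucleotide composition reported as discriminative for Drosophila pseudoobscura is a k-mer frequency vector: for k = 4 over the RNA alphabet ACGU it has 4^4 = 256 entries. The following is a minimal sketch of such a feature, assuming each sequence is scanned with a sliding window of length 4 and counts are normalized by the number of valid windows. The function name `kmer_composition` is hypothetical, and the exact encoding and normalization used by iLearn, PyFeat, or Pse-in-One may differ.

```python
from itertools import product


def kmer_composition(seq: str, k: int = 4) -> dict[str, float]:
    """Hypothetical helper: normalized frequency of every k-mer over ACGU.

    Assumption: frequencies are divided by the number of valid sliding
    windows; the feature-extraction tools cited in the abstract may use
    a different normalization.
    """
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as RNA
    kmers = ["".join(p) for p in product("ACGU", repeat=k)]  # fixed column order
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing N or other symbols
            counts[window] += 1
            total += 1
    if total == 0:  # sequence shorter than k or no valid windows
        return {km: 0.0 for km in kmers}
    return {km: c / total for km, c in counts.items()}


if __name__ == "__main__":
    # Toy sequence for illustration only, not a real pre-miRNA.
    features = kmer_composition("ACGUACGU")
    nonzero = {km: f for km, f in features.items() if f > 0}
    print(len(features), nonzero)
```

Computing this vector for each candidate hairpin and stacking the values in the fixed column order gives a 256-column feature matrix of the kind that could then be passed to a feature-selection step such as MRMD2.0 and to a random forest classifier.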