Prediction of RBP binding sites on circRNAs using an LSTM-based deep sequence learning architecture.

2021 
Circular RNAs (circRNAs) are widely expressed in highly diverged eukaryotes. Although circRNAs have been known for many years, their function remains unclear. Interaction with RNA-binding protein (RBP) to influence post-transcriptional regulation is considered to be an important pathway for circRNA function, such as acting as an oncogenic RBP sponge to inhibit cancer. In this study, we design a deep learning framework, CRPBsites, to predict the binding sites of RBPs on circRNAs. In this model, the sequences of variable-length binding sites are transformed into embedding vectors by word2vec model. Bidirectional LSTM is used to encode the embedding vectors of binding sites, and then they are fed into another LSTM decoder for decoding and classification tasks. To train and test the model, we construct four datasets that contain sequences of variable-length binding sites on circRNAs, and each set corresponds to an RBP, which is overexpressed in bladder cancer tissues. Experimental results on four datasets and comparison with other existing models show that CRPBsites has superior performance. Afterwards, we found that there were highly similar binding motifs in the four binding site datasets. Finally, we applied well-trained CRPBsites to identify the binding sites of IGF2BP1 on circCDYL, and the results proved the effectiveness of this method. In conclusion, CRPBsites is an effective prediction model for circRNA-RBP interaction site identification. We hope that CRPBsites can provide valuable guidance for experimental studies on the influence of circRNA on post-transcriptional regulation.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    34
    References
    1
    Citations
    NaN
    KQI
    []