Representation learning applications in biological sequence analysis

Iuchi H, Matsutani T, Yamada K, Sumi S, Shion Hosoda, Zhao S, Tsukasa Fukunaga, Michiaki Hamada

2021

Remarkable advances in high-throughput sequencing have led to rapid data accumulation, and analyzing biological (DNA/RNA/protein) sequences to gain new biological insights has become both more important and more challenging. To address this, the application of natural language processing (NLP) to biological sequence analysis has received increasing attention, because biological sequences can be regarded as sentences and the k-mers within them as words. Embedding, an essential step in NLP, converts words into vectors. This transformation is called representation learning and can be applied to biological sequences. Vectorized biological sequences can be used for function and structure estimation, or as inputs to other probabilistic models. Given the importance of and growing interest in representation learning in biology, we review here the existing knowledge on representation learning for biological sequence analysis.
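
The sentence-and-word analogy can be made concrete with a minimal sketch, given here as an illustration and not as a method taken from the review. It splits a sequence into overlapping k-mers ("words") and maps it to a fixed-length vector of k-mer frequencies. The function names `kmer_tokenize` and `kmer_count_vector`, the choice k = 3, and the DNA alphabet `ACGT` are assumptions made for this example. A frequency vector is a simple count-based stand-in; a learned embedding would replace it with vectors trained on a corpus of sequences.

```python
from collections import Counter
from itertools import product


def kmer_tokenize(sequence, k=3):
    """Split a sequence into overlapping k-mers, the 'words' of the sequence."""
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def kmer_count_vector(sequence, k=3, alphabet="ACGT"):
    """Map a sequence to a fixed-length vector of relative k-mer frequencies.

    The vocabulary is every possible k-mer over the (assumed) alphabet,
    so the vector has len(alphabet) ** k entries.
    """
    vocabulary = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = Counter(kmer_tokenize(sequence, k))
    # Only k-mers made of alphabet symbols contribute to the total.
    total = sum(counts[word] for word in vocabulary)
    if total == 0:
        return [0.0] * len(vocabulary)
    return [counts[word] / total for word in vocabulary]


tokens = kmer_tokenize("ATGCGA", k=3)
vector = kmer_count_vector("ATGCGA", k=3)
print(tokens)
print(len(vector))
```

For the toy sequence `ATGCGA`, the tokens are the four overlapping 3-mers, and the vector has one entry per possible DNA 3-mer. Such a vector could then serve as input to a downstream classifier or probabilistic model, in the role the abstract describes for vectorized sequences.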
