Neural Edit Operations For Biological Sequences

Authors:
Satoshi Koide, Toyota Central R&D Labs., Inc.
Keisuke Kawano, Toyota Central R&D Labs., Inc.
Takuro Kutsuna, Toyota Central R&D Labs., Inc.

Introduction:

The evolution of biological sequences, such as proteins or DNA, is driven by three basic edit operations: substitution, insertion, and deletion. Motivated by recent progress in neural network models for biological tasks, the authors implement two neural network architectures that can handle such edit operations.

Abstract:

The evolution of biological sequences, such as proteins or DNA, is driven by three basic edit operations: substitution, insertion, and deletion. Motivated by recent progress in neural network models for biological tasks, we implement two neural network architectures that can handle such edit operations. The first is the edit-invariant neural network, based on differentiable Needleman-Wunsch algorithms. The second is the use of deep CNNs with concatenations. Our analysis shows that CNNs can recognize star-free regular expressions, and that deeper CNNs can recognize more complex regular expressions, including those involving the insertion or deletion of characters. The experimental results on the protein secondary structure prediction task suggest the importance of handling insertion and deletion. The test accuracy on the widely used CB513 dataset is 71.5%, which is 1.2 points better than the current best result among non-ensemble models.
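To make the phrase "differentiable Needleman-Wunsch" concrete, the following is a minimal sketch of one common way to smooth the alignment recursion: the hard maximum over the three edit choices (substitute/match, delete, insert) is replaced by a log-sum-exp soft maximum with temperature `gamma`. The abstract does not give the authors' exact formulation, so the function names (`smooth_max`, `nw_score`), the linear gap penalty, and the scoring values here are illustrative assumptions rather than the paper's method.

```python
import math


def smooth_max(values, gamma):
    """Soft maximum: gamma * log(sum(exp(v / gamma))).

    Returns the ordinary max when gamma == 0, and approaches it as gamma -> 0.
    The max is subtracted before exponentiating for numerical stability.
    """
    m = max(values)
    if gamma == 0.0:
        return m
    return m + gamma * math.log(sum(math.exp((v - m) / gamma) for v in values))


def nw_score(a, b, gamma=0.0, match=1.0, mismatch=-1.0, gap=-1.0):
    """Global alignment score of sequences a and b with a linear gap penalty.

    gamma == 0 gives the classic Needleman-Wunsch score; gamma > 0 gives a
    smoothed score that varies smoothly with match, mismatch, and gap.
    All scoring parameters are hypothetical defaults chosen for illustration.
    """
    n, m = len(a), len(b)
    # F[i][j] holds the (smoothed) best score for aligning a[:i] with b[:j].
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap  # a[:i] aligned against nothing: i deletions
    for j in range(1, m + 1):
        F[0][j] = j * gap  # b[:j] aligned against nothing: j insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = smooth_max(
                [
                    F[i - 1][j - 1] + s,  # substitution or match
                    F[i - 1][j] + gap,    # deletion from a
                    F[i][j - 1] + gap,    # insertion into a
                ],
                gamma,
            )
    return F[n][m]


hard = nw_score("GATTACA", "GATACA")              # classic score
soft = nw_score("GATTACA", "GATACA", gamma=0.1)   # smoothed score
```

With `gamma = 0` the function reduces to the classic dynamic program, so deleting one character from a sequence only costs a single gap penalty. With `gamma > 0` every cell is a smooth function of the scoring parameters, which is the property that would let such a score be trained by gradient descent if the same recursion were written in an automatic-differentiation framework.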

You may want to know: