Expanding the Molecular Alphabet of DNA-Based Data Storage Systems with Neural Network Nanopore Readout Processing

2021 
DNA is a promising next-generation data storage medium, but the recording latency and synthesis cost of oligos using the four natural nucleotides remain high. Here, we describe an improved DNA-based storage system that uses an extended 11-letter molecular alphabet combining natural and chemically modified nucleotides. Our extended-alphabet molecular storage paradigm offers a nearly two-fold increase in storage density and potentially a reduction in recording time of the same order. Experimental results involving a library of 77 custom-designed hybrid sequences reveal that one can readily detect and discriminate different combinations and orders of monomers via MspA nanopores. Furthermore, a neural network architecture designed to classify raw current signals generated by Oxford Nanopore Technologies sequencing achieves an average accuracy exceeding 60%, which is 39 times higher than that of random guessing. Molecular dynamics simulations reveal that the majority of modified nucleotides do not induce dramatic disruption of the DNA double helix, making the extended-alphabet system potentially compatible with PCR-based random access data retrieval. The proposed methodologies provide a path forward for new implementations of molecular recorders.
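
The density claim can be checked with a back-of-the-envelope calculation. The sketch below assumes an idealized model in which every position in a strand can hold any letter of the alphabet with equal probability, with no coding constraints or error-correction overhead. The function name `bits_per_symbol` is illustrative and not taken from the paper, and the authors may compute density differently.

```python
import math


def bits_per_symbol(alphabet_size: int) -> float:
    """Ideal information content of one position, in bits (assumes uniform, unconstrained symbols)."""
    if alphabet_size < 2:
        raise ValueError("alphabet must contain at least two letters")
    return math.log2(alphabet_size)


natural = bits_per_symbol(4)    # A, C, G, T
extended = bits_per_symbol(11)  # 11-letter alphabet from the abstract
gain = extended / natural       # ideal density ratio under the assumptions above

print(f"natural: {natural:.2f} bits/position")
print(f"extended: {extended:.2f} bits/position")
print(f"ideal density gain: {gain:.2f}x")
```

Under these assumptions, an 11-letter alphabet carries about 3.46 bits per position versus 2 bits for the natural alphabet, a ratio of roughly 1.73, which is consistent with the "nearly two-fold" figure.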