Pattern Matching Compression Algorithm for DNA Sequences

2021 
DNA carries the hereditary information of a biological species, and DNA databases serve as repositories for very large collections of DNA sequences. Continued advances in DNA research have created key difficulties in transferring, maintaining, and storing this data. The sheer size of DNA sequences demands abundant storage space, which must be managed and minimized with an effective methodology. One such technique is data compression, which reduces the size of DNA sequences and thereby saves storage space and bandwidth. This work proposes the Improved_Compress algorithm, which performs DNA compression together with pattern matching on biological sequences. The matching information can be stored in an offline dictionary. Using a dictionary-based method and ASCII-value encoding, Improved_Compress achieves a better average compression ratio, 89%, than the existing algorithm on datasets from the National Center for Biotechnology Information (NCBI).
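To make the idea of a dictionary-based encoding concrete, the following is a minimal sketch and not the Improved_Compress algorithm itself, whose details the abstract does not give. It assumes sequences contain only the bases A, C, G, and T, and it maps every 4-base pattern to a single byte value through a fixed offline dictionary. All names (`PATTERN_TO_CODE`, `compress`, `decompress`, `space_saving`) are hypothetical and chosen for illustration.

```python
from itertools import product

BASES = "ACGT"
K = 4  # 4 bases give 4**4 = 256 patterns, so each pattern fits in one byte

# Offline dictionary (assumed layout): every 4-base pattern gets a fixed code.
PATTERN_TO_CODE = {"".join(p): i for i, p in enumerate(product(BASES, repeat=K))}
CODE_TO_PATTERN = {code: pattern for pattern, code in PATTERN_TO_CODE.items()}


def compress(seq: str) -> tuple[bytes, int]:
    """Return (packed bytes, original length).

    The tail is padded with 'A' up to a multiple of K; the original length
    is returned so the padding can be dropped on decompression.
    Raises KeyError for symbols outside A, C, G, T.
    """
    padded = seq + "A" * (-len(seq) % K)
    packed = bytes(
        PATTERN_TO_CODE[padded[i:i + K]] for i in range(0, len(padded), K)
    )
    return packed, len(seq)


def decompress(packed: bytes, length: int) -> str:
    """Look each byte up in the dictionary and trim the padding."""
    return "".join(CODE_TO_PATTERN[b] for b in packed)[:length]


def space_saving(original_len: int, compressed_len: int) -> float:
    """Fraction of space saved relative to one byte per base."""
    return 1 - compressed_len / original_len


if __name__ == "__main__":
    sequence = "ACGTACGTTTGA"
    packed, n = compress(sequence)
    assert decompress(packed, n) == sequence
    print(len(sequence), "bases ->", len(packed), "bytes")
    print("space saving:", space_saving(len(sequence), len(packed)))
```

For sequences whose length is a multiple of four, this fixed-width scheme saves 75% of the space relative to storing one byte per base. It is only a baseline: it does no pattern matching across the sequence and should not be expected to reproduce the figure reported above.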