Approximate Hashing for Bioinformatics

Guy Arbitman,Shmuel T. Klein,Pierre Peterlongo,Dana Shapira

2021

This paper extends ideas from deduplication-based data compression to bioinformatics. We show the approach to be useful for two specific problems: clustering a large set of DNA strings, and searching for approximate matches of long substrings. Both rely on the design of what we call an approximate hashing function. The results of the new procedure are very similar to the clustering and search results obtained by accurate tools, but they are produced in much less time and with less memory.
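The abstract does not spell out how the approximate hashing function is built. As a rough illustration of the general idea only, the sketch below assigns the same signature to strings that share their smallest-hashing k-mers, so that similar DNA strings tend to land in the same bucket. It is not the authors' construction: the bottom-s k-mer sketch, the names `kmer_hash`, `approx_hash` and `cluster`, and the parameters `k` and `s` are all assumptions chosen for this example.

```python
import hashlib
from collections import defaultdict


def kmer_hash(kmer: str) -> int:
    """Deterministic 64-bit hash of a single k-mer (hypothetical helper)."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def approx_hash(seq: str, k: int = 4, s: int = 2) -> tuple:
    """Illustrative signature: the s smallest hashes over the distinct k-mers of seq.

    Assumption for this sketch only: strings with mostly the same k-mers are
    likely to share these minima and therefore receive the same signature.
    """
    kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    return tuple(sorted(kmer_hash(km) for km in kmers)[:s])


def cluster(seqs, k: int = 4, s: int = 2):
    """Group sequences whose illustrative signatures are identical."""
    buckets = defaultdict(list)
    for seq in seqs:
        buckets[approx_hash(seq, k, s)].append(seq)
    return list(buckets.values())


if __name__ == "__main__":
    reads = ["ACGTACGT", "TTTTTTTT", "ACGTACGTACGT"]
    for group in cluster(reads):
        print(group)
```

In this toy input, the two ACGT-repeat strings have different lengths but exactly the same set of 4-mers, so they fall into one bucket, while the poly-T string gets a bucket of its own. Grouping by signature avoids comparing every pair of strings, which is the kind of time and memory saving the abstract refers to.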

Keywords:

Data compression
outcome
Field (computer science)
Computer science
large set
Cluster analysis
Substring
Hash function
Data deduplication
Algorithm
