The k -mismatch problem revisited

Raphaël Clifford,Allyx Fontaine,Ely Porat,Benjamin Sach,Tatiana A. Starikovskaya

The k -mismatch problem revisited

2016

We revisit the complexity of one of the most basic problems in pattern matching. In the k -mismatch problem we must compute the Hamming distance between a pattern of length m and every m -length substring of a text of length n , as long as that Hamming distance is at most k. Where the Hamming distance is greater than k at some alignment of the pattern and text, we simply output "No". We study this problem in both the standard offline setting and also as a streaming problem. In the streaming k -mismatch problem the text arrives one symbol at a time and we must give an output before processing any future symbols. Our main results are as follows: • Our first result is a deterministic O ( nk 2 log k/m + n polylog m ) time offline algorithm for k -mismatch on a text of length n. This is a factor of k improvement over the fastest previous result of this form from SODA 2000 [9, 10]. • We then give a randomised and online algorithm which runs in the same time complexity but requires only O ( k 2 polylog m ) space in total. • Next we give a randomised (1 + e)-approximation algorithm for the streaming k -mismatch problem which uses O ( k 2 polylog m /e 2 ) space and runs in O (polylog m /e 2 ) worst-case time per arriving symbol. • Finally we combine our new results to derive a randomised O ( k 2 polylog m ) space algorithm for the streaming k -mismatch problem which runs in O ([EQUATION] log k + polylog m ) worst-case time per arriving symbol. This improves the best previous space complexity for streaming k -mismatch from FOCS 2009 [26] by a factor of k. We also improve the time complexity of this previous result by an even greater factor to match the fastest known offline algorithm (up to logarithmic factors).

Keywords:

Correction
Source
Cite
Save
Machine Reading By IdeaReader

References

Citations