language-icon Old Web
English
Sign In

Compressed pattern matching

In computer science, compressed pattern matching (abbreviated as CPM) is the process of searching for patterns in compressed data with little or no decompression. Searching in a compressed string is faster than searching an uncompressed string and requires less space. In computer science, compressed pattern matching (abbreviated as CPM) is the process of searching for patterns in compressed data with little or no decompression. Searching in a compressed string is faster than searching an uncompressed string and requires less space. If the compressed file uses a variable width encoding it could be present a problem: for example, let “100” be the codeword for a and let “110100” be the codeword for b. If we are looking for an occurrence of a in the text we could obtain as result also an occurrence that is within the codeword of b: we call this event false match. So we have to verify if the occurrence detected is effectively aligned on a codeword boundary. However we could always decode the entire text and then apply a classic string matching algorithm, but this usually requires more space and time and often is not possible, for example if the compressed file is hosted online. This problem of verifying the match returned by the compressed pattern matching algorithm is a true or a false match together with the impossibility of decoding an entire text is called the compressed matching problem.

[ "String searching algorithm", "Data compression", "Pattern matching", "compression" ]
Parent Topic
Child Topic
    No Parent Topic