Exact and approximate pattern matching
A fundamental operation on strings is determining whether a pattern of characters or symbols occurs as a substring in a larger string called the text, or as an approximate subsequence in the text. This problem has been investigated since the early 1960s, not only for its theoretical importance in computer science but also because it has many applications in information processing and the biological sciences. In computer science, string pattern matching algorithms are used in database search and retrieval, text processing and editing, lexical analysis of computer programs, data compression, cryptography and other applications. In recent years, string matching algorithms have been used as powerful tools in the study of genomics and proteomics, in finding genes and regulatory motifs, and in comparative genomics, gene expression analysis and molecular evolutionary theory.
In this chapter we begin in Section 7.1 by looking at classic exact pattern matching algorithms for uncompressed text, as some of the compressed-domain methods build on these. The section also looks at algorithms for pattern matching with “don't-care” characters, which allow for uncertainty in the search. We then discuss the compressed-domain pattern matching problem in Section 7.2, with the main emphasis on the use of the Burrows-Wheeler Transform (BWT) to aid compressed-domain pattern search. We then move on to approximate pattern matching algorithms (Section 7.4), including “k-mismatch algorithms”, which allow a fixed number of characters to differ between a pattern and the text it matches. Some uncompressed-domain approximate matching methods are introduced, and then implementations of BWT-domain approximate pattern matching are described. It is also possible to design hardware algorithms to accelerate pattern matching, and these are discussed briefly in Section 7.5.
Keywords: Pattern Match; Binary Search; Edit Distance; Longest Common Subsequence