Optimal Spaced Seeds for Faster Approximate String Matching

  • Martin Farach-Colton
  • Gad M. Landau
  • S. Cenk Sahinalp
  • Dekel Tsur
Conference paper

DOI: 10.1007/11523468_101

Part of the Lecture Notes in Computer Science book series (LNCS, volume 3580)
Cite this paper as:
Farach-Colton M., Landau G.M., Sahinalp S.C., Tsur D. (2005) Optimal Spaced Seeds for Faster Approximate String Matching. In: Caires L., Italiano G.F., Monteiro L., Palamidessi C., Yung M. (eds) Automata, Languages and Programming. ICALP 2005. Lecture Notes in Computer Science, vol 3580. Springer, Berlin, Heidelberg

Abstract

Filtering is a standard technique for fast approximate string matching in practice. In filtering, a quick first step is used to rule out almost all positions of a text as possible starting positions for a pattern. Typically this step consists of finding the exact matches of small parts of the pattern. In the follow-up step, a slow method is used to verify or eliminate each remaining position. The running time of such a method depends largely on the quality of the filtering step, as measured by its false positives rate. The quality of such a method depends on the number of true matches that it misses, that is, on its false negative rate.
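The two-step scheme described above can be illustrated with a minimal sketch. It is not taken from the paper: it assumes Hamming distance (mismatches only) and uses the classical pigeonhole filter, in which the pattern is cut into k + 1 contiguous pieces so that any occurrence with at most k mismatches contains at least one piece exactly. The function names `hamming`, `filter_candidates`, and `approximate_matches` are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def filter_candidates(text, pattern, k):
    """Filtering step: exact matches of small parts of the pattern.

    Assumption (not from the paper): the pattern is split into k + 1
    contiguous pieces. With at most k mismatches, at least one piece
    must occur exactly, so this filter has no false negatives.
    """
    m = len(pattern)
    if m <= k:
        raise ValueError("pattern must be longer than k")
    pieces = k + 1
    size = m // pieces
    candidates = set()
    for i in range(pieces):
        start = i * size
        # The last piece absorbs any leftover characters.
        end = m if i == pieces - 1 else start + size
        piece = pattern[start:end]
        pos = text.find(piece)
        while pos != -1:
            cand = pos - start  # implied starting position of the pattern
            if 0 <= cand <= len(text) - m:
                candidates.add(cand)
            pos = text.find(piece, pos + 1)
    return candidates


def approximate_matches(text, pattern, k):
    """Verification step: slow check of each surviving position."""
    m = len(pattern)
    return sorted(
        p for p in filter_candidates(text, pattern, k)
        if hamming(text[p:p + m], pattern) <= k
    )
```

Candidates that survive the filter but fail verification are the false positives. The fewer of them there are, the less time is spent in the slow step.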

A spaced seed is a recently introduced type of filter pattern that allows gaps (i.e. don’t cares) in the small sub-pattern to be searched for. Spaced seeds promise to yield a much lower false positives rate, and thus have been extensively studied, though heretofore only heuristically or statistically.

In this paper, we show how to optimally design spaced seeds that yield no false negatives.
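To make the notion of "no false negatives" concrete, here is a brute-force sketch. It is not the paper's construction, only an exhaustive check under stated assumptions: a seed is written as a string of `1` (position must match) and `0` (don't care), matching is under Hamming distance, and a seed is called lossless for pattern length m and k mismatches if, for every placement of k mismatches, the seed fits at some offset without a `1` landing on a mismatch. The names `seed_hits` and `is_lossless` are hypothetical.

```python
from itertools import combinations


def seed_hits(seed, mismatches, m):
    """True if the seed fits somewhere in an alignment of length m
    such that none of its '1' positions falls on a mismatch."""
    care = [i for i, c in enumerate(seed) if c == "1"]
    for offset in range(m - len(seed) + 1):
        if all(offset + i not in mismatches for i in care):
            return True
    return False


def is_lossless(seed, m, k):
    """Exhaustively check every set of exactly k mismatch positions.

    Checking exactly k suffices: a hit that avoids a set of mismatches
    also avoids every subset of it. The cost grows as C(m, k), so this
    is only practical for small m and k.
    """
    return all(
        seed_hits(seed, set(positions), m)
        for positions in combinations(range(m), k)
    )
```

For example, the contiguous seed `111` is lossless for m = 7 and k = 1, but not for k = 2, since mismatches at positions 2 and 5 block every window of three consecutive positions.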

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Martin Farach-Colton (1)
  • Gad M. Landau (2)
  • S. Cenk Sahinalp (3)
  • Dekel Tsur (4)

  1. Dept. of Computer Science and DIMACS, Rutgers University
  2. Dept. of Computer Science, University of Haifa
  3. School of Computing Science, Simon Fraser University
  4. Dept. of Computer Science and Engineering, University of California, San Diego
