Hardness of String Similarity Search and Other Indexing Problems

  • S. Cenk Sahinalp
  • Andrey Utis
Conference paper

DOI: 10.1007/978-3-540-27836-8_90

Volume 3142 of the book series Lecture Notes in Computer Science (LNCS)
Cite this paper as:
Sahinalp S.C., Utis A. (2004) Hardness of String Similarity Search and Other Indexing Problems. In: Díaz J., Karhumäki J., Lepistö A., Sannella D. (eds) Automata, Languages and Programming. ICALP 2004. Lecture Notes in Computer Science, vol 3142. Springer, Berlin, Heidelberg

Abstract

Similarity search is a fundamental problem in computer science. Given a set of points A = {A1, ..., Ap} from a universe U and a distance measure D, one can pose similarity search queries on a query point Q either as nearest neighbor search (for example, find the string that has the smallest edit distance to a query string) or as furthest neighbor search (for example, find the string that has the longest common subsequence with a query string).
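To make the two query types concrete, the following is a minimal brute-force sketch in Python. It is an illustration only, not a method from the paper: the function names (`edit_distance`, `lcs_length`, `nearest_by_edit_distance`, `best_by_lcs`) are hypothetical, and the linear scan over A is the naive baseline that an index would try to beat.

```python
from typing import Sequence


def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i, a in enumerate(s, start=1):
        curr = [i]  # turning s[:i] into the empty string costs i deletions
        for j, b in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1,             # delete a
                curr[j - 1] + 1,         # insert b
                prev[j - 1] + (a != b),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def lcs_length(s: str, t: str) -> int:
    """Length of a longest common subsequence of s and t."""
    prev = [0] * (len(t) + 1)
    for a in s:
        curr = [0]
        for j, b in enumerate(t, start=1):
            if a == b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def nearest_by_edit_distance(strings: Sequence[str], query: str) -> str:
    """Exact nearest neighbor: scan every string, keep the smallest distance."""
    return min(strings, key=lambda s: edit_distance(s, query))


def best_by_lcs(strings: Sequence[str], query: str) -> str:
    """Query phrased with LCS: scan every string, keep the longest LCS."""
    return max(strings, key=lambda s: lcs_length(s, query))
```

Each query in this sketch costs time proportional to p times the cost of one pairwise computation, with no preprocessing at all; the indexing question is whether preprocessing A can avoid this full scan.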

Exact similarity search appears to be a very hard problem for most application domains; available solutions require either preprocessing time or space exponential in p, or query time exponential in |Q|. For such problems, approximate solutions have recently attracted considerable attention. Approximate nearest (furthest) neighbor search aims to find a point in A whose distance to the query point Q is within a small multiplicative factor of the distance between Q and its nearest (furthest) neighbor.
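The approximation guarantee can be stated as a simple check. The sketch below is an assumption-laden illustration, not the paper's formalism: the factor is written as a parameter `c >= 1`, the helper name `is_c_approx_nearest` is hypothetical, and Hamming distance on equal-length strings is used only because it is the simplest distance to compute.

```python
from typing import Callable, Sequence


def hamming(x: str, y: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(x) != len(y):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(a != b for a, b in zip(x, y))


def is_c_approx_nearest(
    points: Sequence[str],
    query: str,
    candidate: str,
    c: float,
    dist: Callable[[str, str], int] = hamming,
) -> bool:
    """True if candidate is within a factor c of the true nearest distance.

    The true nearest distance is found by brute force, so this function is
    a verifier for small examples, not a search procedure.
    """
    best = min(dist(p, query) for p in points)
    return dist(candidate, query) <= c * best


points = ["0000", "0011", "1111"]
query = "0001"
# Distances to the query are 1, 1 and 3, so the nearest distance is 1.
print(is_c_approx_nearest(points, query, "1111", c=2))  # 3 <= 2 fails
print(is_c_approx_nearest(points, query, "1111", c=3))  # 3 <= 3 holds
```

For the furthest neighbor variant, the analogous condition under the same assumptions would require the candidate's distance to be at least 1/c times the largest distance from Q to any point in A.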

In this paper, we study the hardness of several important similarity search problems for strings as well as other combinatorial objects, for which exact solutions have proven very difficult to achieve. We show that even the approximate versions of these problems are quite hard; more specifically, they are as hard as exact similarity search in Hamming space. Thus the available cell probe lower bounds for exact similarity search in Hamming space apply to approximate similarity search in string spaces (under Levenshtein edit distance and longest common subsequence) as well as in other spaces.

As a consequence of our reductions we also make observations about pairwise approximate distance computations. One such observation gives a simple linear time 2-approximation algorithm for permutation edit distance.

Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • S. Cenk Sahinalp (1)
  • Andrey Utis (2)
  1. School of Computing Science, Simon Fraser University, Canada
  2. Department of Computer Science, University of Maryland, College Park, USA