ESP-Index: A Compressed Index Based on Edit-Sensitive Parsing

  • Conference paper
String Processing and Information Retrieval (SPIRE 2011)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7024)

Abstract

We propose a compressed self-index based on edit-sensitive parsing (ESP). Given a string S, its ESP tree is equivalent to a context-free grammar deriving just S, which can be represented as a DAG G. Finding a pattern P in S is reduced to embedding P into G. Succinct data structures are adopted, and G is decomposed into two LOUDS bit strings and a single array for a permutation, requiring (1 + ε) n log n + 4n + o(n) bits for any 0 < ε < 1, where n is the number of different symbols in the grammar. The time to count the occurrences of P in S is \(O\left(\frac{\log^* u}{\varepsilon}\,(m\log n + occ_c(\log m\log u))\right)\), where m = |P|, u = |S|, and occ_c is the number of occurrences of a maximal common subtree in the ESP trees of P and S. Using an additional array of n log u bits, our index supports locating P and displaying substrings of S. Locating time is the same as counting time, and displaying time for a substring of length m is O(m + log u).
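As a rough illustration of the idea that a parse tree is equivalent to a grammar deriving just S, and that sharing identical rules turns the tree into a DAG, the following Python sketch pairs adjacent symbols level by level. It is an assumption-laden toy and not the ESP algorithm of the paper: real ESP chooses its blocks by alphabet reduction and landmarks, whereas this sketch simply pairs symbols left to right. The names build_grammar, expand, and leading_space_bits are hypothetical, and the space estimate assumes base-2 logarithms, counts only nonterminals as n, and drops the o(n) term.

```python
import math


def build_grammar(s):
    """Build a grammar deriving exactly s by naive left-to-right pairing.

    NOT edit-sensitive parsing: a simplified stand-in for illustration.
    Terminals are 1-character strings, nonterminals are integers.
    Returns (root_symbol, bodies) where bodies maps id -> (left, right).
    """
    if not s:
        raise ValueError("s must be non-empty")
    rule_id = {}   # (left, right) -> nonterminal id; sharing makes a DAG
    bodies = {}    # nonterminal id -> (left, right)
    level = list(s)
    while len(level) > 1:
        nxt = []
        i = 0
        while i < len(level):
            if i + 1 < len(level):
                pair = (level[i], level[i + 1])
                if pair not in rule_id:
                    rule_id[pair] = len(bodies)
                    bodies[rule_id[pair]] = pair
                nxt.append(rule_id[pair])
                i += 2
            else:
                nxt.append(level[i])  # unpaired last symbol moves up unchanged
                i += 1
        level = nxt
    return level[0], bodies


def expand(sym, bodies):
    """Derive the string generated by sym."""
    if isinstance(sym, str):
        return sym
    left, right = bodies[sym]
    return expand(left, bodies) + expand(right, bodies)


def leading_space_bits(n, eps):
    """Leading terms (1 + eps) * n * log2(n) + 4n of the stated bound.

    The o(n) term is omitted, so this is only an indicative figure.
    """
    return (1 + eps) * n * math.log2(n) + 4 * n


if __name__ == "__main__":
    root, bodies = build_grammar("abababab")
    print(len(bodies), "distinct rules; derives", expand(root, bodies))
    print("leading space terms (bits):", leading_space_bits(len(bodies), 0.5))
```

For a repetitive input the number of distinct rules stays far below the number of nodes in the full parse tree, which is the effect the index exploits when it stores G rather than the tree.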

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Maruyama, S., Nakahara, M., Kishiue, N., Sakamoto, H. (2011). ESP-Index: A Compressed Index Based on Edit-Sensitive Parsing. In: Grossi, R., Sebastiani, F., Silvestri, F. (eds) String Processing and Information Retrieval. SPIRE 2011. Lecture Notes in Computer Science, vol 7024. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24583-1_39

  • DOI: https://doi.org/10.1007/978-3-642-24583-1_39

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-24582-4

  • Online ISBN: 978-3-642-24583-1

  • eBook Packages: Computer Science, Computer Science (R0)
