
Fgram-Tree: An Index Structure Based on Feature Grams for String Approximate Search

  • Conference paper
Web-Age Information Management (WAIM 2012)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7418)

Abstract

Approximate string search is widely used in many areas, and indexing is a natural way to make it efficient. However, existing index structures share a common weakness: they do not follow the essential principle of an index, namely that it should act as a mapping that sends different data to different index items and similar data to similar index items, so that queries are easy to answer. In this paper, we propose a new string index structure called Fgram-Tree, which uses feature grams both to build the index and to filter strings. It obeys both mappings by placing similar strings in the same node and different strings in different nodes, which can greatly improve index efficiency. Our index supports different types of search. Compared to other indexes, it offers high scalability and fast response time.
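The abstract describes filtering candidate strings by their grams before verifying them. Fgram-Tree's own feature-gram selection and node layout are not given here, so the following is not the paper's algorithm. It is a minimal sketch of the standard q-gram count filter for edit-distance search, a well-known technique that gram-based indexes build on, assuming fixed-length grams (q = 2), no padding, and Levenshtein distance. The function names `qgrams`, `edit_distance`, and `approximate_search` are hypothetical.

```python
from collections import Counter


def qgrams(s: str, q: int = 2) -> Counter:
    """Multiset of overlapping q-grams of s (no padding, assumed)."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic dynamic program, keeping two rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution or match
        prev = cur
    return prev[-1]


def approximate_search(strings, query, k, q=2):
    """Return strings within edit distance k of query (hypothetical helper).

    Filters first, then verifies. The filters never drop a true match:
    - length filter: |len(s) - len(query)| <= k is necessary;
    - count filter: each edit destroys at most q grams, so a match shares at
      least max(|s|, |query|) - q + 1 - k*q q-grams with the query.
    """
    query_grams = qgrams(query, q)
    results = []
    for s in strings:
        if abs(len(s) - len(query)) > k:
            continue
        common = sum((qgrams(s, q) & query_grams).values())  # multiset overlap
        lower_bound = max(len(s), len(query)) - q + 1 - k * q
        if common < lower_bound:
            continue
        if edit_distance(s, query) <= k:  # exact verification of survivors
            results.append(s)
    return results
```

For example, `approximate_search(["hello", "help", "yellow", "world"], "hallo", 1)` keeps only `"hello"`: `"world"` and `"yellow"` are pruned by the count filter without computing their edit distance, and `"help"` survives the filters but fails verification. An index such as the one proposed here aims to avoid even scanning the pruned strings, by grouping similar strings so whole groups can be skipped.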

This paper was partially supported by NGFR 973 grant 2012CB316200, NSFC grants 61003046 and 6111113089, and the Doctoral Fund of the Ministry of Education of China (No. 20102302120054).

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Tong, X., Wang, H. (2012). Fgram-Tree: An Index Structure Based on Feature Grams for String Approximate Search. In: Gao, H., Lim, L., Wang, W., Li, C., Chen, L. (eds) Web-Age Information Management. WAIM 2012. Lecture Notes in Computer Science, vol 7418. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32281-5_24

  • DOI: https://doi.org/10.1007/978-3-642-32281-5_24

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-32280-8

  • Online ISBN: 978-3-642-32281-5

  • eBook Packages: Computer Science, Computer Science (R0)