Fgram-Tree: An Index Structure Based on Feature Grams for String Approximate Search

Tong, Xing; Wang, Hongzhi

doi:10.1007/978-3-642-32281-5_24

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7418)

Included in the following conference series:

International Conference on Web-Age Information Management

Abstract

Approximate string search is widely used in many areas, and indexing is a natural way to make it efficient. However, existing index structures share a common weakness: they do not respect the basic nature of an index, which is a function that maps different data to different index items and similar data to similar index items so that queries are easy to answer. In this paper, we propose a new string index structure called Fgram-Tree, which uses feature grams both to build the index and to filter strings. It respects both mappings by placing similar strings in the same node and different strings in different nodes, which can greatly improve the efficiency of the index. Our index supports different types of search. Compared with other indexes, it provides high scalability and fast response time.
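
The abstract does not say how feature grams are selected or how the tree is organized, so the following is only a minimal sketch of the general idea of gram-based filtering for approximate string search, not the Fgram-Tree itself. It assumes plain fixed-length q-grams and edit distance as the similarity measure; the function names (`qgrams`, `passes_count_filter`, `approximate_search`) and the default `q=2` are illustrative choices, not taken from the paper.

```python
from collections import Counter


def qgrams(s: str, q: int = 2) -> Counter:
    """Return the multiset of overlapping q-grams of s (no padding)."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute or match
        prev = cur
    return prev[-1]


def passes_count_filter(a: str, b: str, k: int, q: int = 2) -> bool:
    """Standard count filter: two strings within edit distance k must share
    at least max(len(a), len(b)) - q + 1 - k * q q-grams, because each edit
    operation can destroy at most q grams."""
    shared = sum((qgrams(a, q) & qgrams(b, q)).values())
    needed = max(len(a), len(b)) - q + 1 - k * q
    return shared >= needed


def approximate_search(strings, query: str, k: int, q: int = 2):
    """Filter by length and shared grams, then verify survivors exactly."""
    candidates = [s for s in strings
                  if abs(len(s) - len(query)) <= k
                  and passes_count_filter(s, query, k, q)]
    return [s for s in candidates if edit_distance(s, query) <= k]
```

For example, searching `["approximate", "appropriate", "aproximate", "index"]` for `"approximate"` with `k=1` discards `"appropriate"` at the gram-count step, without computing its edit distance, and returns `["approximate", "aproximate"]`. An index of the kind the abstract describes would presumably apply such gram-based pruning per node rather than per string, but that organization is not specified here.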

This paper was partially supported by NGFR 973 grant 2012CB316200, NSFC grants 61003046 and 6111113089, and the Doctoral Fund of the Ministry of Education of China (No. 20102302120054).

Author information

Authors and Affiliations

Department of Computer Science and Technology, Harbin Institute of Technology, China
Xing Tong & Hongzhi Wang

Editor information

Editors and Affiliations

School of Computer Science and Technology, Harbin Institute of Technology, No. 92, West Dazhi Street, 150001, Heilongjiang, Harbin, China
Hong Gao
Information and Computer Science Department, University of Hawaii, 1680 East West Road, 96822, Honolulu, HI, USA
Lipyeow Lim
School of Computer Science, Fudan University, No. 220, Handan Road, 200433, Shanghai, China
Wei Wang
School of Computer Science and Technology, Sichuan University, No. 29 Jiuyanqiao Wangjing Road, 610064, Chengdu, Sichuan, China
Chuan Li
Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China
Lei Chen

Cite this paper

Tong, X., Wang, H. (2012). Fgram-Tree: An Index Structure Based on Feature Grams for String Approximate Search. In: Gao, H., Lim, L., Wang, W., Li, C., Chen, L. (eds) Web-Age Information Management. WAIM 2012. Lecture Notes in Computer Science, vol 7418. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32281-5_24

DOI: https://doi.org/10.1007/978-3-642-32281-5_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32280-8
Online ISBN: 978-3-642-32281-5
eBook Packages: Computer Science, Computer Science (R0)
