Abstract
Approximate string search is widely used in many areas, and indexing is a practical way to make it efficient. However, existing index structures share a common weakness: they do not follow the basic nature of an index, which is a function that maps different data to different index items and similar data to similar index items so that queries are easy to answer. In this paper, we propose a new string index structure called Fgram-Tree, which uses feature grams both to build the index and to filter strings. It follows these two mappings by placing similar strings in the same node and different strings in different nodes, which can greatly improve the efficiency of the index. Our index supports different types of search. Compared with other indexes, it provides high scalability and fast response time.
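To make the idea of gram-based filtering concrete, the following is a minimal Python sketch of the classic q-gram count filter followed by edit-distance verification. It is not the Fgram-Tree itself, whose construction is not described in this abstract. It is a generic illustration of how grams can prune candidates before an exact check. All function names and the choice of q = 2 are assumptions made for this example.

```python
from collections import Counter


def qgrams(s: str, q: int = 2) -> Counter:
    """Multiset of overlapping q-grams of s (no padding characters)."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def edit_distance(a: str, b: str) -> int:
    """Classic dynamic-programming (Levenshtein) edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def passes_count_filter(s: str, t: str, k: int, q: int = 2) -> bool:
    """Count filter: strings within edit distance k share at least
    max(|s|, |t|) - q + 1 - k*q q-grams. Returns False only when
    the pair cannot possibly be within distance k."""
    need = max(len(s), len(t)) - q + 1 - k * q
    if need <= 0:
        return True  # the bound is vacuous, so nothing can be pruned
    shared = sum((qgrams(s, q) & qgrams(t, q)).values())
    return shared >= need


def approximate_search(strings, query, k, q=2):
    """Hypothetical helper: filter by length and shared grams, then verify."""
    candidates = [s for s in strings
                  if abs(len(s) - len(query)) <= k
                  and passes_count_filter(s, query, k, q)]
    return [s for s in candidates if edit_distance(s, query) <= k]


words = ["kitten", "sitten", "sitting", "mitten", "banana"]
matches = approximate_search(words, "kitten", k=1)
```

In this sketch the filter discards "sitting" and "banana" without computing their edit distance, so only the three remaining strings reach the verification step. An index such as the one proposed here aims to achieve a similar pruning effect without scanning every string.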
This paper was partially supported by NGFR 973 grant 2012CB316200, NSFC grants 61003046 and 6111113089, and the Doctoral Fund of Ministry of Education of China (No. 20102302120054).
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tong, X., Wang, H. (2012). Fgram-Tree: An Index Structure Based on Feature Grams for String Approximate Search. In: Gao, H., Lim, L., Wang, W., Li, C., Chen, L. (eds) Web-Age Information Management. WAIM 2012. Lecture Notes in Computer Science, vol 7418. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32281-5_24
DOI: https://doi.org/10.1007/978-3-642-32281-5_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32280-8
Online ISBN: 978-3-642-32281-5
eBook Packages: Computer Science (R0)