Efficient Similarity Search for Hierarchical Data in Large Databases

Kailing, Karin; Kriegel, Hans-Peter; Schönauer, Stefan; Seidl, Thomas

doi:10.1007/978-3-540-24741-8_39

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2992)

Included in the following conference series:

International Conference on Extending Database Technology

Abstract

Structured and semi-structured object representations are becoming increasingly important for modern database applications. Examples of such data are hierarchical structures including chemical compounds, XML data or image data. As a key feature, database systems have to support the search for similar objects, where it is important to take into account both the structure and the content features of the objects. A successful approach is to use the edit distance for tree-structured data. As the computation of this measure is NP-complete for unordered trees, constrained edit distances have been successfully applied to trees. While yielding good results, they are still computationally complex and, therefore, of limited benefit for searching in large databases. In this paper, we propose a filter and refinement architecture to overcome this problem. We present a set of new filter methods for structural and for content-based information in tree-structured data as well as ways to flexibly combine different filter criteria. The efficiency of our methods, resulting from the good selectivity of the filters, is demonstrated in extensive experiments with real-world applications.
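
The filter and refinement idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the filters, so the two used here (difference in node count, and half the L1 distance between label histograms) are illustrative assumptions, as are all function names. The expensive measure is replaced by a simple top-down edit distance on ordered trees with unit costs, chosen only because it is short to write. Both filters are lower bounds of that distance, so the filter step discards no true result, and taking the maximum of several lower bounds is one simple way to combine filter criteria.

```python
from collections import Counter

# A tree is a (label, children) pair; children is a list of trees.

def size(t):
    """Number of nodes in the tree."""
    _, children = t
    return 1 + sum(size(c) for c in children)

def labels(t):
    """Histogram of node labels in the tree."""
    label, children = t
    counts = Counter([label])
    for c in children:
        counts += labels(c)
    return counts

def exact_dist(a, b):
    """Stand-in for the expensive measure (an assumption, not the paper's):
    top-down edit distance on ordered trees, unit cost per node."""
    (la, ca), (lb, cb) = a, b
    # Sequence edit distance over the two child lists; inserting or
    # deleting a child costs the size of its whole subtree.
    d = [[0] * (len(cb) + 1) for _ in range(len(ca) + 1)]
    for i in range(1, len(ca) + 1):
        d[i][0] = d[i - 1][0] + size(ca[i - 1])
    for j in range(1, len(cb) + 1):
        d[0][j] = d[0][j - 1] + size(cb[j - 1])
    for i in range(1, len(ca) + 1):
        for j in range(1, len(cb) + 1):
            d[i][j] = min(
                d[i - 1][j] + size(ca[i - 1]),                    # delete subtree
                d[i][j - 1] + size(cb[j - 1]),                    # insert subtree
                d[i - 1][j - 1] + exact_dist(ca[i - 1], cb[j - 1]),  # match children
            )
    return int(la != lb) + d[len(ca)][len(cb)]

def size_filter(a, b):
    """Structural filter: each insert or delete changes the size by one."""
    return abs(size(a) - size(b))

def label_filter(a, b):
    """Content filter: one edit changes the label histogram by at most 2 in L1."""
    ha, hb = labels(a), labels(b)
    return sum(abs(ha[k] - hb[k]) for k in ha.keys() | hb.keys()) / 2

def combined_filter(a, b):
    """The maximum of several lower bounds is itself a lower bound."""
    return max(size_filter(a, b), label_filter(a, b))

def range_query(db, query, eps):
    """Return all trees within distance eps, plus the candidate count."""
    candidates = [t for t in db if combined_filter(t, query) <= eps]  # filter step
    results = [t for t in candidates if exact_dist(t, query) <= eps]  # refinement step
    return results, len(candidates)

# Small hypothetical database.
q = ("a", [("b", []), ("c", [])])
t1 = ("a", [("b", []), ("d", [])])
t2 = ("a", [("b", [])])
t3 = ("x", [("y", [("z", []), ("w", [])]), ("v", []), ("u", [])])
t4 = ("a", [("c", []), ("b", [])])
db = [t1, t2, t3, t4]

results, n_candidates = range_query(db, q, eps=1)
print(len(results), "results from", n_candidates, "candidates")
```

In this toy database the filter step prunes t3 without computing the expensive distance, while t4 passes the filter and is only rejected in the refinement step. The better the selectivity of the filter, the fewer such candidates reach the expensive computation.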

Author information

Authors and Affiliations

Institute for Computer Science, University of Munich,
Karin Kailing, Hans-Peter Kriegel & Stefan Schönauer
Department of Computer Science IX, RWTH Aachen University,
Thomas Seidl

Editor information

Editors and Affiliations

Purdue University,
Elisa Bertino
Laboratory of Distributed Multimedia Information Systems and Applications, Technical University of Crete (MUSIC/TUC) Chania, 73100, Crete, Greece
Stavros Christodoulakis
Institute of Computer Science, FO.R.T.H., Vassilika Vouton, P.O. Box 1385, GR 71110, Heraklion, Greece
Dimitris Plexousakis
Department of Computer Science, University of Crete, P.O.Box 2208, GR 71409, Heraklion, Greece
Vassilis Christophides
National and Kapodistrian University of Athens, Greece
Manolis Koubarakis
IPD, Universität Karlsruhe, Am Fasanengarten 5, 76131, Karlsruhe,
Klemens Böhm
Department of Computer Science and Communication, University of Insubria, 22100, Varese, Italy
Elena Ferrari

About this paper

Cite this paper

Kailing, K., Kriegel, HP., Schönauer, S., Seidl, T. (2004). Efficient Similarity Search for Hierarchical Data in Large Databases. In: Bertino, E., et al. Advances in Database Technology - EDBT 2004. EDBT 2004. Lecture Notes in Computer Science, vol 2992. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24741-8_39

DOI: https://doi.org/10.1007/978-3-540-24741-8_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-21200-3
Online ISBN: 978-3-540-24741-8
eBook Packages: Springer Book Archive
