Efficient Similarity Search for Hierarchical Data in Large Databases

  • Conference paper
Advances in Database Technology - EDBT 2004 (EDBT 2004)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2992)

Abstract

Structured and semi-structured object representations are becoming increasingly important for modern database applications. Examples of such data are hierarchical structures, including chemical compounds, XML data, and image data. As a key feature, database systems have to support the search for similar objects, where both the structure and the content features of the objects must be taken into account. A successful approach is to use the edit distance for tree-structured data. As computing this measure is NP-complete for unordered trees, constrained edit distances have been successfully applied to trees. While yielding good results, they are still computationally complex and therefore of limited benefit for searching in large databases. In this paper, we propose a filter and refinement architecture to overcome this problem. We present a set of new filter methods for structural and for content-based information in tree-structured data, as well as ways to flexibly combine different filter criteria. The efficiency of our methods, which results from the good selectivity of the filters, is demonstrated in extensive experiments with real-world applications.
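The filter and refinement idea named in the abstract can be illustrated with a minimal sketch. This is not the paper's method: the tree filters it proposes are not described on this page, so the example swaps in strings, with the Levenshtein distance as the expensive exact measure and the length difference as a cheap lower bound. The names `range_query`, `length_filter`, and `levenshtein` are hypothetical and chosen only for illustration. The one property the sketch relies on is that the filter distance never exceeds the exact distance, so discarding objects in the filter step cannot lose a true result.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def range_query(
    query: T,
    database: Sequence[T],
    eps: float,
    filter_dist: Callable[[T, T], float],
    exact_dist: Callable[[T, T], float],
) -> List[T]:
    """Return all objects whose exact distance to `query` is at most `eps`.

    Assumes filter_dist(a, b) <= exact_dist(a, b) for all a, b
    (lower-bounding property); otherwise results may be missed.
    """
    # Filter step: a cheap lower bound discards objects that cannot qualify.
    candidates = [obj for obj in database if filter_dist(query, obj) <= eps]
    # Refinement step: the expensive distance is computed for candidates only.
    return [obj for obj in candidates if exact_dist(query, obj) <= eps]


def levenshtein(a: str, b: str) -> int:
    """Unit-cost string edit distance (stand-in for an expensive measure)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def length_filter(a: str, b: str) -> int:
    """Lower bound: each insertion or deletion changes the length by one."""
    return abs(len(a) - len(b))


if __name__ == "__main__":
    words = ["tree", "trie", "forest", "t", "trees"]
    print(range_query("tree", words, 1, length_filter, levenshtein))
```

In this toy run, "forest" and "t" are rejected by the length filter alone, so the exact distance is evaluated for three of the five objects. The better the selectivity of the filter, the fewer exact computations remain.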

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kailing, K., Kriegel, HP., Schönauer, S., Seidl, T. (2004). Efficient Similarity Search for Hierarchical Data in Large Databases. In: Bertino, E., et al. Advances in Database Technology - EDBT 2004. EDBT 2004. Lecture Notes in Computer Science, vol 2992. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24741-8_39

  • DOI: https://doi.org/10.1007/978-3-540-24741-8_39

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-21200-3

  • Online ISBN: 978-3-540-24741-8

  • eBook Packages: Springer Book Archive
