Similarity Search in Structured Data

  • Hans-Peter Kriegel
  • Stefan Schönauer
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2737)

Abstract

Structured data is becoming increasingly important in database applications such as molecular biology, image retrieval, and XML document retrieval. Attributed graphs are a natural model for the structured data in those applications. Clustering and classifying such data requires a similarity measure for attributed graphs. All known similarity measures for attributed graphs are either limited to a special type of graph or computationally extremely complex, i.e. NP-complete, and are therefore unsuitable for data mining in large databases. In this paper, we present a new similarity measure for attributed graphs, called matching distance. We demonstrate how the matching distance can be used for efficient similarity search in attributed graphs. Furthermore, we propose a filter-refinement architecture and an accompanying set of filter methods to reduce the number of necessary distance calculations during similarity search. Our experiments show that the matching distance is a meaningful similarity measure for attributed graphs and that it enables efficient clustering of structured data.
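
The filter-refinement idea can be illustrated with a small sketch. The abstract does not define the matching distance or the filter methods, so everything below is an assumption made for illustration: graphs are reduced to lists of numeric edge attributes, `matching_cost` is a brute-force stand-in for a matching-based distance, `DUMMY_COST` is a hypothetical penalty for an unmatched edge, and `size_filter` is one possible cheap lower bound, not necessarily one of the paper's filters.

```python
from itertools import permutations

DUMMY_COST = 1.0  # hypothetical penalty for leaving an edge unmatched


def edge_cost(a, b):
    """Cost of pairing two edges; None stands for a dummy (unmatched) edge."""
    if a is None and b is None:
        return 0.0
    if a is None or b is None:
        return DUMMY_COST
    return abs(a - b)


def matching_cost(edges_a, edges_b):
    """Minimum total cost of a one-to-one pairing of the two edge lists.

    Brute force over permutations, so only usable for tiny inputs; a
    polynomial-time assignment solver would replace this in practice.
    """
    size = max(len(edges_a), len(edges_b))
    a = list(edges_a) + [None] * (size - len(edges_a))
    b = list(edges_b) + [None] * (size - len(edges_b))
    return min(
        sum(edge_cost(x, y) for x, y in zip(a, perm))
        for perm in permutations(b)
    )


def size_filter(edges_a, edges_b):
    """Cheap lower bound: every surplus edge must be paired with a dummy."""
    return abs(len(edges_a) - len(edges_b)) * DUMMY_COST


def range_query(database, query, eps):
    """Return (hits, number of exact distance computations)."""
    hits, exact_calls = [], 0
    for name, edges in database.items():
        if size_filter(edges, query) > eps:
            continue  # filter step: the exact cost cannot be within eps
        exact_calls += 1
        if matching_cost(edges, query) <= eps:  # refinement step
            hits.append(name)
    return hits, exact_calls


database = {
    "g1": [1.0, 2.0],
    "g2": [1.0, 2.5],
    "g3": [1.0, 2.0, 3.0, 4.0, 5.0],
}
hits, exact_calls = range_query(database, [1.0, 2.0], eps=0.6)
```

Because the filter value never exceeds the exact cost, skipping a candidate whose filter value is above `eps` cannot discard a true hit. In this toy database, `g3` is rejected by the filter alone, so only two of the three candidates reach the expensive refinement step.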

Keywords

Cost Function, Similarity Measure, Image Retrieval, Query Processing, Similarity Search
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Hans-Peter Kriegel (1)
  • Stefan Schönauer (1)
  1. Institute for Computer Science, University of Munich