VA-Files vs. R*-Trees in Distance Join Queries

  • Antonio Corral
  • Alejandro D’Ermiliis
  • Yannis Manolopoulos
  • Michael Vassilakopoulos
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3631)

Abstract

In modern database applications, the similarity of complex objects is examined by performing distance-based queries (e.g. nearest neighbour search) on high-dimensional data. Most multidimensional indexing methods fail to support these queries efficiently on arbitrary high-dimensional datasets because of the curse of dimensionality. Similarity join queries and K closest pairs queries are the most representative distance join queries, in which two high-dimensional datasets are combined. In high-dimensional spaces these queries are very expensive in terms of response time and I/O activity. The filtering-based approach, as applied by the VA-file, has turned out to be a very promising alternative for nearest neighbour search. It represents vectors as compact approximations; by scanning these approximations first, only a small fraction of the real vectors needs to be visited. Here, we elaborate on VA-files and develop VA-file based algorithms for answering similarity join and K closest pairs queries on high-dimensional data. We also compare the performance of VA-files and R*-trees (a structure that has proven to be robust) in answering these queries. The comparison does not yield a clear winner.
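The filter-and-refine idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the uniform grid quantization, the function names (`build_va`, `lower_bound`, `similarity_join`, `k_closest_pairs`), the `bits` parameter, and the assumption that all coordinates lie in [0, 1] are chosen here for illustration only. The sketch also enumerates every pair of approximations in memory, which a disk-based implementation would avoid.

```python
import heapq
import math
import random  # only needed to generate sample data

def build_va(points, bits=3, lo=0.0, hi=1.0):
    """Approximate each vector by the grid cell that contains it.

    Every coordinate is mapped to one of 2**bits equal-width slices,
    so a vector is summarised by `bits` bits per dimension.
    """
    cells = 1 << bits
    width = (hi - lo) / cells
    approx = [
        tuple(min(cells - 1, max(0, int((x - lo) / width))) for x in p)
        for p in points
    ]
    return approx, width

def lower_bound(a, b, width):
    """Smallest possible Euclidean distance between points in cells a and b."""
    total = 0.0
    for i, j in zip(a, b):
        # Same or adjacent slices may touch, so they contribute no gap.
        gap = max(0, abs(i - j) - 1) * width
        total += gap * gap
    return math.sqrt(total)

def similarity_join(P, Q, eps, bits=3):
    """Return all index pairs (i, j) with dist(P[i], Q[j]) <= eps."""
    ap, width = build_va(P, bits)
    aq, _ = build_va(Q, bits)
    result = []
    for i, a in enumerate(ap):
        for j, b in enumerate(aq):
            # Filter step: discard pairs whose cells are already too far apart.
            if lower_bound(a, b, width) <= eps:
                # Refinement step: look at the real vectors.
                if math.dist(P[i], Q[j]) <= eps:
                    result.append((i, j))
    return result

def k_closest_pairs(P, Q, k, bits=3):
    """Return the k closest pairs as (distance, i, j) and the number of
    real-vector pairs that had to be visited."""
    ap, width = build_va(P, bits)
    aq, _ = build_va(Q, bits)
    # Filter step: rank all pairs of approximations by their lower bound.
    candidates = sorted(
        (lower_bound(a, b, width), i, j)
        for i, a in enumerate(ap)
        for j, b in enumerate(aq)
    )
    best = []  # max-heap (negated distances) holding the k best exact pairs
    visited = 0
    for lb, i, j in candidates:
        if len(best) == k and lb >= -best[0][0]:
            break  # no remaining pair can beat the current k-th distance
        d = math.dist(P[i], Q[j])
        visited += 1
        if len(best) < k:
            heapq.heappush(best, (-d, i, j))
        elif d < -best[0][0]:
            heapq.heapreplace(best, (-d, i, j))
    return sorted((-nd, i, j) for nd, i, j in best), visited

if __name__ == "__main__":
    random.seed(1)
    P = [[random.random() for _ in range(4)] for _ in range(200)]
    Q = [[random.random() for _ in range(4)] for _ in range(200)]
    pairs, visited = k_closest_pairs(P, Q, k=5)
    print(pairs, "visited", visited, "of", len(P) * len(Q), "pairs")
```

Because the cell-to-cell lower bound never exceeds the true distance, neither query loses a qualifying pair in the filter step. How many real vectors are visited depends on the dimensionality, the data distribution, and `bits`.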

Keywords

Close Pair, Similarity Join, Page Access, Vector File, Compact Approximation
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Antonio Corral (1)
  • Alejandro D’Ermiliis (1)
  • Yannis Manolopoulos (2)
  • Michael Vassilakopoulos (3)
  1. Department of Languages and Computing, University of Almeria, Almeria, Spain
  2. Department of Informatics, Aristotle University, Thessaloniki, Greece
  3. Department of Informatics, Technological Educational Institute of Thessaloniki, Thessaloniki, Greece
