VA-Files vs. R*-Trees in Distance Join Queries

  • Conference paper
Advances in Databases and Information Systems (ADBIS 2005)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3631)

Abstract

In modern database applications, the similarity of complex objects is examined by performing distance-based queries (e.g. nearest neighbour search) on high-dimensional data. Most multidimensional indexing methods fail to support these queries efficiently on arbitrary high-dimensional datasets because of the curse of dimensionality. Similarity join queries and K closest pairs queries are the most representative distance join queries, in which two high-dimensional datasets are combined. In high-dimensional spaces these queries are very expensive in terms of response time and I/O activity. The filtering-based approach, as applied by the VA-file, has turned out to be a very promising alternative for nearest neighbour search. It represents vectors as compact approximations; by scanning these approximations first, only a small fraction of the real vectors needs to be visited. Here, we elaborate on VA-files and develop VA-file based algorithms for answering similarity join and K closest pairs queries on high-dimensional data. We also compare the performance of VA-files and R*-trees (a structure that has proven to be robust) for answering these queries. The comparison does not yield a clear winner.
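The filter-and-refine idea described in the abstract can be illustrated with a small sketch. It is not the algorithm developed in the paper, only a minimal illustration under stated assumptions: vectors are normalized to [0, 1), each dimension is cut into equal-width cells (a real VA-file may choose its partition points differently), and all pairs of approximations are scanned in memory. The function names `build_va`, `cell_bounds`, `similarity_join_va` and `k_closest_pairs_va` are hypothetical.

```python
import heapq
import math

def build_va(vectors, bits=3):
    """Approximate each vector by the grid cell it falls into.
    Assumes coordinates in [0, 1) and 2**bits equal-width cells per dimension."""
    cells = 1 << bits
    return [tuple(min(int(x * cells), cells - 1) for x in v) for v in vectors]

def cell_bounds(a, b, cells):
    """Lower and upper bounds on the Euclidean distance between any two
    points whose approximations are the cell tuples a and b."""
    w = 1.0 / cells
    lo2 = hi2 = 0.0
    for ca, cb in zip(a, b):
        diff = abs(ca - cb)
        gap = max(diff - 1, 0) * w   # smallest possible separation in this dimension
        span = (diff + 1) * w        # largest possible separation in this dimension
        lo2 += gap * gap
        hi2 += span * span
    return math.sqrt(lo2), math.sqrt(hi2)

def similarity_join_va(P, Q, eps, bits=3):
    """Return all index pairs (i, j) with dist(P[i], Q[j]) <= eps."""
    cells = 1 << bits
    ap, aq = build_va(P, bits), build_va(Q, bits)
    out = []
    for i, a in enumerate(ap):
        for j, b in enumerate(aq):
            lo, hi = cell_bounds(a, b, cells)
            if lo > eps:
                continue  # pruned using the approximations alone
            # Accept without visiting the real vectors when the upper bound suffices.
            if hi <= eps or math.dist(P[i], Q[j]) <= eps:
                out.append((i, j))
    return out

def k_closest_pairs_va(P, Q, k, bits=3):
    """Return the k closest pairs as (distance, i, j) plus the number of
    real vector pairs visited. Assumes 1 <= k <= len(P) * len(Q)."""
    cells = 1 << bits
    ap, aq = build_va(P, bits), build_va(Q, bits)
    # Filter phase: scan approximations only and bound every pair distance.
    bounds = [(cell_bounds(a, b, cells), i, j)
              for i, a in enumerate(ap) for j, b in enumerate(aq)]
    # At least k pairs have a true distance <= tau (the k-th smallest upper bound).
    tau = heapq.nsmallest(k, (hi for (lo, hi), _, _ in bounds))[-1]
    candidates = sorted((lo, i, j) for (lo, hi), i, j in bounds if lo <= tau)
    # Refine phase: visit real vectors in increasing order of lower bound.
    best = []  # max-heap of the k best pairs, stored with negated distances
    visited = 0
    for lo, i, j in candidates:
        if len(best) == k and lo > -best[0][0]:
            break  # no remaining candidate can beat the current k-th distance
        d = math.dist(P[i], Q[j])
        visited += 1
        if len(best) < k:
            heapq.heappush(best, (-d, i, j))
        elif d < -best[0][0]:
            heapq.heapreplace(best, (-d, i, j))
    return sorted((-nd, i, j) for nd, i, j in best), visited
```

Calling `k_closest_pairs_va(P, Q, 5)` on two lists of equal-dimension vectors returns the five closest pairs together with a count of how many exact distances were computed, which can be compared against `len(P) * len(Q)` to see how much the approximations filtered out. A disk-based implementation would scan the approximation files sequentially instead of materializing every pair in memory.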

Supported by the ARCHIMEDES project 2.2.14, «Management of Moving Objects and the WWW», of the Technological Educational Institute of Thessaloniki (EPEAEK II), co-funded by the Greek Ministry of Education and Religious Affairs and the European Union, INDALOG TIC2002-03968 project «A Database Language Based on Functional Logic Programming» of the Spanish Ministry of Science and Technology under FEDER funds, and the framework of the Greek-Serbian bilateral protocol.

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Corral, A., D’Ermiliis, A., Manolopoulos, Y., Vassilakopoulos, M. (2005). VA-Files vs. R*-Trees in Distance Join Queries. In: Eder, J., Haav, HM., Kalja, A., Penjam, J. (eds) Advances in Databases and Information Systems. ADBIS 2005. Lecture Notes in Computer Science, vol 3631. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11547686_12

  • DOI: https://doi.org/10.1007/11547686_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-28585-4

  • Online ISBN: 978-3-540-31895-8

  • eBook Packages: Computer Science, Computer Science (R0)
