VA-Files vs. R*-Trees in Distance Join Queries

Corral, Antonio; D’Ermiliis, Alejandro; Manolopoulos, Yannis; Vassilakopoulos, Michael

doi:10.1007/11547686_12

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3631)

Included in the following conference series:

East European Conference on Advances in Databases and Information Systems

Abstract

In modern database applications, the similarity of complex objects is examined by performing distance-based queries (e.g. nearest neighbour search) on high-dimensional data. Most multidimensional indexing methods fail to support these queries efficiently on arbitrary high-dimensional datasets because of the dimensionality curse. Similarity join queries and K closest pairs queries are the most representative distance join queries, in which two high-dimensional datasets are combined. In high-dimensional spaces these queries are very expensive in terms of response time and I/O activity. The filtering-based approach, as applied by the VA-file, has turned out to be a very promising alternative for nearest neighbour search. In general, this approach represents vectors as compact approximations; by scanning these approximations first, only a small fraction of the real vectors needs to be visited. Here, we elaborate on VA-files and develop VA-file based algorithms for answering similarity join and K closest pairs queries on high-dimensional data. We also compare the performance of VA-files and R*-trees (a structure that has proven to be robust) for answering these queries. The comparison does not produce a clear winner.
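
The filtering idea summarised in the abstract can be illustrated with a small sketch. It is not the authors' algorithm: it assumes a uniform grid over [0, 1] in every dimension, a plain nested loop over the approximations, and Euclidean distance. The names quantize, lower_bound, similarity_join and k_closest_pairs are hypothetical. Each vector is replaced by a tuple of cell indices, a lower bound on the distance between two cells is computed from the indices alone, and the real vectors are compared only when that bound cannot rule the pair out.

```python
import heapq
import math


def quantize(vec, bits=3, lo=0.0, hi=1.0):
    """Approximate a vector by its grid cell: `bits` bits per dimension.

    Assumes every coordinate lies in [lo, hi] (an assumption of this sketch).
    """
    cells = 1 << bits
    width = (hi - lo) / cells
    return tuple(min(cells - 1, max(0, int((x - lo) / width))) for x in vec)


def lower_bound(cell_a, cell_b, bits=3, lo=0.0, hi=1.0):
    """Smallest possible Euclidean distance between points of two cells."""
    width = (hi - lo) / (1 << bits)
    total = 0.0
    for a, b in zip(cell_a, cell_b):
        # Identical or adjacent cells can touch, so their gap is zero.
        gap = max(0, abs(a - b) - 1) * width
        total += gap * gap
    return math.sqrt(total)


def similarity_join(P, Q, eps, bits=3):
    """Return all index pairs (i, j) with dist(P[i], Q[j]) <= eps.

    Also returns how many pairs of real vectors had to be compared.
    """
    approx_p = [quantize(p, bits) for p in P]
    approx_q = [quantize(q, bits) for q in Q]
    result, visited = [], 0
    for i, ap in enumerate(approx_p):
        for j, aq in enumerate(approx_q):
            if lower_bound(ap, aq, bits) > eps:
                continue  # filtered out using the approximations only
            visited += 1
            if math.dist(P[i], Q[j]) <= eps:
                result.append((i, j))
    return result, visited


def k_closest_pairs(P, Q, k, bits=3):
    """Return the k closest pairs as sorted (distance, i, j) tuples."""
    approx_p = [quantize(p, bits) for p in P]
    approx_q = [quantize(q, bits) for q in Q]
    heap = []  # max-heap on distance, stored as (-distance, i, j)
    for i, ap in enumerate(approx_p):
        for j, aq in enumerate(approx_q):
            # Once k candidates exist, skip pairs whose lower bound
            # cannot beat the current k-th smallest distance.
            if len(heap) == k and lower_bound(ap, aq, bits) >= -heap[0][0]:
                continue
            d = math.dist(P[i], Q[j])
            if len(heap) < k:
                heapq.heappush(heap, (-d, i, j))
            elif d < -heap[0][0]:
                heapq.heapreplace(heap, (-d, i, j))
    return sorted((-nd, i, j) for nd, i, j in heap)
```

In this sketch the approximations fit in a few bits per dimension, so scanning them is cheap compared with reading the full vectors. How many pairs survive the filter depends on the data distribution, the dimensionality and the number of bits, which is consistent with the abstract's remark that the comparison has no clear winner.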

Supported by the ARCHIMEDES project 2.2.14, «Management of Moving Objects and the WWW», of the Technological Educational Institute of Thessaloniki (EPEAEK II), co-funded by the Greek Ministry of Education and Religious Affairs and the European Union, INDALOG TIC2002-03968 project «A Database Language Based on Functional Logic Programming» of the Spanish Ministry of Science and Technology under FEDER funds, and the framework of the Greek-Serbian bilateral protocol.

Author information

Authors and Affiliations

Department of Languages and Computing, University of Almeria, 04120, Almeria, Spain
Antonio Corral & Alejandro D’Ermiliis
Department of Informatics, Aristotle University, GR-54124, Thessaloniki, Greece
Yannis Manolopoulos
Department of Informatics, Technological Educational Institute of Thessaloniki, P.O. BOX 141, GR-57400, Thessaloniki, Greece
Michael Vassilakopoulos

Editor information

Editors and Affiliations

Dept. of Knowledge and Business Engineering, University of Vienna, Rathausstrasse 19/9, A-1010, Vienna, Austria
Johann Eder
Institute of Cybernetics, Tallinn University of Technology, Akadeemia 21, 12618, Tallinn, Estonia
Hele-Mai Haav, Ahto Kalja & Jaan Penjam

About this paper

Cite this paper

Corral, A., D’Ermiliis, A., Manolopoulos, Y., Vassilakopoulos, M. (2005). VA-Files vs. R*-Trees in Distance Join Queries. In: Eder, J., Haav, HM., Kalja, A., Penjam, J. (eds) Advances in Databases and Information Systems. ADBIS 2005. Lecture Notes in Computer Science, vol 3631. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11547686_12

DOI: https://doi.org/10.1007/11547686_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28585-4
Online ISBN: 978-3-540-31895-8
eBook Packages: Computer Science, Computer Science (R0)
