Combining Approximation Techniques and Vector Quantization for Adaptable Similarity Search

Böhm, Christian; Kriegel, Hans-Peter; Seidl, Thomas

doi:10.1023/A:1016515829761

Published: September 2002

Volume 19, pages 207–230, (2002)

Journal of Intelligent Information Systems

Abstract

Adaptable similarity queries based on quadratic form distance functions are widely used in data mining application domains such as multimedia, CAD, molecular biology, and medical image databases. Recently, it has been recognized that quantization of feature vectors can substantially improve query processing for Euclidean distance functions, as demonstrated by the scan-based VA-file and the IQ-tree index structure. In this paper, we address the problem that determining quadratic form distances between quantized vectors is difficult and computationally expensive. Our solution provides a variety of new approximation techniques for quantized vectors, which are combined in an extended multistep query processing architecture. In our analysis section, we show that the filter steps complement each other, so it is useful to apply the filters in combination. We show the superiority of our approach over other architectures and over competing query processing methods. In our experimental evaluation, our approach outperforms the sequential scan by a factor of 2.3. Compared to the X-tree on 64-dimensional color histogram data, we measured an improvement factor of 5.7.
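
To make the terms in the abstract concrete, the following is a minimal sketch of a quadratic form distance and a generic two-step (filter, then refine) range query. It is not the paper's method: it does not handle quantized vectors, and the filter shown is the textbook bound sqrt(lambda_min) * ||p - q||_2 <= d_A(p, q), which holds for any symmetric positive definite matrix A with smallest eigenvalue lambda_min. The function names, the example matrix, and the assumption that the caller supplies lambda_min are all illustrative.

```python
import math

def quadratic_form_distance(p, q, A):
    """d_A(p, q) = sqrt((p - q)^T A (p - q)); A is assumed symmetric positive definite."""
    diff = [pi - qi for pi, qi in zip(p, q)]
    total = 0.0
    for i, di in enumerate(diff):
        for j, dj in enumerate(diff):
            total += di * A[i][j] * dj
    return math.sqrt(total)

def euclidean_lower_bound(p, q, lambda_min):
    """Cheap filter distance: sqrt(lambda_min) * ||p - q||_2, never larger than d_A(p, q)."""
    return math.sqrt(lambda_min * sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def range_query(database, query, A, lambda_min, eps):
    """Hypothetical two-step range query: filter with the bound, refine with the exact distance."""
    results = []
    for vec in database:
        # Filter step: if even the lower bound exceeds eps, the exact distance does too.
        if euclidean_lower_bound(vec, query, lambda_min) > eps:
            continue
        # Refinement step: evaluate the expensive quadratic form only for survivors.
        if quadratic_form_distance(vec, query, A) <= eps:
            results.append(vec)
    return results

# Illustrative data (assumed, not from the paper). This A has eigenvalues 1 and 3.
A = [[2.0, 1.0], [1.0, 2.0]]
database = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [3.0, 3.0]]
hits = range_query(database, [0.0, 0.0], A, lambda_min=1.0, eps=1.5)
print(hits)
```

Because the filter distance never overestimates the exact distance, the filter step cannot discard a true answer; it only reduces how often the quadratic form has to be evaluated.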

Author information

Authors and Affiliations

Christian Böhm: University for Health Informatics and Technology Tyrol, Innrain 98, 6020 Innsbruck, Austria
Hans-Peter Kriegel: Institute for Computer Science, University of Munich, Oettingenstr. 67, 80538 München, Germany
Thomas Seidl: Department of Computer and Information Science, University of Constance, Box D78, 78457 Konstanz, Germany

About this article

Cite this article

Böhm, C., Kriegel, H.-P. & Seidl, T. Combining Approximation Techniques and Vector Quantization for Adaptable Similarity Search. Journal of Intelligent Information Systems 19, 207–230 (2002). https://doi.org/10.1023/A:1016515829761

Issue Date: September 2002
DOI: https://doi.org/10.1023/A:1016515829761
