Optimal Dimension Order: A Generic Technique for the Similarity Join

Böhm, Christian; Krebs, Florian; Kriegel, Hans-Peter

doi:10.1007/3-540-46145-0_14


Conference paper
First Online: 01 January 2002


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2454)

Abstract

The similarity join is an important database primitive that has been successfully applied to speed up applications such as similarity search, data analysis, and data mining. It combines two point sets of a multidimensional vector space such that the result contains all point pairs whose distance does not exceed a given parameter ε. Although the similarity join is clearly CPU bound, most previous publications propose strategies that primarily improve I/O performance; little effort has been made to address CPU aspects. In this paper, we show that most of the computational overhead is spent on the final distance computations between the feature vectors. Consequently, we propose a generic technique to reduce the response time of a large number of basic algorithms for the similarity join. It is applicable to index-based join algorithms as well as to most join algorithms based on hashing or sorting. Our technique, called Optimal Dimension Order, avoids and accelerates distance calculations between feature vectors by carefully ordering the dimensions. The order is determined according to a probability model. In the experimental evaluation, we show that our technique yields high performance improvements for various underlying similarity join algorithms such as the R-tree similarity join, the breadth-first R-tree join, the Multipage Index Join, and the ε-Grid-Order.
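
To make the idea concrete, the following is a minimal sketch of an ε-similarity join whose distance test visits the dimensions in a chosen order and stops as soon as the partial squared distance exceeds ε². It is not the authors' implementation. The abstract does not describe the probability model, so the hypothetical helper `spread_order` substitutes a simple heuristic (widest value range first) that is assumed here purely for illustration. The names `within_eps` and `similarity_join` are likewise illustrative, and the nested-loop join stands in for the index-, hash-, or sort-based algorithms the paper targets.

```python
from typing import List, Optional, Sequence, Tuple

Point = Sequence[float]


def within_eps(p: Point, q: Point, eps_sq: float, order: Sequence[int]) -> bool:
    """Test whether the squared Euclidean distance of p and q is <= eps_sq.

    Dimensions are visited in `order`; the loop exits as soon as the
    partial sum already exceeds eps_sq, so a good order saves work.
    """
    total = 0.0
    for d in order:
        diff = p[d] - q[d]
        total += diff * diff
        if total > eps_sq:
            return False  # early termination: the pair cannot qualify
    return True


def spread_order(r: Sequence[Point], s: Sequence[Point]) -> List[int]:
    """Hypothetical stand-in for the paper's probability model.

    Assumption: dimensions with a wide value range are more likely to
    push the partial distance past eps early, so they are tested first.
    """
    pts = list(r) + list(s)
    dims = range(len(pts[0]))
    spread = [max(p[d] for p in pts) - min(p[d] for p in pts) for d in dims]
    return sorted(dims, key=lambda d: spread[d], reverse=True)


def similarity_join(
    r: Sequence[Point],
    s: Sequence[Point],
    eps: float,
    order: Optional[Sequence[int]] = None,
) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) with dist(r[i], s[j]) <= eps.

    A plain nested loop is used only to keep the sketch short.
    """
    if not r or not s:
        return []
    if order is None:
        order = range(len(r[0]))  # natural dimension order
    eps_sq = eps * eps
    return [
        (i, j)
        for i, p in enumerate(r)
        for j, q in enumerate(s)
        if within_eps(p, q, eps_sq, order)
    ]


r = [(0.0, 0.0), (1.0, 5.0)]
s = [(0.1, 0.1), (1.0, 9.0)]
order = spread_order(r, s)
pairs = similarity_join(r, s, eps=0.5, order=order)
```

The dimension order changes only how early a non-qualifying pair is rejected, never the join result, which is why such a technique can be layered on top of different underlying join algorithms.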



Author information

Authors and Affiliations

University for Health Informatics and Technology, Innsbruck
Christian Böhm
University of Munich, Munich
Florian Krebs & Hans-Peter Kriegel


Editor information

Editors and Affiliations

Graduate School of Informatics, Kyoto University, Yoshida-Honmachi, Sakyo-ku, 606-8501, Kyoto, Japan
Yahiko Kambayashi
Institute for Computer Science and Business Informatics, University of Vienna, Liebiggasse 4, 1010, Vienna, Austria
Werner Winiwarter
Center for Spatial Information Science (CSIS), University of Tokyo, 4-6-1, Komaba, Meguro-ku, 153-8904, Tokyo, Japan
Masatoshi Arikawa


About this paper

Cite this paper

Böhm, C., Krebs, F., Kriegel, H.-P. (2002). Optimal Dimension Order: A Generic Technique for the Similarity Join. In: Kambayashi, Y., Winiwarter, W., Arikawa, M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2002. Lecture Notes in Computer Science, vol 2454. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46145-0_14


DOI: https://doi.org/10.1007/3-540-46145-0_14
Published: 02 September 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44123-6
Online ISBN: 978-3-540-46145-6
eBook Packages: Springer Book Archive
