TI-DBSCAN: Clustering with DBSCAN by Means of the Triangle Inequality

Kryszkiewicz, Marzena; Lasek, Piotr

doi:10.1007/978-3-642-13529-3_8

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6086)

Included in the following conference series:

International Conference on Rough Sets and Current Trends in Computing

Abstract

Grouping data into meaningful clusters is an important data mining task. DBSCAN is recognized as a high-quality density-based clustering algorithm. It can both find clusters of arbitrary shape and identify noise in data. The most time-consuming operation in DBSCAN is the calculation of a neighborhood for each data point. To speed up this operation, DBSCAN expects the neighborhood calculation to be supported by spatial access methods. Nevertheless, DBSCAN is not efficient for high-dimensional data. In this paper, we propose a new efficient algorithm, TI-DBSCAN, and its variant TI-DBSCAN-REF, both of which apply the same clustering methodology as DBSCAN. Unlike DBSCAN, TI-DBSCAN and TI-DBSCAN-REF do not use spatial indices; instead, they use the triangle inequality property to quickly reduce the neighborhood search space. The experimental results show that the new algorithms are up to three orders of magnitude faster than DBSCAN and efficiently cluster both low- and high-dimensional data.
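The abstract does not spell out how the triangle inequality narrows the search, so the following is only a minimal sketch of the general idea, not the authors' TI-DBSCAN implementation. It assumes Euclidean distance and a single reference point r. For any points p and q, the triangle inequality gives |dist(p, r) - dist(q, r)| <= dist(p, q), so a point q can lie within eps of p only if its distance to r differs from that of p by at most eps. Sorting points by their distance to r therefore restricts the candidates to a contiguous window. The function name `ti_neighborhoods`, the parameter names, and the default choice of reference point (the per-dimension minimum) are assumptions made for illustration.

```python
import math
from bisect import bisect_left, bisect_right


def ti_neighborhoods(points, eps, ref=None):
    """Return the eps-neighborhood of every point as a sorted list of indices.

    Each neighborhood includes the point itself. Candidates are pruned with
    the triangle inequality before any exact distance is computed.
    """
    if not points:
        return []
    if ref is None:
        # Assumed reference point: the per-dimension minimum of the data.
        dims = len(points[0])
        ref = tuple(min(p[d] for p in points) for d in range(dims))

    # Distance of every point to the reference point, and the sorted order.
    ref_dist = [math.dist(p, ref) for p in points]
    order = sorted(range(len(points)), key=ref_dist.__getitem__)
    sorted_dist = [ref_dist[i] for i in order]

    neighborhoods = []
    for i, p in enumerate(points):
        # Only points whose reference distance lies in
        # [ref_dist[i] - eps, ref_dist[i] + eps] can be within eps of p.
        lo = bisect_left(sorted_dist, ref_dist[i] - eps)
        hi = bisect_right(sorted_dist, ref_dist[i] + eps)
        # Verify the surviving candidates with the exact distance.
        hits = [j for j in order[lo:hi] if math.dist(p, points[j]) <= eps]
        neighborhoods.append(sorted(hits))
    return neighborhoods


if __name__ == "__main__":
    data = [(0, 0), (1, 0), (0, 1), (5, 5), (5, 6), (10, 10)]
    for idx, hood in enumerate(ti_neighborhoods(data, eps=1.5)):
        print(idx, hood)
```

The window test only discards points that cannot be neighbors; it never accepts a point without the exact distance check, so the resulting neighborhoods match those of a brute-force scan. How much work is saved depends on the data and on the reference point, which this sketch does not attempt to optimize.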

Author information

Authors and Affiliations

Institute of Computer Science, Warsaw University of Technology, Nowowiejska 15/19, 00-665, Warsaw, Poland
Marzena Kryszkiewicz & Piotr Lasek

Editor information

Editors and Affiliations

Institute of Mathematics, Warsaw University, Banacha 2, 02-097, Warsaw, Poland
Marcin Szczuka
ICS, Warsaw University of Technology
Marzena Kryszkiewicz
Department of Applied Computer Science, University of Winnipeg, R3B 2E9, Winnipeg, Manitoba, Canada
Sheela Ramanna
Dept. of Computer Science, The University of Wales, Aberystwyth, UK
Richard Jensen
Harbin Institute of Technology, PO Box 458, 150006, Harbin, China
Qinghua Hu

About this paper

Cite this paper

Kryszkiewicz, M., Lasek, P. (2010). TI-DBSCAN: Clustering with DBSCAN by Means of the Triangle Inequality. In: Szczuka, M., Kryszkiewicz, M., Ramanna, S., Jensen, R., Hu, Q. (eds) Rough Sets and Current Trends in Computing. RSCTC 2010. Lecture Notes in Computer Science, vol 6086. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13529-3_8

DOI: https://doi.org/10.1007/978-3-642-13529-3_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13528-6
Online ISBN: 978-3-642-13529-3
eBook Packages: Computer Science, Computer Science (R0)
