Abstract
Density Peaks based Clustering (DPC) is a recently proposed clustering algorithm that first selects a set of representative objects, called density peaks, and then assigns each remaining object to one of them. Unlike classical centroid-based clustering algorithms, DPC can find arbitrarily shaped clusters and requires no predefined set of initial centroids. However, a key disadvantage of DPC is its computational complexity: it must compute two indicators for every data object. As the number of data objects increases, the computational cost of DPC grows dramatically, which limits its application to many real-world problems. For example, DPC cannot be applied directly to the analysis of human mobility from taxi drop-offs because the number of drop-off records is too large. This paper proposes Grid-DPC, an efficient DPC algorithm based on grid density. The effective data space is partitioned into a desired number of grids, and the two indicators are computed for each grid rather than for each data object. Because the number of grids is much smaller than the number of data objects, a great amount of computation time and memory can be saved. In experiments, we compare Grid-DPC with K-centers, affinity propagation and DPC on both synthetic and publicly available datasets. The results demonstrate that Grid-DPC achieves clustering performance comparable to that of classical DPC. We also employ Grid-DPC to analyze large-scale taxi records from a city in China and from the Manhattan area of New York. The discovered human mobility zones have great potential in urban planning and can help taxi drivers make better routing decisions.
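The grid idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define the two indicators, so the sketch assumes the ones used in the original DPC of Rodriguez and Laio (a local density and the distance to the nearest element of higher density) and applies them to grid cells. The function name `grid_dpc`, the parameters `cell_size` and `n_clusters`, the use of point counts as cell density, and the selection of peaks by the product of the two indicators are all assumptions made for illustration.

```python
from collections import Counter
from math import dist


def grid_dpc(points, cell_size, n_clusters):
    """Label each point by running density-peak selection on grid cells.

    Hypothetical sketch: every design choice here is an assumption,
    not the method of the paper.
    """
    # Map each point to the integer index of the grid cell containing it.
    cell_of = [tuple(int(c // cell_size) for c in p) for p in points]

    # Indicator 1 (assumed): cell density = number of points in the cell.
    rho = Counter(cell_of)

    # Order cells from densest to sparsest; ties are broken by cell index.
    order = sorted(rho, key=lambda c: (-rho[c], c))

    # Indicator 2 (assumed): distance, in cell units, to the nearest cell
    # that comes earlier in the ordering (higher density, or equal density
    # with a smaller index).
    delta, parent = {}, {}
    for i, c in enumerate(order[1:], start=1):
        nearest = min(order[:i], key=lambda o: dist(c, o))
        delta[c] = dist(c, nearest)
        parent[c] = nearest

    # The densest cell has no denser neighbour; following the usual DPC
    # convention it receives the largest distance to any other cell.
    top = order[0]
    delta[top] = max(dist(top, o) for o in order)

    # Pick as peaks the cells with the largest rho * delta. The sort is
    # stable, so the densest cell is always among the peaks.
    peaks = sorted(order, key=lambda c: -rho[c] * delta[c])[:n_clusters]
    label = {c: k for k, c in enumerate(peaks)}

    # Every other cell inherits the label of its nearest denser cell,
    # which has already been labelled because cells are visited in order.
    for c in order:
        if c not in label:
            label[c] = label[parent[c]]

    # Points take the label of the cell they fall into.
    return [label[c] for c in cell_of]
```

Because both indicators are computed once per occupied cell, the pairwise distance work in this sketch scales with the number of cells rather than the number of points, which is the saving the abstract describes.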
Acknowledgements
This work was supported in part by the Natural Science Foundation of China [Grant Number 71771034].
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Guo, C., Na, Z., Sun, L., Chen, X. (2018). An Efficient Clustering Algorithm Based on Grid Density and its Application in Human Mobility Analysis. In: Huynh, VN., Inuiguchi, M., Tran, D., Denoeux, T. (eds) Integrated Uncertainty in Knowledge Modelling and Decision Making. IUKM 2018. Lecture Notes in Computer Science, vol 10758. Springer, Cham. https://doi.org/10.1007/978-3-319-75429-1_8
DOI: https://doi.org/10.1007/978-3-319-75429-1_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-75428-4
Online ISBN: 978-3-319-75429-1
eBook Packages: Computer Science, Computer Science (R0)