Incremental Clustering for Trajectories
Trajectory clustering plays a crucial role in data analysis because it reveals the underlying trends of moving objects. Because of their sequential nature, trajectory data are often received incrementally, e.g., as a continuous stream of new points reported by GPS systems. However, existing trajectory clustering algorithms are designed for static datasets, so they are not suited to incremental clustering, which imposes two requirements. First, clustering must be processed efficiently, since it may be requested frequently. Second, huge amounts of trajectory data must be accommodated, since the data accumulate constantly.
This paper proposes an incremental clustering framework for trajectories. It consists of two parts: online micro-cluster maintenance and offline macro-cluster creation. In the online part, when a new batch of trajectories arrives, each trajectory is simplified into a set of directed line segments so that clusters of trajectory subparts can be found. Micro-clusters store compact summaries of similar trajectory line segments and take much less space than the raw trajectories. When new data are added, the micro-clusters are updated incrementally to reflect the changes. In the offline part, when a user requests the current clustering result, macro-clustering is performed on the set of micro-clusters rather than on all trajectories over the whole time span. Since there are fewer micro-clusters than original trajectories, macro-clusters can be generated efficiently to present the clustering result. Experimental results on both synthetic and real data sets show that the framework achieves high efficiency as well as high clustering quality.
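The online/offline pipeline described above can be illustrated with a small sketch. It is not the paper's algorithm: it assumes Douglas-Peucker as the simplification step, a BIRCH-style summary (segment count plus linear sums of start and end points) as the micro-cluster, a toy segment distance (mean of start-to-start and end-to-end Euclidean distances), and connected components over micro-cluster centers as a stand-in for the macro-clustering step. All names (`simplify`, `to_segments`, `MicroCluster`, `seg_dist`, `update_micro`, `macro_cluster`, `eps`, `tol`) are hypothetical.

```python
import math
from dataclasses import dataclass

Point = tuple[float, float]
Segment = tuple[Point, Point]  # directed line segment: (start, end)


def _perp_dist(p: Point, a: Point, b: Point) -> float:
    """Distance from p to the infinite line through a and b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    norm = math.hypot(dx, dy)
    if norm == 0.0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    return abs(dy * (p[0] - a[0]) - dx * (p[1] - a[1])) / norm


def simplify(points: list[Point], tol: float) -> list[Point]:
    """Douglas-Peucker simplification (assumed partitioning step)."""
    if len(points) < 3:
        return list(points)
    a, b = points[0], points[-1]
    idx, dmax = 0, -1.0
    for i in range(1, len(points) - 1):
        d = _perp_dist(points[i], a, b)
        if d > dmax:
            idx, dmax = i, d
    if dmax <= tol:
        return [a, b]
    # Split at the farthest point and simplify both halves recursively.
    return simplify(points[: idx + 1], tol)[:-1] + simplify(points[idx:], tol)


def to_segments(points: list[Point], tol: float) -> list[Segment]:
    pts = simplify(points, tol)
    return list(zip(pts[:-1], pts[1:]))


@dataclass
class MicroCluster:
    """Compact summary: count and linear sums of start/end points."""
    n: int
    sum_start: list[float]
    sum_end: list[float]

    @classmethod
    def from_segment(cls, s: Segment) -> "MicroCluster":
        return cls(1, list(s[0]), list(s[1]))

    def add(self, s: Segment) -> None:
        # Incremental update: only the sums change; the raw segment is discarded.
        self.n += 1
        for k in range(2):
            self.sum_start[k] += s[0][k]
            self.sum_end[k] += s[1][k]

    def center(self) -> Segment:
        """Mean segment of all absorbed segments."""
        return ((self.sum_start[0] / self.n, self.sum_start[1] / self.n),
                (self.sum_end[0] / self.n, self.sum_end[1] / self.n))


def seg_dist(s: Segment, t: Segment) -> float:
    """Toy distance between two directed segments (hypothetical)."""
    return (math.dist(s[0], t[0]) + math.dist(s[1], t[1])) / 2


def update_micro(mcs: list[MicroCluster], segs: list[Segment], eps: float) -> None:
    """Online part: absorb each segment into the nearest micro-cluster or open a new one."""
    for s in segs:
        best = min(mcs, key=lambda m: seg_dist(m.center(), s), default=None)
        if best is not None and seg_dist(best.center(), s) <= eps:
            best.add(s)
        else:
            mcs.append(MicroCluster.from_segment(s))


def macro_cluster(mcs: list[MicroCluster], eps: float) -> list[int]:
    """Offline part: label micro-clusters by connected components of centers within eps."""
    parent = list(range(len(mcs)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(mcs)):
        for j in range(i + 1, len(mcs)):
            if seg_dist(mcs[i].center(), mcs[j].center()) <= eps:
                parent[find(i)] = find(j)
    return [find(i) for i in range(len(mcs))]


if __name__ == "__main__":
    mcs: list[MicroCluster] = []
    arrivals = [[(0, 0), (1, 0.05), (2, 0), (2, 2)], [(0, 0.2), (2, 0.2), (2, 2.2)],
                [(0, 1), (2, 1)], [(10, 10), (12, 10)]]
    for traj in arrivals:  # each trajectory arrives incrementally
        update_micro(mcs, to_segments(traj, tol=0.1), eps=0.5)
    print([m.n for m in mcs], macro_cluster(mcs, eps=1.0))
```

The design point the sketch tries to capture is that `update_micro` touches only a handful of running sums per segment, while `macro_cluster` works on the (smaller) list of micro-clusters instead of every raw trajectory. A real implementation would use a trajectory-aware segment distance and a density-based macro step rather than these simplified stand-ins.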
Keywords: Line Segment; Trajectory Data; Incremental Data; Trajectory Cluster; Incremental Cluster