Abstract
Stop and move information can be used to uncover useful semantic patterns; therefore, annotating GPS trajectories as either stopping or moving is beneficial. However, automatically discovering whether the entity is stopping or moving is challenging due to the spatial noisiness of real-world GPS trajectories. Existing approaches classify each entry definitively as either a stop or a move, hiding any indication that some classifications can be made with more certainty than others. Such an indication of the “goodness of classification” of each entry would allow the user to filter out stop classifications that appear too ambiguous for their use case, which in a data-mining context may ultimately lead to fewer false patterns. In this work we propose such an approach: it takes a noisy GPS trajectory as input and calculates the stop probability at each entry. Through a minimum stop probability parameter, our proposed approach allows the user to directly filter out any classified stops whose probability is unacceptable for their application. Using several real-world and synthetic GPS trajectories (that we have made available), we compared the classification effectiveness, parameter sensitivity, and running time of our approach to two well-known existing approaches, SMoT and CB-SMoT. Experimental results indicated the efficiency, effectiveness, and sampling rate robustness of our approach compared to the existing approaches. The results also demonstrated that the user can increase the minimum stop probability parameter to easily filter out low-probability stop classifications, which effectively reduced the number of false positive classifications in our ground truth experiments. Lastly, we proposed estimation heuristics for each of our approach’s parameters and empirically demonstrated the effectiveness of each heuristic using real-world trajectories. Specifically, the results revealed that even when all of the parameters were estimated, the classification effectiveness of our approach was higher than that of existing approaches across a range of sampling rates.
Notes
In the case where stops must last for some minimum amount of time, it is straightforward to enforce this constraint on POSMIT’s stop/move classification result. First, all contiguous entries classified as stops are merged into groups; next, the combined duration of each group is calculated; finally, groups whose durations are too short are reclassified as moves.
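The post-processing step in this note can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `enforce_min_stop_duration`, the boolean-list representation of the classification result, and the choice to measure a group's duration as its last timestamp minus its first are all assumptions made for the example.

```python
from itertools import groupby


def enforce_min_stop_duration(timestamps, is_stop, min_duration):
    """Reclassify stop groups shorter than min_duration as moves.

    timestamps   -- entry times in ascending order (e.g. seconds)
    is_stop      -- per-entry booleans, True where the entry is a stop
    min_duration -- minimum duration a stop group must span (same unit)

    Assumption: a group's duration is the time between its first and
    last entry; the source note does not specify this detail.
    """
    result = list(is_stop)
    start = 0
    # groupby yields maximal runs of identical consecutive labels,
    # which are exactly the contiguous stop (or move) groups.
    for label, run in groupby(is_stop):
        length = len(list(run))
        if label:
            duration = timestamps[start + length - 1] - timestamps[start]
            if duration < min_duration:
                # Group is too short: its entries become moves.
                result[start:start + length] = [False] * length
        start += length
    return result


if __name__ == "__main__":
    times = [0, 10, 20, 30, 40, 50, 60, 70]
    stops = [True, True, False, True, True, True, True, False]
    print(enforce_min_stop_duration(times, stops, min_duration=25))
```

In the usage above, the first stop group spans 10 time units and is turned into moves, while the second spans 30 and is kept.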
Entries with spatial coordinates in a non-Cartesian geographic coordinate system must first be converted to Cartesian coordinates so that a suitable Euclidean distance can be calculated. Also, Euclidean distance was chosen over great-circle distance for this problem for three reasons: it is the most widely used distance in spatial analysis (Smith et al. 2015); it is faster to compute; and the distances between points in a candidate stop are intrinsically small, so the effect of Earth’s curvature on them is negligible.
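To illustrate why the choice of Euclidean distance matters little at stop scale, the sketch below converts latitude/longitude pairs to local planar coordinates and compares the resulting Euclidean distance with the great-circle (haversine) distance. The equirectangular approximation, the spherical Earth radius, the function names, and the sample coordinates are assumptions chosen for the example; the source does not say which projection the authors used.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, spherical assumption


def to_local_xy(lat, lon, ref_lat, ref_lon):
    """Equirectangular approximation: degrees to metres around a reference."""
    x = math.radians(lon - ref_lon) * math.cos(math.radians(ref_lat)) * EARTH_RADIUS_M
    y = math.radians(lat - ref_lat) * EARTH_RADIUS_M
    return x, y


def euclidean_m(p, q):
    """Planar distance in metres between two (lat, lon) points."""
    # Use the midpoint as the reference to keep distortion small.
    ref_lat = (p[0] + q[0]) / 2.0
    ref_lon = (p[1] + q[1]) / 2.0
    x1, y1 = to_local_xy(p[0], p[1], ref_lat, ref_lon)
    x2, y2 = to_local_xy(q[0], q[1], ref_lat, ref_lon)
    return math.hypot(x2 - x1, y2 - y1)


def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2.0) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2.0) ** 2)
    return 2.0 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


if __name__ == "__main__":
    # Two hypothetical points a few tens of metres apart.
    a = (53.3498, -6.2603)
    b = (53.3502, -6.2600)
    print(euclidean_m(a, b), haversine_m(a, b))
```

For points this close together, the two distances differ by far less than typical GPS noise, which is consistent with the reasoning in the note.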
References
Alvares LO, Bogorny V, Kuijpers B, de Macedo JAF, Moelans B, Vaisman A (2007) A model for enriching trajectories with semantic geographical information. In: Proceedings of the 15th annual ACM international symposium on advances in geographic information systems GIS ’07. ACM, New York, pp 22:1–22:8
Ankerst M, Breunig MM, Kriegel HP, Sander J (1999) OPTICS: ordering points to identify the clustering structure. SIGMOD Rec 28(2):49–60. https://doi.org/10.1145/304181.304187
Boukhechba M, Bouzouane A, Bouchard B, Gouin-Vallerand C, Giroux S (2015) Online recognition of people’s activities from raw GPS data: semantic trajectory data analysis. In: Proceedings of the 8th ACM international conference on PErvasive technologies related to assistive environments PETRA ’15. ACM, New York, pp 40:1–40:8. https://doi.org/10.1145/2769493.2769498
Calenge C, Dray S, Royer-Carenzi M (2009) The concept of animals’ trajectories from a data analysis perspective. Ecol Inf 4(1):34–41. https://doi.org/10.1016/j.ecoinf.2008.10.002
Cao H, Mamoulis N, Cheung DW (2007) Discovery of periodic patterns in spatiotemporal sequences. IEEE Trans Knowl Data Eng 19(4):453–467. https://doi.org/10.1109/TKDE.2007.1002
Cao X, Cong G, Jensen CS (2010) Mining significant semantic locations from GPS data. Proc VLDB Endow 3(1–2):1009–1020. https://doi.org/10.14778/1920841.1920968
DATA.GOV.IE: Dublin bus GPS sample data from Dublin city council (insight project) (2013). https://data.gov.ie/dataset/dublin-bus-gps-sample-data-from-dublin-city-council-insight-project. Accessed 12 Nov 2017
de Smith MJ, Goodchild MF, Longley PA (2015) Geospatial analysis: a comprehensive guide to principles, techniques and software tools, 5th edn. The Winchelsea Press
Ester M, Kriegel HP, Sander J, Xu X (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: KDD ’96: proceedings of the 2nd international conference on knowledge discovery and data mining. AAAI Press, pp 226–231
Fischer MM, Getis A (eds) (2010) Handbook of applied spatial analysis: software tools, methods and applications. Springer
Fu Z, Tian Z, Xu Y, Qiao C (2016) A two-step clustering approach to extract locations from individual GPS trajectory data. ISPRS Int J Geoinf. https://doi.org/10.3390/ijgi5100166
Giannotti F, Nanni M, Pinelli F, Pedreschi D (2007) Trajectory pattern mining. In: Proceedings of the 13th ACM SIGKDD international conference on knowledge discovery and data mining KDD ’07. ACM, New York, pp 330–339. https://doi.org/10.1145/1281192.1281230
Gong L, Sato H, Yamamoto T, Miwa T, Morikawa T (2015) Identification of activity stop locations in GPS trajectories by density-based clustering method combined with support vector machines. J Mod Transp 23(3):202–213. https://doi.org/10.1007/s40534-015-0079-x
Gonzalez MC, Hidalgo CA, Barabasi AL (2008) Understanding individual human mobility patterns. Nature 453(7196):779–782
Guidotti R, Trasarti R, Nanni M (2015) Tosca: two-steps clustering algorithm for personal locations detection. In: Proceedings of the 23rd SIGSPATIAL international conference on advances in geographic information systems SIGSPATIAL ’15. ACM, New York, pp 38:1–38:10
Guidotti R, Trasarti R, Nanni M, Giannotti F, Pedreschi D (2017) There’s a path for everyone: a data-driven personal model reproducing mobility agendas. In: 2017 IEEE international conference on data science and advanced analytics (DSAA) pp 303–312
Haining R (2003) Spatial data analysis: theory and practice. Cambridge University Press. https://books.google.com.au/books?id=CYZSh347eiAC
Hastie T, Tibshirani R, Friedman J (2001) The elements of statistical learning. Springer, New York
Huang L, Li Q, Yue Y (2010) Activity identification from GPS trajectories using spatial temporal POIS’ attractiveness. In: Proceedings of the 2nd ACM SIGSPATIAL international workshop on location based social networks LBSN ’10. ACM, New York, pp 27–30. https://doi.org/10.1145/1867699.1867704
Hwang YC, Lin CC, Chang JR, Mori H, Huang HC (2009) Predicting essential genes based on network and sequence analysis. Mol BioSyst 5:1672–1678
Hwang S, Evans C, Hanke T (2017) Detecting stop episodes from GPS trajectories with gaps. Springer, New York, pp 427–439. https://doi.org/10.1007/978-3-319-40902-3_23
Khetarpaul S, Chauhan R, Gupta SK, Subramaniam LV, Nambiar U (2011) Mining GPS data to determine interesting locations. In: Proceedings of the 8th international workshop on information integration on the Web: In Conjunction with WWW 2011 IIWeb ’11. ACM, New York, pp 8:1–8:6. https://doi.org/10.1145/1982624.1982632
Leung KWT, Lee DL, Lee WC (2011) Clr: a collaborative location recommendation framework based on co-clustering. In: Proceedings of the 34th international ACM SIGIR conference on research and development in information retrieval SIGIR ’11. ACM, New York, pp 305–314. https://doi.org/10.1145/2009916.2009960
Luo T, Zheng X, Xu G, Fu K, Ren W (2017) An improved DBSCAN algorithm to detect stops in individual trajectories. ISPRS Int J Geo-Inf. https://doi.org/10.3390/ijgi6030063
MacQueen J (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the fifth Berkeley symposium on mathematical statistics and probability Volume 1: Statistics. University of California Press, Berkeley, pp 281–297. http://projecteuclid.org/euclid.bsmsp/1200512992
McCarroll D (2017) Simple statistical tests for geography. CRC Press, Boca Raton
Nadaraya EA (1964) On estimating regression. Theory Probab Appl 9(1):141–142
Palma AT, Bogorny V, Kuijpers B, Alvares LO (2008) A clustering-based approach for discovering interesting places in trajectories. In: Proceedings of the 2008 ACM symposium on applied computing SAC ’08. ACM, New York, pp 863–868. https://doi.org/10.1145/1363686.1363886
Pelekis N, Kopanakis I, Kotsifakos E, Frentzos E, Theodoridis Y (2009) Clustering trajectories of moving objects in an uncertain world. In: 2009 Ninth IEEE international conference on data mining, pp 417–427. https://doi.org/10.1109/ICDM.2009.57
Powers D (2011) Evaluation: from precision recall and f-measure to ROC informedness markedness & correlation. J Mach Learn Technol 2:37–63
Rocha JAMR, Times VC, Oliveira G, Alvares LO, Bogorny V (2010) Db-SMoT: a direction-based spatio-temporal clustering method. In: 2010 5th IEEE international conference intelligent systems, pp 114–119. https://doi.org/10.1109/IS.2010.5548396
Satopaa V, Albrecht J, Irwin D, Raghavan B (2011) Finding a “kneedle” in a haystack: detecting knee points in system behavior. In: Proceedings of the 2011 31st international conference on distributed computing systems workshops ICDCSW ’11. IEEE Computer Society, Washington, pp 166–171. https://doi.org/10.1109/ICDCSW.2011.20
Smith MJ, Goodchild MF, Longley PA (2015) Geospatial analysis: a comprehensive guide to principles, techniques and software tools, 5th edn. The Winchelsea Press, Leicester
Spaccapietra S, Parent C, Damiani ML, de Macedo JA, Porto F, Vangenot C (2008) A conceptual view on trajectories. Data Knowl Eng 65(1):126–146. https://doi.org/10.1016/j.datak.2007.10.008
Spinsanti L, Celli F, Renso C (2010) Where you stop is who you are: understanding peoples activities by places visited. In: BMI ’10: Proceedings of the 5th BMI workshop on behaviour monitoring and interpretation. CEUR-WS Karlsruhe, Germany, pp 38–52
Takeuchi Y, Sugimoto M (2006) Cityvoyager: an outdoor recommendation system based on user location history. In: Proceedings of the third international conference on ubiquitous intelligence and computing UIC’06. Springer, Berlin, pp 625–636. https://doi.org/10.1007/11833529_64
Thierry B, Chaix B, Kestens Y (2013) Detecting activity locations from raw GPS data: a novel kernel-based algorithm. Int J Health Geogr 12(1):14
Tobler WR (1970) A computer movie simulating urban growth in the Detroit region. Econ Geogr 46:234–240
Trajcevski G (2011) Uncertainty in spatial trajectories. Springer, New York, pp 63–107. https://doi.org/10.1007/978-1-4614-1629-6_3
Tran LH, Nguyen QVH, Do NH, Yan Z (2011) Robust and hierarchical stop discovery in sparse and diverse trajectories. Technical report, EPFL
Xiang L, Gao M, Wu T (2016) Extracting stops from noisy trajectories: a sequence oriented clustering approach. ISPRS Int J Geo-Inf. https://doi.org/10.3390/ijgi5030029
Xie K, Deng K, Zhou X (2009) From trajectories to activities: a spatio-temporal join approach. In: Proceedings of the 2009 international workshop on location based social networks LBSN ’09. ACM, New York, pp 25–32. https://doi.org/10.1145/1629890.1629897
Ying JJC, Lee WC, Tseng VS (2014) Mining geographic-temporal-semantic patterns in trajectories for location prediction. ACM Trans Intell Syst Technol 5(1):2:1–2:33. https://doi.org/10.1145/2542182.2542184
Yuan J, Zheng Y, Zhang C, Xie W, Xie X, Sun G, Huang Y (2010) T-drive: driving directions based on taxi trajectories. In: Proceedings of the 18th SIGSPATIAL international conference on advances in geographic information systems GIS ’10. ACM, New York, pp 99–108. https://doi.org/10.1145/1869790.1869807
Zheng Y, Zhang L, Xie X, Ma WY (2009) Mining interesting locations and travel sequences from GPS trajectories. In: Proceedings of the 18th international conference on World Wide Web WWW ’09. ACM, New York, pp 791–800. https://doi.org/10.1145/1526709.1526816
Zimmermann M, Kirste T, Spiliopoulou M (2009) Finding stops in error-prone trajectories of moving objects with time-based clustering. Springer, Berlin, pp 275–286. https://doi.org/10.1007/978-3-642-10263-9_24
Additional information
Responsible editor: Srinivasan Parthasarathy.
About this article
Cite this article
Bermingham, L., Lee, I. A probabilistic stop and move classifier for noisy GPS trajectories. Data Min Knowl Disc 32, 1634–1662 (2018). https://doi.org/10.1007/s10618-018-0568-8
DOI: https://doi.org/10.1007/s10618-018-0568-8