Scaling DBSCAN-like Algorithms for Event Detection Systems in Twitter

  • Conference paper
Algorithms and Architectures for Parallel Processing (ICA3PP 2016)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10048)

Abstract

The increasing use of mobile social networks has lately transformed news media. Real-world events are nowadays reported in social networks much faster than in traditional channels. As a result, the autonomous detection of events from networks like Twitter has gained a lot of interest in both research and media groups. DBSCAN-like algorithms constitute a well-known clustering approach to retrospective event detection. However, scaling such algorithms to geographically large regions and temporally long periods presents two major shortcomings. First, detecting real-world events from the vast amount of tweets can no longer be performed on a single machine. Second, the tweeting activity varies a lot within these broad space-time regions, which limits the use of global parameters. Against this background, we propose to scale DBSCAN-like event detection techniques by parallelizing and distributing them through a novel density-aware MapReduce scheme. The proposed scheme partitions tweet data according to its spatial and temporal features and tailors local DBSCAN parameters to local tweet densities. We implement the scheme in Apache Spark and evaluate its performance on a dataset composed of geo-located tweets in the Iberian Peninsula during the course of several football matches. The results point to the benefits of our proposal over other state-of-the-art techniques in terms of speed-up and detection accuracy.
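
The idea of partitioning tweets in space and time and then adapting DBSCAN parameters to each partition can be illustrated with a small single-machine sketch. It is not the scheme from the paper: the fixed grid, the cube-root scaling rule in `local_eps`, and all function names are assumptions made for illustration, and the merging of clusters that cross partition borders, which a distributed implementation would need, is left out.

```python
from collections import defaultdict
from math import dist


def partition(points, cell):
    """Map step: key each (x, y, t) point by the grid cell that contains it."""
    cells = defaultdict(list)
    for p in points:
        key = tuple(int(c // s) for c, s in zip(p, cell))
        cells[key].append(p)
    return cells


def local_eps(n_points, mean_points, base_eps):
    """Hypothetical rule: shrink eps in dense cells, grow it in sparse ones."""
    return base_eps * (mean_points / n_points) ** (1 / 3)


def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns one label per point, with -1 meaning noise."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = [j for j in range(len(points))
                      if dist(points[i], points[j]) <= eps]
        if len(neighbours) < min_pts:
            labels[i] = -1  # noise for now; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = [k for k in range(len(points))
                         if dist(points[j], points[k]) <= eps]
            if len(reachable) >= min_pts:  # j is a core point: keep expanding
                queue.extend(reachable)
    return labels


def detect_events(points, cell, base_eps, min_pts):
    """Reduce step: cluster every cell with an eps tailored to its density."""
    cells = partition(points, cell)
    mean_points = len(points) / len(cells)
    result = {}
    for key, pts in cells.items():
        eps = local_eps(len(pts), mean_points, base_eps)
        result[key] = (eps, dbscan(pts, eps, min_pts))
    return result
```

Calling `detect_events(points, cell=(1.0, 1.0, 1.0), base_eps=0.1, min_pts=3)` on a list of `(x, y, t)` tuples returns, for each grid cell, the locally adjusted `eps` and the cluster labels of the points in that cell. Because each cell is processed independently, the per-cell loop is the part that a MapReduce framework such as Apache Spark could distribute.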

J. Capdevila—Obra Social “la Caixa”.

J. Torres—Spanish Ministry of Economy and Competitiveness under contract TIN2015-65316 and BSC-CNS Severo Ochoa programs (SEV2015-0493, SEV-2011-00067).

J. Cerquides—The SGR program (2014 SGR 118) of the Catalan Government and Collectiveware (TIN2015-66863-C2-1-R).

Author information

Corresponding author

Correspondence to Joan Capdevila.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Capdevila, J., Pericacho, G., Torres, J., Cerquides, J. (2016). Scaling DBSCAN-like Algorithms for Event Detection Systems in Twitter. In: Carretero, J., Garcia-Blas, J., Ko, R., Mueller, P., Nakano, K. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2016. Lecture Notes in Computer Science, vol 10048. Springer, Cham. https://doi.org/10.1007/978-3-319-49583-5_27

  • DOI: https://doi.org/10.1007/978-3-319-49583-5_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-49582-8

  • Online ISBN: 978-3-319-49583-5

  • eBook Packages: Computer Science, Computer Science (R0)
