An Evolutionary Methodology for Handling Data Scarcity and Noise in Monitoring Real Events from Social Media Data

  • Conference paper
Advances in Artificial Intelligence -- IBERAMIA 2014 (IBERAMIA 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8864)

Abstract

Every day, text-based social media channels are flooded with millions of messages covering the most diverse topics. These channels are used as a rich data source for monitoring real-world events such as natural disasters and disease outbreaks. However, depending on the event under investigation, this monitoring may be severely affected by data scarcity and noise, allowing only coarse-grained analyses in terms of time and space, which lack the specificity needed to support actions at the local level. In this context, we present a methodology to handle data scarcity and noise while monitoring real-world events at a fine grain using social media data. We apply our methodology to dengue-related data from Brazil and show that it can significantly improve the performance of event monitoring at a local scale, almost doubling the observed correlation in some cases.

Author information

Corresponding author

Correspondence to Denise E. F. de Brito.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Souza, R.C.S.N.P., de Brito, D.E.F., Cardoso, R.L., de Oliveira, D.M., Meira, W., Pappa, G.L. (2014). An Evolutionary Methodology for Handling Data Scarcity and Noise in Monitoring Real Events from Social Media Data. In: Bazzan, A., Pichara, K. (eds) Advances in Artificial Intelligence -- IBERAMIA 2014. IBERAMIA 2014. Lecture Notes in Computer Science, vol 8864. Springer, Cham. https://doi.org/10.1007/978-3-319-12027-0_24

  • DOI: https://doi.org/10.1007/978-3-319-12027-0_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-12026-3

  • Online ISBN: 978-3-319-12027-0

  • eBook Packages: Computer Science, Computer Science (R0)
