Abstract
Every day, text-based social media channels are flooded with millions of messages covering the most diverse topics. These channels are used as a rich data source for monitoring real-world events such as natural disasters and disease outbreaks. However, depending on the event being investigated, this monitoring may be severely affected by data scarcity and noise, allowing only coarse-grained analysis in time and space, which lacks the specificity needed to support actions at the local level. In this context, we present a methodology to handle data scarcity and noise while monitoring real-world events at a fine grain using social media data. We apply our methodology to dengue-related data from Brazil and show how it can significantly improve the performance of event monitoring at a local scale, almost doubling the observed correlation in some cases.
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Souza, R.C.S.N.P., de Brito, D.E.F., Cardoso, R.L., de Oliveira, D.M., Meira, W., Pappa, G.L. (2014). An Evolutionary Methodology for Handling Data Scarcity and Noise in Monitoring Real Events from Social Media Data. In: Bazzan, A., Pichara, K. (eds) Advances in Artificial Intelligence -- IBERAMIA 2014. IBERAMIA 2014. Lecture Notes in Computer Science, vol 8864. Springer, Cham. https://doi.org/10.1007/978-3-319-12027-0_24
DOI: https://doi.org/10.1007/978-3-319-12027-0_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12026-3
Online ISBN: 978-3-319-12027-0
eBook Packages: Computer Science, Computer Science (R0)