Twitter Mining for Multiclass Classification Events of Traffic and Pollution
During the last decade, social media have generated vast amounts of data, which serve as a primary information resource for multiple applications. Analyzing this information lets us discover unusual situations almost immediately, such as traffic jams, traffic accidents, and the state of the roads. This research proposes an approach for classifying pollution and traffic tweets automatically. Taking advantage of the information in tweets, it evaluates several supervised machine learning algorithms for text classification and determines that the support vector machine (SVM) algorithm achieves the highest accuracy, 85.8%, when classifying events as traffic or not traffic. Furthermore, to determine which events correspond to traffic or pollution, we perform a multiclass classification, where we obtain an accuracy of 78.9%.
Keywords: Twitter event detection, Pollution detection, Traffic detection, Twitter mining, Algorithms of classification, SVM
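To make the multiclass step concrete, the following is a minimal sketch of a one-vs-rest linear SVM over bag-of-words tweet features. It is not the authors' implementation: the abstract does not specify their features, data, or tooling, so the toy tweets, the three class labels, the function names, and the hyperparameters (`lr`, `lam`, `epochs`) are all assumptions made for illustration. Training uses plain subgradient descent on the L2-regularized hinge loss, without a bias term, as a simple stand-in for a production SVM solver.

```python
from collections import Counter

# Hypothetical toy data, invented for illustration only.
TWEETS = [
    "heavy traffic jam on the highway",
    "accident blocks traffic downtown",
    "smog and air pollution alert today",
    "air quality is poor because of smog",
    "great coffee this morning",
    "watching a movie tonight",
]
LABELS = ["traffic", "traffic", "pollution", "pollution", "other", "other"]


def featurize(text):
    """Bag-of-words term counts for a lowercased, whitespace-tokenized tweet."""
    return Counter(text.lower().split())


def score(weights, feats):
    """Linear decision value w . x for sparse features (no bias term)."""
    return sum(weights.get(tok, 0.0) * cnt for tok, cnt in feats.items())


def train_binary_svm(samples, lr=0.1, lam=0.01, epochs=20):
    """Subgradient descent on the L2-regularized hinge loss.

    samples: list of (features, y) pairs with y in {+1, -1}.
    """
    weights = {}
    for _ in range(epochs):
        for feats, y in samples:
            margin = y * score(weights, feats)
            for tok in weights:  # step on the L2 penalty (weight decay)
                weights[tok] *= 1.0 - lr * lam
            if margin < 1.0:  # hinge loss is active for this sample
                for tok, cnt in feats.items():
                    weights[tok] = weights.get(tok, 0.0) + lr * y * cnt
    return weights


def train_one_vs_rest(tweets, labels):
    """Train one binary SVM per class: that class versus all others."""
    feats = [featurize(t) for t in tweets]
    models = {}
    for cls in sorted(set(labels)):
        samples = [(f, 1 if lab == cls else -1) for f, lab in zip(feats, labels)]
        models[cls] = train_binary_svm(samples)
    return models


def predict(models, tweet):
    """Return the class whose binary SVM gives the highest decision value."""
    feats = featurize(tweet)
    return max(models, key=lambda cls: score(models[cls], feats))


models = train_one_vs_rest(TWEETS, LABELS)
print(predict(models, "traffic accident on the highway"))
print(predict(models, "smog alert poor air"))
```

A realistic evaluation would use a labeled tweet corpus, proper tokenization and feature selection, and held-out data or cross-validation to estimate accuracy; the toy data here only shows the mechanics of binary versus multiclass prediction.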