Abstract
Social media has become a rich source of information. Labeling unstructured social media text is a critical task because the features of a single text can belong to multiple labels. Without appropriate labels, raw data cannot be interpreted, so assigning them is essential. In this work, we propose a modified multi-label K-nearest neighbor algorithm (Modified ML-KNN) for generating multiple labels for tweets; when configured with a certain distance measure and number of nearest neighbors, it performs better than conventional ML-KNN. To validate the proposed approach, we use two Twitter data sets: a disease-related tweet set that we prepared using five different disease keywords, and the benchmark Seattle data set of incident-related tweets. The modified ML-KNN improves on the performance of conventional ML-KNN by at least 5% on both data sets.
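For orientation, the baseline that the abstract compares against can be sketched as follows. This is a minimal illustration of conventional ML-KNN (neighbor label counts combined with a maximum a posteriori rule), with the distance measure and the number of neighbors exposed as parameters. It is not the modified algorithm proposed in the chapter, whose details the abstract does not give. The class name `MLKNN`, the helper `pairwise_distance`, the three metric names, the smoothing default, and the toy feature vectors are all assumptions made for this sketch.

```python
import numpy as np

def pairwise_distance(A, B, metric="euclidean"):
    """Distance from every row of A to every row of B (hypothetical helper)."""
    if metric == "euclidean":
        return np.sqrt(((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2))
    if metric == "manhattan":
        return np.abs(A[:, None, :] - B[None, :, :]).sum(axis=2)
    if metric == "cosine":
        An = A / np.maximum(np.linalg.norm(A, axis=1, keepdims=True), 1e-12)
        Bn = B / np.maximum(np.linalg.norm(B, axis=1, keepdims=True), 1e-12)
        return 1.0 - An @ Bn.T
    raise ValueError(f"unknown metric: {metric}")

class MLKNN:
    """Conventional ML-KNN sketch; k, metric and smooth are tunable assumptions."""

    def __init__(self, k=3, metric="euclidean", smooth=1.0):
        self.k, self.metric, self.smooth = k, metric, smooth

    def fit(self, X, Y):
        X, Y = np.asarray(X, dtype=float), np.asarray(Y, dtype=int)
        n, q = Y.shape
        s, k = self.smooth, self.k
        self.X, self.Y = X, Y
        # Smoothed prior probability that an instance carries each label.
        self.prior1 = (s + Y.sum(axis=0)) / (2 * s + n)
        # k nearest training neighbors of each training instance, excluding itself.
        D = pairwise_distance(X, X, self.metric)
        np.fill_diagonal(D, np.inf)
        nn = np.argsort(D, axis=1)[:, :k]
        counts = Y[nn].sum(axis=1)  # (n, q): neighbors carrying each label
        c1 = np.zeros((q, k + 1))
        c0 = np.zeros((q, k + 1))
        for l in range(q):
            has_label = Y[:, l] == 1
            c1[l] = np.bincount(counts[has_label, l], minlength=k + 1)
            c0[l] = np.bincount(counts[~has_label, l], minlength=k + 1)
        # Smoothed likelihood of seeing j labeled neighbors, given label present/absent.
        self.like1 = (s + c1) / (s * (k + 1) + c1.sum(axis=1, keepdims=True))
        self.like0 = (s + c0) / (s * (k + 1) + c0.sum(axis=1, keepdims=True))
        return self

    def predict(self, X):
        D = pairwise_distance(np.asarray(X, dtype=float), self.X, self.metric)
        nn = np.argsort(D, axis=1)[:, : self.k]
        counts = self.Y[nn].sum(axis=1)  # (m, q)
        labels = np.arange(self.Y.shape[1])
        p1 = self.prior1 * self.like1[labels, counts]
        p0 = (1.0 - self.prior1) * self.like0[labels, counts]
        return (p1 > p0).astype(int)  # MAP decision per label

# Toy data (made up): two well-separated clusters, one label each.
X_train = [[0, 0], [0, 1], [1, 0], [1, 1], [5, 5], [5, 6], [6, 5], [6, 6]]
Y_train = [[1, 0]] * 4 + [[0, 1]] * 4
model = MLKNN(k=3, metric="euclidean").fit(X_train, Y_train)
print(model.predict([[0.4, 0.6], [5.5, 5.4]]).tolist())  # [[1, 0], [0, 1]]
```

Changing `metric` or `k` in the constructor is the kind of configuration the abstract refers to; which combination works best on tweet features is an empirical question this sketch does not answer.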
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Srivastava, S.K., Singh, S.K. (2019). Multi-label Classification of Twitter Data Using Modified ML-KNN. In: Kolhe, M., Trivedi, M., Tiwari, S., Singh, V. (eds) Advances in Data and Information Sciences . Lecture Notes in Networks and Systems, vol 39. Springer, Singapore. https://doi.org/10.1007/978-981-13-0277-0_3
DOI: https://doi.org/10.1007/978-981-13-0277-0_3
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-0276-3
Online ISBN: 978-981-13-0277-0
eBook Packages: Engineering, Engineering (R0)