Abstract
With the growth of social media, people increasingly communicate through SMS (Short Message Service) messages, tweets, and chat messages. The text used in these media differs from standard text: messages are limited in character length, often contain misspellings, and frequently use abbreviated forms, known as Non-Standard Forms (NSF), that are not found in dictionaries. The aim of this paper is to study the existing approaches used for normalizing such text.
© 2018 Springer Nature Singapore Pte Ltd.
Cite this paper
Chitrapriya, N., Ruhul Islam, M., Roy, M., Pradhan, S. (2018). A Study on Different Normalization Approaches of Word. In: Kalam, A., Das, S., Sharma, K. (eds) Advances in Electronics, Communication and Computing. Lecture Notes in Electrical Engineering, vol 443. Springer, Singapore. https://doi.org/10.1007/978-981-10-4765-7_25
DOI: https://doi.org/10.1007/978-981-10-4765-7_25
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-4764-0
Online ISBN: 978-981-10-4765-7
eBook Packages: Engineering, Engineering (R0)