PhonSenticNet: A Cognitive Approach to Microtext Normalization for Concept-Level Sentiment Analysis

Satapathy, Ranjan; Singh, Aalind; Cambria, Erik

doi:10.1007/978-3-030-34980-6_20

Ranjan Satapathy¹,
Aalind Singh² &
Erik Cambria¹

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11917)

Included in the following conference series:

International Conference on Computational Data and Social Networks


Abstract

With the current upsurge in the use of social media platforms, short text (microtext) has increasingly replaced text written with standard words. Microtext poses a considerable problem for sentiment analysis, because sentiment models are trained on standard words. This paper discusses the impact of coupling symbolic (phonetics) with sub-symbolic (machine learning) artificial intelligence to transform out-of-vocabulary (OOV) concepts into their standard in-vocabulary (IV) form. We develop a binary classifier to detect sentences containing OOV words. These sentences are then transformed into the phoneme subspace using a grapheme-to-phoneme converter. We compare phonetic and string distances using the Sørensen similarity algorithm. The phonetically similar IV concepts obtained in this way are then used to compute the correct polarity value, which was previously miscalculated because of the presence of microtext. The proposed framework improves the accuracy of polarity detection by 6% compared with the earlier model. In short, we apply a grapheme-to-phoneme converter to microtext normalization and demonstrate its application to sentiment analysis.
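The abstract names the pipeline stages but not their implementation details. The following Python sketch illustrates only the matching step, and it rests on several assumptions. First, a hand-written toy phoneme table (the hypothetical `TOY_G2P`) stands in for the grapheme-to-phoneme converter. Second, the IV concept list is a hypothetical four-word sample rather than SenticNet. Third, the Sørensen–Dice coefficient is computed over bigram multisets, first on phonemes and then on characters as a tie-breaker. The abstract does not state which units the paper compares, how it weights phonetic against string similarity, or how it thresholds matches.

```python
from collections import Counter

# Hypothetical toy grapheme-to-phoneme table (ARPAbet-style symbols).
# Assumption: a real pipeline would call a G2P converter instead of this dict.
TOY_G2P = {
    "gr8": ["G", "R", "EY", "T"],
    "luv": ["L", "AH", "V"],
    "great": ["G", "R", "EY", "T"],
    "greet": ["G", "R", "IY", "T"],
    "love": ["L", "AH", "V"],
    "live": ["L", "IH", "V"],
}

# Hypothetical in-vocabulary concepts (not the SenticNet lexicon).
IV_CONCEPTS = ["great", "greet", "love", "live"]


def bigrams(seq):
    """Multiset of adjacent pairs; a sequence shorter than 2 counts as one unit."""
    seq = list(seq)
    if len(seq) < 2:
        return Counter([tuple(seq)])
    return Counter(zip(seq, seq[1:]))


def sorensen_dice(a, b):
    """Sørensen–Dice coefficient 2|X ∩ Y| / (|X| + |Y|) over bigram multisets."""
    x, y = bigrams(a), bigrams(b)
    return 2 * sum((x & y).values()) / (sum(x.values()) + sum(y.values()))


def normalize(oov_word):
    """Return the IV concept whose phonemes best match the OOV word's phonemes;
    character-level similarity is used only to break ties."""
    oov_phones = TOY_G2P[oov_word]
    return max(
        IV_CONCEPTS,
        key=lambda iv: (
            sorensen_dice(oov_phones, TOY_G2P[iv]),
            sorensen_dice(oov_word, iv),
        ),
    )


for word in ["gr8", "luv"]:
    print(word, "->", normalize(word))
```

With this toy table, "gr8" maps to "great" and "luv" maps to "love". In both cases the phoneme sequences match exactly (coefficient 1.0), even though the spellings share few character bigrams. This illustrates why a phonetic comparison can recover IV forms that string distance alone would miss.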


Notes

1. https://www.internationalphoneticassociation.org/content/full-ipa-chart
2. http://sentic.net/senticnet-5.0.zip
3. http://github.com/kite1988/nus-sms-corpus
4. Repetition of a Soundex encoding for greater than one.
5. https://sentic.net/demos/#polarity
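Note 4 mentions Soundex encodings. As background only, the following Python sketch implements standard American Soundex. It then counts the codes that occur more than once across a word list. Reading note 4 as referring to a code shared by more than one word is an assumption. The word list is illustrative and is not taken from the paper.

```python
from collections import Counter

# Standard American Soundex digit classes; vowels, y, h and w get no digit.
SOUNDEX_CODES = {
    ch: digit
    for letters, digit in (
        ("bfpv", "1"),
        ("cgjkqsxz", "2"),
        ("dt", "3"),
        ("l", "4"),
        ("mn", "5"),
        ("r", "6"),
    )
    for ch in letters
}


def soundex(word):
    """Return the four-character American Soundex code of `word`."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = [letters[0].upper()]
    prev = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = SOUNDEX_CODES.get(ch, "")
        # Skip letters without a digit and digits equal to the previous one.
        if code and code != prev:
            result.append(code)
        # Vowels separate equal digits; h and w do not.
        if ch not in "hw":
            prev = code
    # Pad with zeros and truncate to one letter plus three digits.
    return ("".join(result) + "000")[:4]


# Illustrative word list: count codes shared by more than one word.
words = ["robert", "rupert", "rubin", "ashcraft"]
counts = Counter(soundex(w) for w in words)
repeated = {code: n for code, n in counts.items() if n > 1}
print(repeated)  # {'R163': 2}
```

"robert" and "rupert" both encode to R163, so that code repeats. "rubin" (R150) and "ashcraft" (A261) each occur only once.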


Author information

Authors and Affiliations

Nanyang Technological University, Singapore, Singapore
Ranjan Satapathy & Erik Cambria
Vellore Institute of Technology, Vellore, India
Aalind Singh


Corresponding author

Correspondence to Erik Cambria.

Editor information

Editors and Affiliations

University of Calabria, Rende, Italy
Andrea Tagarelli
University of Illinois at Urbana-Champaign, Urbana, IL, USA
Hanghang Tong


About this paper

Cite this paper

Satapathy, R., Singh, A., Cambria, E. (2019). PhonSenticNet: A Cognitive Approach to Microtext Normalization for Concept-Level Sentiment Analysis. In: Tagarelli, A., Tong, H. (eds) Computational Data and Social Networks. CSoNet 2019. Lecture Notes in Computer Science(), vol 11917. Springer, Cham. https://doi.org/10.1007/978-3-030-34980-6_20


DOI: https://doi.org/10.1007/978-3-030-34980-6_20
Published: 11 November 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-34979-0
Online ISBN: 978-3-030-34980-6
eBook Packages: Computer Science, Computer Science (R0)
