Abstract
People increasingly share their opinions about products and services on social networking sites (SNSs). These online reviews are large in volume and are a valuable resource for organizations that want to understand and monitor public opinion of their products and services. However, the reviews are highly unstructured because they contain linguistic features such as hashtags, URLs, misspelled words and emoticons, which makes sentiment classification a challenging task. Data pre-processing is therefore a fundamental step in sentiment analysis. In the present work, the authors rigorously explore a series of pre-processing steps and observe that the order in which the steps are applied affects the overall results. They therefore propose an ordering of pre-processing steps and implement it on two social networks: Twitter and Google+. Twitter was selected because of its popularity among netizens, and Google+ because the domain of the data used in the proposed approach closely matches the interests of Google+ users. Because the existing approach for handling Twitter data cannot be applied directly to Google+ data, the authors suggest and implement a modified approach for Google+. In addition, they compile new dictionaries for handling linguistic features and modify existing dictionaries to improve pre-processing results. Finally, the proposed approach is implemented and its overall results are evaluated.
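The central observation above, that the order of pre-processing steps changes the result, can be illustrated with a minimal Python sketch. The step functions, regular expressions, and the small `EMOTICONS` and `SLANG` dictionaries below are hypothetical stand-ins, not the authors' dictionaries or their proposed sequence; the sketch only shows how one ordering can destroy information that another ordering preserves.

```python
import re
from typing import Callable, Sequence

# Hypothetical toy dictionaries for illustration only; the chapter compiles
# its own dictionaries, which are not reproduced here.
EMOTICONS = {":)": "happy", ":D": "laugh", ":(": "sad"}
SLANG = {"gr8": "great", "u": "you", "luv": "love"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
HASHTAG_RE = re.compile(r"#(\w+)")
ELONGATED_RE = re.compile(r"(\w)\1{2,}")  # a character repeated 3+ times


def remove_urls(text: str) -> str:
    return URL_RE.sub(" ", text)


def replace_emoticons(text: str) -> str:
    # Case-sensitive whole-token lookup, so ":D" and ":d" are different keys.
    return " ".join(EMOTICONS.get(tok, tok) for tok in text.split())


def lowercase(text: str) -> str:
    return text.lower()


def strip_hashtag_symbol(text: str) -> str:
    # Keep the hashtag word, drop the leading '#'.
    return HASHTAG_RE.sub(r"\1", text)


def squeeze_elongations(text: str) -> str:
    # Shorten runs of 3+ identical characters to 2 ("sooo" -> "soo"),
    # which leaves legitimate double letters such as "cool" intact.
    return ELONGATED_RE.sub(r"\1\1", text)


def expand_slang(text: str) -> str:
    return " ".join(SLANG.get(tok, tok) for tok in text.split())


def remove_punctuation(text: str) -> str:
    return re.sub(r"[^\w\s]", " ", text)


def run_pipeline(text: str, steps: Sequence[Callable[[str], str]]) -> str:
    for step in steps:
        text = step(text)
    return " ".join(text.split())  # collapse leftover whitespace


# Order A: handle URLs and emoticons before case folding and punctuation removal.
ORDER_A = [remove_urls, replace_emoticons, lowercase, strip_hashtag_symbol,
           squeeze_elongations, expand_slang, remove_punctuation]
# Order B: the same steps, but lowercasing and punctuation removal come first.
ORDER_B = [lowercase, remove_punctuation, remove_urls, replace_emoticons,
           strip_hashtag_symbol, squeeze_elongations, expand_slang]

SAMPLE = "Luv this phone :D sooo gr8 #happy https://t.co/xyz"

if __name__ == "__main__":
    print(run_pipeline(SAMPLE, ORDER_A))  # love this phone laugh soo great happy
    print(run_pipeline(SAMPLE, ORDER_B))  # love this phone d soo great happy https t co xyz
```

With the same seven steps, order B loses the emoticon (lowercasing turns `:D` into `:d`, which the case-sensitive dictionary no longer matches) and leaks URL fragments (`https t co xyz`), because punctuation removal breaks the URL before the URL pattern can match it.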
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Agrawal, T., Singhal, A. (2019). An Efficient Knowledge-Based Text Pre-processing Approach for Twitter and Google+. In: Singh, M., Gupta, P., Tyagi, V., Flusser, J., Ören, T., Kashyap, R. (eds) Advances in Computing and Data Sciences. ICACDS 2019. Communications in Computer and Information Science, vol 1046. Springer, Singapore. https://doi.org/10.1007/978-981-13-9942-8_36
DOI: https://doi.org/10.1007/978-981-13-9942-8_36
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-9941-1
Online ISBN: 978-981-13-9942-8
eBook Packages: Computer Science, Computer Science (R0)