An Efficient Knowledge-Based Text Pre-processing Approach for Twitter and Google+

  • Tripti Agrawal
  • Archana Singhal
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 1046)


People increasingly share their opinions about products and services on social networking sites (SNSs). These online reviews are large in volume and act as a goldmine for organizations that want to understand and monitor public opinion of their products and services. However, the reviews are highly unstructured because of linguistic features such as hashtags, URLs, misspelled words, emoticons and many more, which makes sentiment classification a challenging task. Data pre-processing is therefore a fundamental step in sentiment analysis. In the present work, the authors have rigorously explored a series of pre-processing steps and observed that the order in which the steps are applied affects the overall results. Hence, an ordering of the pre-processing steps has been proposed and implemented on two different social networks, Twitter and Google+. Twitter has been selected because of its tremendous popularity among netizens, and Google+ because the domain of the data used for the proposed approach closely matches users' interests on Google+. As the existing approach for handling Twitter data cannot be applied directly to Google+ data, a modified approach for Google+ has been suggested and implemented by the authors. In addition, some new dictionaries for handling linguistic features have been compiled, and existing dictionaries have been modified to improve the pre-processing results. The proposed approach has been implemented in order to evaluate the overall results.
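The observation that step order matters can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the step functions, the two orderings, the sample post and the tiny `EMOTICONS` and `SLANG` lookup tables are all hypothetical and stand in for the dictionaries and steps described in the paper, which are not reproduced here.

```python
import re

# Hypothetical lookup tables, for illustration only.
EMOTICONS = {":)": "happy", ":(": "sad"}
SLANG = {"gr8": "great", "luv": "love"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")
PUNCT_RE = re.compile(r"[^\w\s]")


def remove_urls(text):
    return URL_RE.sub(" ", text)


def remove_mentions(text):
    return MENTION_RE.sub(" ", text)


def replace_emoticons(text):
    # Map each known emoticon to a sentiment-bearing word.
    for emoticon, word in EMOTICONS.items():
        text = text.replace(emoticon, f" {word} ")
    return text


def expand_hashtags(text):
    # Keep the hashtag word, drop the '#' symbol.
    return HASHTAG_RE.sub(r"\1", text)


def strip_punctuation(text):
    return PUNCT_RE.sub(" ", text)


def lowercase(text):
    return text.lower()


def expand_slang(text):
    return " ".join(SLANG.get(token, token) for token in text.split())


def run_pipeline(text, steps):
    # Apply the steps in the given order, then collapse whitespace.
    for step in steps:
        text = step(text)
    return " ".join(text.split())


TEXT = "Luv the new phone :) gr8 camera! http://example.com/x #happy @shop"

# URLs, mentions and emoticons are handled before punctuation is stripped.
GOOD_ORDER = [remove_urls, remove_mentions, replace_emoticons,
              expand_hashtags, strip_punctuation, lowercase, expand_slang]

# Same steps, but punctuation is stripped first, which destroys the
# patterns the later steps rely on.
BAD_ORDER = [strip_punctuation, remove_urls, remove_mentions,
             replace_emoticons, expand_hashtags, lowercase, expand_slang]

if __name__ == "__main__":
    print(run_pipeline(TEXT, GOOD_ORDER))
    print(run_pipeline(TEXT, BAD_ORDER))
```

With the second ordering, the URL is broken into fragments, the mention survives as an ordinary word, and the emoticon is lost before it can be mapped to a sentiment word, so the same set of steps yields a different, noisier token sequence.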


Keywords: Data pre-processing, Sentiment analysis, Social networks, Twitter, Google+



Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  1. Department of Computer Science, University of Delhi, Delhi, India
