Analysis of Streaming Data Using Big Data and Hybrid Machine Learning Approach

  • Mamoon Rashid
  • Aamir Hamid
  • Shabir A. Parah


Large volumes of data are generated from multiple sources, and this data contains many hidden patterns and much hidden information. Data from social networks mostly consists of opinions, which can be mined to extract information that is useful from an organizational point of view. In this chapter, the authors store Twitter streaming data in the Hadoop Distributed File System (HDFS) using Apache Flume and then extract it with Apache Hive. Machine learning classification algorithms are then applied to determine the sentiment in this data. A novel hybrid approach based on the Naïve Bayes and Decision Tree algorithms is used to improve the performance of sentiment analysis on streaming Twitter data. Naïve Bayes is a simple yet powerful classification algorithm, but it assumes that features are independent of one another. A Decision Tree, which classifies through a set of learned if-then rules and does not rely on that independence assumption, is therefore used in conjunction with it to obtain more accurate results. The two algorithms are combined using the Averaging Rule. The implemented approach achieved an accuracy of 86.44%, compared with 81.11% for the Naïve Bayes classifier.
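The abstract names the combination step (the Averaging Rule) but does not spell it out. The following is a minimal sketch, assuming the common reading of that rule: each classifier outputs a probability for every sentiment class, the two probabilities are averaged, and the class with the highest mean is chosen, i.e. predicted class = argmax over c of (P_NB(c | x) + P_DT(c | x)) / 2. Everything else is a hypothetical illustration rather than the authors' implementation: the whitespace tokenizer, the ID3-style tree on word-presence features, the `max_depth` value, the toy tweets, and the names `NaiveBayes`, `DecisionTree` and `hybrid_predict`. The Flume/HDFS/Hive ingestion stage is omitted; the sketch works on in-memory strings.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical minimal tokenizer: lowercase and split on whitespace.
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.priors = {c: labels.count(c) / len(labels) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for doc, y in zip(docs, labels):
            self.counts[y].update(tokenize(doc))
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict_proba(self, doc):
        v, logs = len(self.vocab), {}
        for c in self.classes:
            s = math.log(self.priors[c])
            for w in tokenize(doc):
                if w in self.vocab:  # words never seen in training are skipped
                    s += math.log((self.counts[c][w] + 1) / (self.totals[c] + v))
            logs[c] = s
        # Turn log-scores into normalized probabilities (shift by max for stability).
        m = max(logs.values())
        exps = {c: math.exp(s - m) for c, s in logs.items()}
        z = sum(exps.values())
        return {c: e / z for c, e in exps.items()}


class DecisionTree:
    """ID3-style tree on binary word-presence features; leaves hold class frequencies."""

    def __init__(self, max_depth=3):
        self.max_depth = max_depth

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        rows = [(set(tokenize(d)), y) for d, y in zip(docs, labels)]
        self.root = self._build(rows, set().union(*(r[0] for r in rows)), 0)
        return self

    @staticmethod
    def _entropy(rows):
        n = len(rows)
        return -sum(k / n * math.log2(k / n) for k in Counter(y for _, y in rows).values())

    def _leaf(self, rows):
        counts = Counter(y for _, y in rows)
        return {c: counts[c] / len(rows) for c in self.classes}

    def _build(self, rows, vocab, depth):
        base = self._entropy(rows)
        if depth == self.max_depth or base == 0:
            return self._leaf(rows)
        best, best_gain = None, 0.0
        for w in sorted(vocab):  # choose the word with the highest information gain
            yes = [r for r in rows if w in r[0]]
            no = [r for r in rows if w not in r[0]]
            if yes and no:
                gain = base - (len(yes) * self._entropy(yes) + len(no) * self._entropy(no)) / len(rows)
                if gain > best_gain:
                    best, best_gain = w, gain
        if best is None:
            return self._leaf(rows)
        yes = [r for r in rows if best in r[0]]
        no = [r for r in rows if best not in r[0]]
        rest = vocab - {best}
        # Internal node: (split word, subtree if word present, subtree if absent).
        return (best, self._build(yes, rest, depth + 1), self._build(no, rest, depth + 1))

    def predict_proba(self, doc):
        words, node = set(tokenize(doc)), self.root
        while isinstance(node, tuple):
            w, yes, no = node
            node = yes if w in words else no
        return node


def hybrid_predict(nb, dt, doc):
    """Averaging rule (assumed): mean of both class-probability vectors, then argmax."""
    p_nb, p_dt = nb.predict_proba(doc), dt.predict_proba(doc)
    avg = {c: (p_nb[c] + p_dt[c]) / 2 for c in nb.classes}
    return max(avg, key=avg.get), avg


# Toy labelled tweets, invented purely for illustration.
docs = ["i love this phone", "great service love it",
        "awful battery hate it", "terrible slow hate"]
labels = ["pos", "pos", "neg", "neg"]
nb = NaiveBayes().fit(docs, labels)
dt = DecisionTree(max_depth=2).fit(docs, labels)
label, probs = hybrid_predict(nb, dt, "love the great camera")
print(label, probs)
```

Averaging probabilities rather than taking hard votes lets a confident tree prediction outweigh a borderline Naïve Bayes score (and vice versa); with only two classifiers, a hard majority vote would have no way to break a disagreement.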


Keywords: Big Data, Sentiment, Multimedia, Decision Tree, Naïve Bayes, Machine learning


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Mamoon Rashid, School of Computer Science and Engineering, Lovely Professional University, Jalandhar, India
  • Aamir Hamid, Department of Computer Science and Engineering, Swami Vivekanand Institute of Engineering & Technology, Chandigarh, India
  • Shabir A. Parah, Department of Electronics and Instrumentation Technology, University of Kashmir, Srinagar, India