Challenges in Mining Big Data Streams

  • Conference paper
Data and Communication Networks

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 847)

Abstract

Big data refers to data that is very large in volume, heterogeneous in type, and drawn from different sources. Such data is complex in nature and continually growing. Dealing with big data is an emerging area of research that is expanding rapidly across all domains of engineering and medical sciences. A major challenge in analyzing big data originates from its sources, which generate data at very high speed and with a varying data distribution, so classical methods are unable to process it. This paper discusses the characteristics, challenges, and issues of big data mining. It also presents examples from fields such as medicine, finance, social networking sites, and stock exchanges to illustrate the application and importance of big data mining. The paper explains the use of parallel computing in data mining, as well as security issues and how to deal with them. Finally, it discusses the challenges associated with big streaming data that exhibits concept drift.

Author information

Corresponding author

Correspondence to Veena Tayal.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Tayal, V., Srivastava, R. (2019). Challenges in Mining Big Data Streams. In: Jain, L., Balas, V.E., Johri, P. (eds) Data and Communication Networks. Advances in Intelligent Systems and Computing, vol 847. Springer, Singapore. https://doi.org/10.1007/978-981-13-2254-9_15

  • DOI: https://doi.org/10.1007/978-981-13-2254-9_15

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-2253-2

  • Online ISBN: 978-981-13-2254-9

  • eBook Packages: Engineering, Engineering (R0)
