Abstract
Big data refers to data of very large volume, of heterogeneous types, drawn from many different sources. Such data are complex and keep growing. Handling big data is an emerging research area that is expanding rapidly across engineering and the medical sciences. A major challenge in analyzing big data originates at its sources, which generate data at very high speed and with a changing data distribution, so classical methods are unable to process it. This paper discusses the characteristics, challenges, and issues of big data mining. It also presents examples from fields such as medicine, finance, social networking sites, and stock exchanges to show the applications and importance of big data mining. The paper further explains the use of parallel computing in data mining, describes security issues and how to deal with them, and discusses the challenges associated with big streaming data that exhibits concept drift.
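The abstract notes that streaming sources change their data distribution over time (concept drift), so a model trained on earlier data silently degrades. The paper does not name a detection method here; purely as an illustration, the sketch below implements a simplified form of the Drift Detection Method (DDM) proposed by Gama et al., which monitors the online error rate of a classifier. The class name `SimpleDDM`, its thresholds (2 and 3 standard deviations, 30-sample warm-up) and the synthetic stream in the demo are assumptions chosen for the example, not the authors' method.

```python
import math


class SimpleDDM:
    """Simplified Drift Detection Method (DDM) sketch (hypothetical class).

    Tracks the running error rate p of an online classifier and its
    binomial standard deviation s = sqrt(p * (1 - p) / n). When p + s
    rises well above the lowest value seen so far, the data distribution
    has probably changed (concept drift).
    """

    def __init__(self, min_samples: int = 30,
                 warning_level: float = 2.0, drift_level: float = 3.0) -> None:
        # Assumed defaults: 30-sample warm-up, 2-sigma warning, 3-sigma drift.
        self.min_samples = min_samples
        self.warning_level = warning_level
        self.drift_level = drift_level
        self.reset()

    def reset(self) -> None:
        """Forget all statistics, e.g. after a drift has been signalled."""
        self.n = 0
        self.errors = 0
        self.p_min = math.inf
        self.s_min = math.inf

    def update(self, is_error: bool) -> str:
        """Feed one prediction outcome; return 'stable', 'warning' or 'drift'."""
        self.n += 1
        self.errors += int(is_error)
        p = self.errors / self.n
        s = math.sqrt(p * (1 - p) / self.n)

        if self.n < self.min_samples:
            return "stable"  # too few samples for a reliable estimate

        # Remember the best (lowest) error level observed so far.
        if p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = p, s

        if p + s >= self.p_min + self.drift_level * self.s_min:
            self.reset()  # start learning the new concept from scratch
            return "drift"
        if p + s >= self.p_min + self.warning_level * self.s_min:
            return "warning"
        return "stable"


if __name__ == "__main__":
    # Hypothetical stream: 10% error rate for 300 items, then 60%.
    detector = SimpleDDM()
    for k in range(600):
        is_error = (k % 10 == 0) if k < 300 else (k % 10 < 6)
        if detector.update(is_error) == "drift":
            print("drift signalled at item", k)
            break
```

In a real system such a detector would sit beside an incremental classifier: a warning could start buffering recent examples, and a drift could trigger retraining on them. That policy is one common design choice, not something the paper prescribes.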
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Tayal, V., Srivastava, R. (2019). Challenges in Mining Big Data Streams. In: Jain, L., E. Balas, V., Johri, P. (eds) Data and Communication Networks. Advances in Intelligent Systems and Computing, vol 847. Springer, Singapore. https://doi.org/10.1007/978-981-13-2254-9_15
DOI: https://doi.org/10.1007/978-981-13-2254-9_15
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-2253-2
Online ISBN: 978-981-13-2254-9
eBook Packages: Engineering, Engineering (R0)