Abstract
In this book, we studied the problem of data stream mining, which has recently become an important and challenging area of computer science research. Its importance stems from the enormous growth in the amount of data generated across many areas of human activity. Data streams [1,2,3] are potentially infinite and often arrive at the system at very high rates, so it is not possible to store all the data in memory. Appropriate algorithms should therefore use synopsis structures that compress the information gathered from past data. Moreover, data stream mining algorithms must be fast enough to keep up with the incoming data. Most often they are incremental, i.e. each data element is processed at most once. Alternatively, the data stream can be analyzed in a block-based manner, i.e. in consecutive chunks of data. Another feature of data streams is that the underlying data distribution may change over time, a phenomenon known in the literature as ‘concept drift’ [4, 5]. A good data stream mining method should be able to react to different types of such changes.

We focused on three groups of data stream mining methods: those based on decision trees, those based on probabilistic neural networks, and ensemble methods. A separate part of the book was devoted to each group.
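The abstract contrasts incremental and block-based processing and stresses that a method must react to concept drift. The Python sketch below illustrates these ideas on the simplest possible synopsis, a running mean. It is a generic illustration, not one of the algorithms studied in the book; the class names, the forgetting factor `lam` and the block size are hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class RunningMean:
    """Constant-size synopsis: the mean of every element seen so far."""
    count: int = 0
    mean: float = 0.0

    def update(self, x: float) -> float:
        # Incremental update: x is used once and then discarded.
        self.count += 1
        self.mean += (x - self.mean) / self.count
        return self.mean


@dataclass
class ForgettingMean:
    """Constant-size synopsis with a forgetting factor.

    Older elements receive geometrically decreasing weights, so the
    estimate can follow a distribution that drifts over time.
    """
    lam: float = 0.9  # hypothetical forgetting factor in (0, 1)
    mean: float = 0.0
    started: bool = False

    def update(self, x: float) -> float:
        if not self.started:
            # The first element initializes the estimate.
            self.mean, self.started = x, True
        else:
            self.mean = self.lam * self.mean + (1.0 - self.lam) * x
        return self.mean


def block_means(stream: Iterable[float], block_size: int) -> Iterator[float]:
    """Block-based processing: summarize consecutive chunks of the stream.

    A trailing chunk shorter than block_size is not reported.
    """
    block: List[float] = []
    for x in stream:
        block.append(x)
        if len(block) == block_size:
            yield sum(block) / block_size
            block.clear()


# Abrupt concept drift: the mean jumps from 0 to 10 after 100 elements.
stream = [0.0] * 100 + [10.0] * 100

plain, forgetful = RunningMean(), ForgettingMean(lam=0.9)
for x in stream:
    plain.update(x)
    forgetful.update(x)

print(plain.mean)                     # approximately 5.0
print(forgetful.mean)                 # approximately 10.0
print(list(block_means(stream, 50)))  # [0.0, 0.0, 10.0, 10.0]
```

On a stationary stream the plain running mean is a natural synopsis, but after the abrupt change it reports a value of about 5, which describes neither concept. The forgetting variant gives up some statistical efficiency in exchange for tracking the new distribution. The block-based version needs memory proportional to `block_size` rather than constant memory.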
References
[1] Gama, J.: Knowledge Discovery from Data Streams, 1st edn. Chapman and Hall/CRC, United Kingdom (2010)
[2] Lemaire, V., Salperwyck, C., Bondu, A.: A survey on supervised classification on data streams. In: European Business Intelligence Summer School, pp. 88–125. Springer, Berlin (2014)
[3] Garofalakis, M., Gehrke, J., Rastogi, R. (eds.): Data Stream Management: Processing High-Speed Data Streams. Data-Centric Systems and Applications. Springer, Cham (2016)
[4] Tsymbal, A.: The problem of concept drift: definitions and related work. Technical report, Department of Computer Science, Trinity College Dublin (2004)
[5] Gama, J., Žliobaitė, I., Bifet, A., Pechenizkiy, M., Bouchachia, A.: A survey on concept drift adaptation. ACM Comput. Surv. 46(4), 44:1–44:37 (2014)
[6] Domingos, P., Hulten, G.: Mining high-speed data streams. In: Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 71–80 (2000)
[7] Rutkowski, L., Pietruczuk, L., Duda, P., Jaworski, M.: Decision trees for mining data streams based on the McDiarmid’s bound. IEEE Trans. Knowl. Data Eng. 25(6), 1272–1279 (2013)
[8] Matuszyk, P., Krempl, G., Spiliopoulou, M.: Correcting the usage of the Hoeffding inequality in stream mining. In: Tucker, A., Höppner, F., Siebes, A., Swift, S. (eds.) Advances in Intelligent Data Analysis XII. Lecture Notes in Computer Science, vol. 8207, pp. 298–309. Springer, Berlin (2013)
[9] De Rosa, R., Cesa-Bianchi, N.: Splitting with confidence in decision trees with application to stream mining. In: 2015 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2015)
[10] Jaworski, M., Duda, P., Rutkowski, L.: New splitting criteria for decision trees in stationary data streams. IEEE Trans. Neural Netw. Learn. Syst. 29, 2516–2529 (2018)
[11] Rutkowski, L., Jaworski, M., Pietruczuk, L., Duda, P.: A new method for data stream mining based on the misclassification error. IEEE Trans. Knowl. Data Eng. 26(5), 1048–1059 (2015)
[12] De Rosa, R., Cesa-Bianchi, N.: Confidence decision trees via online and active learning for streaming data. J. Artif. Intell. Res. 60, 1031–1055 (2017)
[13] Rutkowski, L.: Generalized regression neural networks in time-varying environment. IEEE Trans. Neural Netw. 15(3), 576–596 (2004)
[14] Pietruczuk, L., Rutkowski, L., Jaworski, M., Duda, P.: The Parzen kernel approach to learning in non-stationary environment. In: 2014 International Joint Conference on Neural Networks (IJCNN), pp. 3319–3323 (2014)
[15] Duda, P., Jaworski, M., Rutkowski, L.: Knowledge discovery in data streams with the orthogonal series-based generalized regression neural networks. Inf. Sci. 460–461, 497–518 (2017)
[16] Duda, P., Jaworski, M., Rutkowski, L.: Convergent time-varying regression models for data streams: tracking concept drift by the recursive Parzen-based generalized regression neural networks. Int. J. Neural Syst. 28(2), 1750048 (2018)
[17] Rutkowski, L.: Adaptive probabilistic neural-networks for pattern classification in time-varying environment. IEEE Trans. Neural Netw. 15(4), 811–827 (2004)
[18] Jaworski, M., Duda, P., Rutkowski, L., Najgebauer, P., Pawlak, M.: Heuristic regression function estimation methods for data streams with concept drift. Lecture Notes in Computer Science, vol. 10246, pp. 726–737 (2017)
[19] Jaworski, M.: Regression function and noise variance tracking methods for data streams with concept drift. Int. J. Appl. Math. Comput. Sci. 28(3), 559–567 (2018)
[20] Pietruczuk, L., Rutkowski, L., Jaworski, M., Duda, P.: A method for automatic adjustment of ensemble size in stream data mining. In: 2016 International Joint Conference on Neural Networks (IJCNN), pp. 9–15 (2016)
[21] Pietruczuk, L., Rutkowski, L., Jaworski, M., Duda, P.: How to adjust an ensemble size in stream data mining? Inf. Sci. 381, 46–54 (2017)
[22] Duda, P., Jaworski, M., Rutkowski, L.: Online GRNN-based ensembles for regression on evolving data streams. In: Huang, T., Lv, J., Sun, C., Tuzikov, A.V. (eds.) Advances in Neural Networks – ISNN 2018, pp. 221–228. Springer International Publishing, Cham (2018)
Copyright information
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Rutkowski, L., Jaworski, M., Duda, P. (2020). Final Remarks and Challenging Problems. In: Stream Data Mining: Algorithms and Their Probabilistic Properties. Studies in Big Data, vol 56. Springer, Cham. https://doi.org/10.1007/978-3-030-13962-9_15
DOI: https://doi.org/10.1007/978-3-030-13962-9_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-13961-2
Online ISBN: 978-3-030-13962-9
eBook Packages: Intelligent Technologies and Robotics (R0)