Abstract
This paper proposes and describes a methodology for identifying repetitive sequences in big data sets. Such sequences can represent, for example, sequences of failures that emerge in industrial processes. The proposed methodology deals with sequences defined by the times at which their elements emerged. One way to approach such identification is so-called brute-force scanning, but that approach is very demanding in computational power and computation time for big data sets. Our methodology approaches the problem from a data mining and data analysis point of view.
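To illustrate why brute-force scanning becomes costly, the following is a minimal sketch of that baseline only, not of the proposed methodology. It assumes events arrive as (timestamp, label) pairs; the function name, its parameters, and the sample data are hypothetical and chosen purely for illustration.

```python
from collections import Counter


def brute_force_repeats(events, min_len=2, max_len=4, min_count=2):
    """Count every contiguous run of labels with length min_len..max_len.

    `events` is an iterable of (timestamp, label) pairs (an assumed format).
    Returns the runs that occur at least `min_count` times.
    """
    # Order the labels by the time at which each element emerged.
    ordered = [label for _, label in sorted(events, key=lambda e: e[0])]
    counts = Counter()
    # Every candidate length is tried at every start position, so the work
    # grows with (number of events) x (number of candidate lengths).
    for length in range(min_len, max_len + 1):
        for start in range(len(ordered) - length + 1):
            counts[tuple(ordered[start:start + length])] += 1
    return {seq: n for seq, n in counts.items() if n >= min_count}


# Hypothetical failure log: the pair ("A", "B") occurs twice.
events = [(1, "A"), (2, "B"), (3, "C"), (4, "A"), (5, "B"), (6, "D")]
repeats = brute_force_repeats(events)
print(repeats)
```

Because every window of every candidate length is materialized and counted, both time and memory grow quickly as the data set and the maximum sequence length increase, which is the cost the abstract refers to.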
Acknowledgments
This publication was written with the financial support of the KEGA agency within the project 040STU-4/2016 “Modernization of the Automatic Control Hardware course by applying the concept Industry 4.0”.
This publication is the result of the implementation of the project “UNIVERSITY SCIENTIFIC PARK: CAMPUS MTF STU - CAMBO” (ITMS: 26220220179), supported by the Research & Development Operational Program funded by the EFRR.
Copyright information
© 2019 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Nemeth, M., Michalconok, G. (2019). Proposal of the Methodology for Identification of Repetitive Sequences in Big Data. In: Silhavy, R. (eds) Software Engineering and Algorithms in Intelligent Systems. CSOC2018 2018. Advances in Intelligent Systems and Computing, vol 763. Springer, Cham. https://doi.org/10.1007/978-3-319-91186-1_40
DOI: https://doi.org/10.1007/978-3-319-91186-1_40
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-91185-4
Online ISBN: 978-3-319-91186-1
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)