Proposal of the Methodology for Identification of Repetitive Sequences in Big Data
The aim of this paper is to propose and describe a methodology for identifying repetitive sequences in big data sets. Such sequences can represent, for example, sequences of failures that emerge in industrial processes. The proposed methodology deals with time-based sequences, that is, sequences ordered by the time at which their elements emerged. One way to approach such identification is so-called brute-force scanning, but for big data sets this approach is very demanding in both computational power and computation time. Our methodology approaches the problem from the point of view of data mining and data analysis.
Keywords: Data mining, Big data, Failure, Repetitive sequences
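To illustrate why brute-force scanning becomes costly, the following is a minimal sketch of such a scan, not the methodology proposed in the paper. It assumes failures are given as `(timestamp, code)` pairs; the function name `brute_force_repeats`, its parameters, and the sample data are hypothetical and chosen only for illustration.

```python
from collections import Counter

def brute_force_repeats(events, min_len=2, max_len=4, min_count=2):
    """Count every contiguous run of failure codes and keep the repeated ones.

    `events` is an iterable of (timestamp, code) pairs (assumed layout).
    """
    # Order the failure codes by the time at which they emerged.
    codes = [code for _, code in sorted(events, key=lambda e: e[0])]
    counts = Counter()
    # Try every sequence length and every start position: this exhaustive
    # enumeration is what makes the scan expensive on large data sets.
    for length in range(min_len, max_len + 1):
        for start in range(len(codes) - length + 1):
            counts[tuple(codes[start:start + length])] += 1
    # Keep only the sequences that occur at least `min_count` times.
    return {seq: n for seq, n in counts.items() if n >= min_count}

# Hypothetical failure log: (timestamp, failure code)
events = [(1, "A"), (2, "B"), (3, "C"), (4, "A"), (5, "B"),
          (6, "D"), (7, "A"), (8, "B"), (9, "C")]
repeats = brute_force_repeats(events)
```

The number of windows examined grows with the product of the log length and the number of sequence lengths tried, and each window is also copied and hashed, which is why the cost of this approach rises quickly for big data sets.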
This publication was written with the financial support of the KEGA agency within the project 040STU-4/2016 “Modernization of the Automatic Control Hardware course by applying the concept Industry 4.0”.
This publication is the result of the implementation of the project “UNIVERSITY SCIENTIFIC PARK: CAMPUS MTF STU - CAMBO” (ITMS: 26220220179), supported by the Research & Development Operational Program funded by the European Regional Development Fund (EFRR).