Mining and applications of repeating patterns
Abstract
Mining valuable knowledge from real data has long been a hot topic. Repeating patterns are one important kind of such knowledge, occurring in many real applications such as musical data and medical data. In this paper, we aim to contribute an efficient algorithm for mining repeating patterns and to build a real application on the mined patterns. Although a number of past studies have addressed the mining of repeating patterns, their performance is still unsatisfactory, especially for large data sets. To address this issue, we propose an efficient algorithm named Fast Mining of Repeating Patterns, which discovers repeating patterns efficiently by means of a novel index called QuickPattern Index. In terms of applications, we propose a music recommender system named repeating-pattern-based music recommender system to deal with problems in music recommendation; it can still complete the recommendation even when facing a very sparse rating matrix. The experimental results show that our proposed mining algorithm and recommender system outperform previous works in terms of efficiency and effectiveness, respectively.
Keywords
Repeating pattern, QuickPattern Index, Data mining, Knowledge discovery, Music recommendation
1 Introduction
The great progress of information technology makes real data grow rapidly. These data, such as graph data, sequence data, and transaction data, contain a large amount of knowledge. Therefore, data mining for pattern discovery has been studied for decades, and a number of mining algorithms have been proposed to discover the valuable patterns hidden in real data. In recent studies, patterns are divided into several categories, including association patterns, sequential patterns, cyclic patterns, and repeating patterns. Different patterns are useful in different fields of data engineering. Association patterns are motivated by market basket analysis, whose goal is to discover relations between purchased products. Sequential patterns differ from association patterns in that their elements are temporally ordered. A cyclic pattern is a sequence whose starting pattern is also its ending pattern, so it can be viewed as an extension of sequential patterns. A repeating pattern contains a set of sequential elements and can also be viewed as a sequence that repeats in a regular form. For example, the string \(\{1, 2, 3, 1, 2, 4, 1, 2\}\) contains the repeating substring \(\{1, 2\}\), which is identified as a repeating pattern. The major difference between cyclic patterns and repeating patterns is that a repeating pattern does not require the start and end of a sequence to be the same. Repeating patterns are popular because they can be regarded as representative patterns that facilitate object recognition, and many of them appear in daily life, for example in musical data and medical data.
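To make the example concrete, the short sketch below counts how often a contiguous sub-sequence occurs in a string, counting overlapping occurrences. The function name `count_occurrences` is hypothetical and only illustrates why \(\{1, 2\}\) qualifies as repeating while \(\{2, 3\}\) does not.

```python
def count_occurrences(seq, pattern):
    """Count (possibly overlapping) occurrences of a contiguous pattern in seq."""
    m = len(pattern)
    return sum(1 for i in range(len(seq) - m + 1) if seq[i:i + m] == pattern)


seq = [1, 2, 3, 1, 2, 4, 1, 2]
print(count_occurrences(seq, [1, 2]))  # 3 -> appears more than once
print(count_occurrences(seq, [2, 3]))  # 1 -> not repeating
```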
Example of a useritem rating matrix
User  Item 1  Item 2  Item 3  Item 4  Item 5  Item 6
User 1  2  0  0  3  0  0 
User 2  2  0  0  0  0  4 
User 3  0  3  0  0  2  0 
User 4  0  2  0  2  0  3 
User 5  3  2  0  2  0  0 
User 6  1  0  0  0  3  0 
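The sparsity of a rating matrix like the one above can be quantified by its rating density, the fraction of user-item cells that hold a known rating (the measure used for the data sets in Section 5.2.1). The sketch below assumes that 0 in the table means "not rated"; the function names are hypothetical. It also lists items nobody has rated, which is where the new-item problem arises.

```python
# Example user-item rating matrix; 0 is assumed to mean "not rated".
RATINGS = [
    [2, 0, 0, 3, 0, 0],  # User 1
    [2, 0, 0, 0, 0, 4],  # User 2
    [0, 3, 0, 0, 2, 0],  # User 3
    [0, 2, 0, 2, 0, 3],  # User 4
    [3, 2, 0, 2, 0, 0],  # User 5
    [1, 0, 0, 0, 3, 0],  # User 6
]


def rating_density(matrix):
    """Fraction of user-item cells holding a known (non-zero) rating."""
    cells = sum(len(row) for row in matrix)
    known = sum(1 for row in matrix for r in row if r != 0)
    return known / cells


def unrated_items(matrix):
    """1-based indices of items that no user has rated yet."""
    return [j + 1 for j in range(len(matrix[0]))
            if all(row[j] == 0 for row in matrix)]


print(f"{rating_density(RATINGS):.3f}")  # 0.389 (14 of 36 cells)
print(unrated_items(RATINGS))            # [3]
```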
I. To accelerate the retrieval of repeating patterns, we propose an algorithm named Fast Mining of Repeating Patterns (FMRP), which mines repeating patterns efficiently by means of the proposed QuickPattern Index (QPI). This index keeps the occurrences and positions of the patterns to reduce the cost of searching for repeating patterns. Hence, instead of scanning the sequence iteratively, the repeating patterns can be discovered with only one scan of the input sequence. The experimental results reveal that our proposed algorithm outperforms the compared methods in terms of execution time.
II. In addition to providing an efficient mining algorithm, this paper aims to apply repeating patterns to a real application: music recommendation. The main difference between this paper and previous recommender systems is that previous recommender systems never consider the sequential repeating patterns that represent users' listening preferences. The experimental results show that the repeating-pattern-based recommendation clearly alleviates the problems of traditional recommender systems.
2 Related work
2.1 Algorithms of mining of repeating patterns
Hsu et al. [9] proposed a method to generate repeating patterns, in which a string-join operation and a data structure called RP-tree were introduced to mine repeating patterns efficiently. The basic idea of this approach is to iteratively join two short repeating patterns into a longer one, and the RP-tree was proposed to speed up the join procedure. Although the tree structure reduces the time complexity of join operations, the checking cost is so high that the generation of repeating patterns remains inefficient. Besides the RP-tree, another tree structure for generating repeating patterns is the suffix tree. Basically, a suffix tree is a compressed trie of all nonempty suffixes of a string. Because of this compressed structure, mining repeating patterns with a suffix tree relies heavily on its subtrees. Once the tree is constructed, several operations can be performed quickly, for instance, locating a substring and locating matches for a regular expression pattern. Suffix trees also provide a linear-time solution to the longest common substring problem [16, 20]. Unfortunately, constructing such a tree for a string takes much time and space.
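The join idea can be illustrated with position lists: two patterns are concatenated, and only those start positions of the first pattern that are immediately followed by the second pattern survive. This is a simplified sketch of that idea under our own assumptions (0-based positions, tuple patterns, hypothetical names `positions` and `string_join`); it does not reproduce the RP-tree or the exact string-join operation of [9].

```python
def positions(seq, pattern):
    """0-based start positions of a contiguous pattern (tuple) in seq."""
    m = len(pattern)
    return [i for i in range(len(seq) - m + 1) if tuple(seq[i:i + m]) == pattern]


def string_join(a, pos_a, b, pos_b):
    """Join a and b into a + b, keeping the starts of a that are
    immediately followed by an occurrence of b."""
    starts_b = set(pos_b)
    return a + b, [p for p in pos_a if p + len(a) in starts_b]


seq = [1, 2, 3, 1, 2, 4, 1, 2]
ab, pos_ab = string_join((1,), positions(seq, (1,)), (2,), positions(seq, (2,)))
print(ab, pos_ab)    # (1, 2) [0, 3, 6]  -> occurs 3 times, a repeating pattern
abc, pos_abc = string_join(ab, pos_ab, (3,), positions(seq, (3,)))
print(abc, pos_abc)  # (1, 2, 3) [0]     -> occurs once, discarded
```

Because each join only intersects position lists, longer candidates never require rescanning the input; the cost that [9] reports as high lies in checking the many candidate joins.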
2.2 Applications of repeating patterns
In the field of medical biology, repeating subsequences are a kind of repeating pattern. In the DNA of biological cells, repeating patterns occur in multiple copies throughout a genome, and scientists are still characterizing the functions of these subsequences. A tandem repeat is a kind of repeating pattern that occurs in DNA when a pattern of nucleotides repeats and the repetitions are directly adjacent to each other. Several protein domains also form tandem repeats within their amino acid primary structures. In practice, tandem repeats are very helpful in the field of bioinformatics. Beyond bioinformatics, repeating patterns are commonly used for matching local image features, where they can be modeled as a set of sparse repeated features based on crystallographic group theory. Muller et al. [5] proposed an approach to detect symmetric structures and to reconstruct a 3D geometric model. Liu et al. [10, 24] proposed a new method for detecting repeated patterns based on a Kronecker product formulation. They handled pose variation and varying brightness by employing the low-rank part of the rearranged input facade image. Automatic video summarization [11, 23] was proposed as an effective way to accelerate video browsing and retrieval. The video structure is first analyzed by spatial-temporal analysis, and then nontrivial repeating patterns are extracted to remove the visual-content redundancy among videos.
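A tandem repeat can be detected with a simple scan: take a unit of a fixed length and count how many times it reappears back-to-back. The sketch below is an exact-match simplification for one fixed unit length (`period`); real tandem-repeat finders also tolerate mismatches and search over many periods, which this hypothetical `find_tandem_repeats` does not.

```python
def find_tandem_repeats(dna, period, min_copies=2):
    """Return (start, unit, copies) for runs where a unit of length `period`
    repeats back-to-back at least `min_copies` times (exact matches only)."""
    results = []
    i, n = 0, len(dna)
    while i + period <= n:
        unit = dna[i:i + period]
        copies, j = 1, i + period
        # Slices past the end are shorter than `unit`, so the loop stops there.
        while dna[j:j + period] == unit:
            copies += 1
            j += period
        if copies >= min_copies:
            results.append((i, unit, copies))
            i = j  # skip past the whole run
        else:
            i += 1
    return results


print(find_tandem_repeats("GATCACACACTTG", 2))  # [(3, 'CA', 3)]
```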
2.3 Music recommendation
In general, music recommender systems can be divided into two main classes, namely, user-based and item-based recommender systems. For user-based music recommender systems [8, 14], the basic idea is to predict unknown ratings from the known ratings of relevant users. Bobadilla et al. [1] defined the significances of users and items to predict users' ratings. In contrast, item-based recommender systems [4, 19] use the known ratings of relevant items to predict unknown ratings. Beyond these approaches, Qi et al. [13] computed user similarities based on inferred tag ratings to conduct recommendation. In [12], user profiles and tag information are fused into a framework for joint item-tag recommendation. Su et al. [17, 18] integrated tags, play counts, and artist information to improve recommendation quality. Cheng et al. [3] integrated acoustic features and user personalities to provide a personalized recommendation service. Rahman et al. [15] proposed a personalized recommender system based on a fuzzy inference system. Xue et al. [22] attempted to discover preference factors with deep matrix factorization models. Although the above recommender systems perform well, they still suffer from the new-item and data-sparsity problems.
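As a point of reference for the user-based approach, the sketch below predicts an unknown rating as the similarity-weighted average of the ratings that other users gave the same item, using cosine similarity. It is a generic textbook variant, not the exact method of [8, 14] or [1], and it reuses the example user-item rating matrix from the Introduction with 0 assumed to mean "not rated". It also shows the new-item problem: an item nobody has rated cannot be predicted.

```python
import math

# Example user-item rating matrix; 0 is assumed to mean "not rated".
RATINGS = [
    [2, 0, 0, 3, 0, 0],  # User 1
    [2, 0, 0, 0, 0, 4],  # User 2
    [0, 3, 0, 0, 2, 0],  # User 3
    [0, 2, 0, 2, 0, 3],  # User 4
    [3, 2, 0, 2, 0, 0],  # User 5
    [1, 0, 0, 0, 3, 0],  # User 6
]


def cosine(u, v):
    """Cosine similarity of two rating vectors (unrated entries count as 0)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def predict(ratings, user, item):
    """Similarity-weighted average of other users' ratings for `item`.
    Returns None when nobody else has rated the item (new-item problem)."""
    num = den = 0.0
    for other, row in enumerate(ratings):
        if other == user or row[item] == 0:
            continue
        sim = cosine(ratings[user], row)
        num += sim * row[item]
        den += abs(sim)
    return num / den if den else None


print(round(predict(RATINGS, 0, 5), 2))  # User 1, Item 6 -> 3.38
print(predict(RATINGS, 0, 2))            # Item 3 has no ratings -> None
```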
3 Proposed methods
3.1 Mining of repeating patterns
3.1.1 Overview of the proposed mining algorithm
I. Construction of the index
II. Generation of repeating patterns
Before describing our proposed method, we define a repeating pattern in Definition 1.
Definition 1
For a string \(\hbox {B}\), if a substring \(\hbox {A}\) of \(\hbox {B}\) appears more than once, its length is larger than one, and no other repeating substring of \(\hbox {B}\) contains \(\hbox {A}\), we call \(\hbox {A}\) a repeating pattern of \(\hbox {B}\). \(\square \)
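Definition 1 can be checked by brute force on small strings: enumerate every substring of length at least two, keep those occurring more than once, and drop any that is contained in another such substring. The sketch assumes that "no other substring contains A" refers to other repeated substrings (otherwise the whole string would always contain A); the function names are hypothetical, and the enumeration is quadratic in the number of substrings, so it only serves as a reference for small inputs, not as FMRP.

```python
def repeated_substrings(seq, min_len=2):
    """Map each contiguous substring of length >= min_len that occurs more
    than once to its number of (possibly overlapping) occurrences."""
    counts = {}
    n = len(seq)
    for length in range(min_len, n):  # a length-n substring occurs only once
        for i in range(n - length + 1):
            sub = tuple(seq[i:i + length])
            counts[sub] = counts.get(sub, 0) + 1
    return {s: c for s, c in counts.items() if c > 1}


def contains(longer, shorter):
    """True if `shorter` appears contiguously inside `longer`."""
    m = len(shorter)
    return any(longer[i:i + m] == shorter for i in range(len(longer) - m + 1))


def repeating_patterns(seq):
    """Repeated substrings not contained in any other repeated substring."""
    reps = repeated_substrings(seq)
    return sorted(a for a in reps
                  if not any(b != a and contains(b, a) for b in reps))


print(repeating_patterns([1, 2, 3, 1, 2, 4, 1, 2]))  # [(1, 2)]
print(repeating_patterns([1, 2, 3, 1, 2, 3]))        # [(1, 2, 3)]
```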
3.1.2 Construction of the index
Figure 2 shows the process of constructing the index. In this process, two indexes are constructed: an array storing all patterns and an array storing the pattern positions in the input string. With these two indexes, the next process, called generation of repeating patterns, can be performed efficiently.
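The two-array idea can be sketched as one left-to-right scan that records, for every distinct pattern, its positions in the input string (its occurrence count is simply the length of that list). This is a minimal sketch assuming 1-based positions and single-symbol patterns; the exact QPI layout of Fig. 2 is not reproduced, and `build_index` is a hypothetical name. The sample sequence is reconstructed from the positions listed for the resulting set P in Section 4.1.1.

```python
def build_index(seq):
    """Single left-to-right scan: record the 1-based positions of every
    distinct symbol; its occurrence count is len(positions)."""
    positions = {}
    for pos, symbol in enumerate(seq, start=1):
        positions.setdefault(symbol, []).append(pos)
    patterns = list(positions)                    # array of distinct patterns
    pos_array = [positions[p] for p in patterns]  # parallel array of positions
    return patterns, pos_array


# Sequence reconstructed from the resulting set P in Section 4.1.1.
SEQ = ["p1", "p3", "p5", "p2", "p1", "p3", "p7", "p4",
       "p1", "p3", "p5", "p6", "p1", "p3", "p7", "p8"]

patterns, pos_array = build_index(SEQ)
for p, pos in sorted(zip(patterns, pos_array)):
    print(p, len(pos), pos)  # e.g. "p1 4 [1, 5, 9, 13]"
```

Running it prints the same occurrences and positions as the table of the resulting set P, which is what lets the generation step avoid rescanning the string.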
3.1.3 Generation of repeating patterns
3.2 Music recommendation by the repeating patterns
3.2.1 Overview of the proposed music recommender system
I. Offline preprocessing
II. Online recommendation
3.2.2 Offline preprocessing stage
A. Music low-level feature extraction
B. Music symbolization
C. Mining of repeating patterns
This operation generates the repeating patterns of each pattern string with our proposed mining algorithm; the mining procedure is not described again here. Finally, the term frequencies (called TF in this paper) of the repeating patterns are calculated, as defined in Definition 2.
Definition 2
Note that, for music recommendation, the repeating pattern is somewhat different from that in Definition 1. Formally, assume that there are two repeating substrings A and C in a string B, where A contains C. Even though A contains C, both A and C are kept as repeating patterns for music recommendation. For example, the repeating patterns of the string \(\{1, 2, 3, 1, 2, 3\}\) include \(\{1, 2, 3\}\), \(\{1, 2\}\) and \(\{2, 3\}\).
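Under this relaxed definition, the patterns and their TF values can be obtained by counting every substring of length at least two and keeping those that occur more than once, without removing contained patterns. The sketch assumes that TF is the raw occurrence count of a pattern in the pattern string, since the exact TF formula is not restated here; `pattern_tf` is a hypothetical name.

```python
from collections import Counter


def pattern_tf(seq, min_len=2):
    """Count every contiguous substring of length >= min_len and keep those
    occurring more than once; contained patterns are NOT removed, unlike
    Definition 1. TF is assumed to be the raw occurrence count."""
    n = len(seq)
    counts = Counter(
        tuple(seq[i:i + length])
        for length in range(min_len, n + 1)
        for i in range(n - length + 1)
    )
    return {p: c for p, c in counts.items() if c > 1}


print(pattern_tf([1, 2, 3, 1, 2, 3]))
# {(1, 2): 2, (2, 3): 2, (1, 2, 3): 2}
```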
D. Construction of rating classification model
3.2.3 Online recommendation stage
This stage is triggered by a visit from an active user. The main idea behind this stage is to treat rating prediction as rating classification. Because each user's model has already been generated in the offline preprocessing stage, all unknown ratings of the active user can be predicted quickly.
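The split between an offline model and a fast online prediction can be sketched as follows: each song is represented by the TF vector of its repeating patterns, the offline step stores the active user's rated songs with their rating classes, and the online step assigns an unrated song the class of its most similar rated song. The classifier is a swapped-in 1-nearest-neighbour with cosine similarity chosen for brevity; the classifier actually used to build the rating classification model is not specified in this section. All names and TF values are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse TF vectors stored as dicts."""
    dot = sum(w * v.get(k, 0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_user_model(rated_songs):
    """Offline step: a 1-NN 'model' just stores (TF vector, rating class) pairs."""
    return list(rated_songs)


def classify_rating(model, tf_vector):
    """Online step: return the rating class of the most similar rated song."""
    _, rating = max(model, key=lambda pair: cosine(pair[0], tf_vector))
    return rating


# Hypothetical TF vectors keyed by repeating pattern.
model = build_user_model([
    ({(1, 2): 3, (2, 3): 1}, 5),  # a song the active user rated 5
    ({(4, 5): 2, (5, 6): 2}, 1),  # a song the active user rated 1
])
print(classify_rating(model, {(1, 2): 2, (3, 4): 1}))  # 5
```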
4 Illustrative examples
To make the proposed methods easy to understand, two examples are shown in this section.
4.1 Example for mining of repeating patterns
4.1.1 Construction of the index
Example of the resulting set P
Pattern  Occurrences  Positions
\(p_{1}\)  4  1, 5, 9, 13 
\(p_{2}\)  1  4 
\(p_{3}\)  4  2, 6, 10, 14 
\(p_{4}\)  1  8 
\(p_{5}\)  2  3, 11 
\(p_{6}\)  1  12 
\(p_{7}\)  2  7, 15 
\(p_{8}\)  1  16 
4.1.2 Generation of repeating patterns

Step 1: A pattern is selected from Table 2 (Line 1 of Fig. 3).

Step 1.1: The selected pattern \(p_{x}\) is taken as the root, and the patterns at the next positions of \(p_{x}\) are grouped into the set \(p_{x}\).NextSteps. Here, the example pattern \(p_{x}\) is \(p_{1}\), and its NextSteps is \(\{p_{3}\}\).

Step 1.2: For each \(p_{x}\), examine each pattern \(p_{i}\) in \(p_{x}\).NextSteps. If the occurrence of \(p_{i}\) is more than 1, link \(p_{i}\) and \(p_{x}\) as a path whose root is \(p_{x}\). Then, all patterns at the next positions of \(p_{i}\) are further grouped into the set \(p_{x}\).NextTwoSteps. If the occurrences of \(p_{i}\) are more than 1, link \(p_{i}\) and all patterns at the next positions of \(p_{i}\). In this example, as shown in Fig. 7, each green grid is defined as {pattern#, (next positions), (parents)}. For instance, {\(p_{3}, (3, 7, 11, 15), (p_{1})\)} in Fig. 7 denotes that \(p_{i}\) is \(p_{3}\), its next-position set is {3, 7, 11, 15}, and its parent is \(p_{1}\).

Step 2: Repeat Step 1 until all repeating patterns are generated.
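The prefix-extension idea behind these steps can be sketched as follows: starting from each pattern that occurs more than once, group its next positions by the symbol found there, keep only groups that still occur more than once, and recurse. For \(p_{1}\), the only next symbol is \(p_{3}\) (matching Step 1.1), and the next positions of \(p_{1}p_{3}\) hold \(p_{5}\) and \(p_{7}\) (matching the {\(p_{3}, (3, 7, 11, 15), (p_{1})\)} grid, shifted to 0-based positions here). This is a hedged sketch with hypothetical names, not the FMRP pseudocode of Fig. 3; the final filter applies Definition 1 under the same "other repeated substring" reading assumed earlier.

```python
from collections import defaultdict

# Sequence reconstructed from the resulting set P (0-based positions here).
SEQ = ["p1", "p3", "p5", "p2", "p1", "p3", "p7", "p4",
       "p1", "p3", "p5", "p6", "p1", "p3", "p7", "p8"]


def build_positions(seq):
    """One scan: map each symbol to the list of positions where it occurs."""
    index = defaultdict(list)
    for pos, symbol in enumerate(seq):
        index[symbol].append(pos)
    return index


def grow(seq, pattern, starts, found):
    """Extend `pattern` (starting at `starts`) by the symbol at its next
    position; keep only extensions that still occur more than once."""
    groups = defaultdict(list)
    for s in starts:
        nxt = s + len(pattern)
        if nxt < len(seq):
            groups[seq[nxt]].append(s)
    for symbol, new_starts in groups.items():
        if len(new_starts) > 1:
            longer = pattern + (symbol,)
            found[longer] = new_starts
            grow(seq, longer, new_starts, found)


def mine(seq):
    """All repeated contiguous substrings of length >= 2 with their starts."""
    found = {}
    for symbol, starts in build_positions(seq).items():
        if len(starts) > 1:
            grow(seq, (symbol,), starts, found)
    return found


def contains(longer, shorter):
    m = len(shorter)
    return any(longer[i:i + m] == shorter for i in range(len(longer) - m + 1))


def maximal(found):
    """Keep patterns not contained in another found pattern (Definition 1)."""
    return sorted(p for p in found
                  if not any(q != p and contains(q, p) for q in found))


found = mine(SEQ)
print(found[("p1", "p3")])  # [0, 4, 8, 12]
print(maximal(found))       # [('p1', 'p3', 'p5'), ('p1', 'p3', 'p7')]
```

Each extension only inspects the positions already stored for its prefix, so the input is scanned once to build the index and never rescanned.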
4.2 Example for repeatingpatternbased music recommendation
5 Empirical study
Recalling the goals of this paper, the experiments consist of two main parts, namely, evaluations of mining repeating patterns and evaluations of repeating-pattern-based music recommendation. The results of each part are presented in detail in the following subsections.
5.1 Evaluations for mining of repeating patterns
5.1.1 Experimental settings
Parameter settings for mining of repeating patterns
Parameter  Minimum  Maximum  Interval
#Transactions  500  1000  100 
#Transaction length  500  1000  100 
#Pattern categories  10  200  50 
5.1.2 Comparisons of the proposed method and other methods in terms of execution time
5.2 Evaluations for repeatingpatternbased music recommendation
In the previous subsection, the efficiency of the algorithm for mining repeating patterns was shown. Another aim of this paper is to demonstrate the usefulness of repeating patterns. Therefore, the following evaluations are intended to reveal how useful the repeating patterns are.
5.2.1 Experimental settings
Rating densities for different divided subsets
Data set  Data 1  Data 2  Data 3  Data 4
Rating density  10%  7.5%  5%  2.5% 
#Patterns for different numbers of pattern categories on the music data
#Categories  25  50  75  100
#Patterns  41,785  43,563  40,806  38,692 
5.2.2 Evaluations for different numbers of patterns on different data sets
5.2.3 Comparisons with other methods on different rating data sets
6 Conclusion and future work
Discovering valuable knowledge from real life is not easy. Past studies on knowledge discovery can be divided into two classes, namely, algorithm-driven and application-driven studies. Algorithm-driven studies focus on improving efficiency, while application-driven studies focus on successfully applying mining algorithms to applications. In this paper, on the one hand, we propose an efficient mining algorithm to discover repeating patterns. On the other hand, we present a music recommendation application to justify the motivation for pattern mining. For the mining algorithm, we first construct an informative index called QPI that contains the positions and occurrences of patterns, which effectively reduces the cost of mining repeating patterns. Using QPI, a prefix-search-based algorithm named FMRP mines repeating patterns efficiently. For the application, a music recommender system based on repeating patterns is proposed to cope with the problems of traditional recommender systems; its main intent is to link user preferences with the repeating patterns in music. The experimental results reveal that our proposed mining algorithm is more efficient than the compared methods on different data, and that the proposed recommender system is more effective on very sparse data. This is just the beginning of applying repeating patterns. In the future, repeating patterns will be further used as representative features for recognizing objects in other real applications.
Notes
Acknowledgements
This research was supported by the Ministry of Science and Technology, Taiwan, R.O.C. under grant nos. MOST 105-2221-E-230-011-MY2 and MOST 105-2632-S-424-001.
References
1. Bobadilla, J., Hernando, A., Ortega, F., Gutiérrez, A.: Collaborative filtering based on significances. Inf. Sci. 185(1), 1–17 (2012)
2. Cooper, M., Foote, J.: Automatic music summarization via similarity analysis. In: The 2002 IEEE International Conference on Music Information Retrieval, pp. 81–85 (2002)
3. Cheng, R., Tang, B.: A music recommendation system based on acoustic features and user personalities. In: The Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 203–213 (2016)
4. Deshpande, M., Karypis, G.: Item-based top-\(N\) recommendation algorithms. ACM Trans. Inf. Syst. 22(1), 143–177 (2004)
5. Friedman, S., Stamos, I.: Online detection of repeated structures in point clouds of urban scenes for compression and registration. Int. J. Comput. Vis. 102(1–3), 112–128 (2013)
6. Gool, L.V., Zeng, G., Wonka, P., Muller, P.: Image-based procedural modeling of facades. In: The ACM SIGGRAPH Conference on Computer Graphics, pp. 63–130 (2007)
7. Han, B.J., Hwang, E., Rho, S.: An efficient voice transcription scheme for music retrieval. In: The 2007 IEEE International Conference on Multimedia and Ubiquitous Engineering, pp. 28–26 (2007)
8. Herlocker, J.L., Konstan, J.A., Borchers, A., Riedl, J.: An algorithmic framework for performing collaborative filtering. In: The International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 230–237 (1999)
9. Hsu, J.L., Liu, C.C., Chen, L.P.: Discovering nontrivial repeating patterns in music data. IEEE Trans. Multimedia 3(3), 311–325 (2001)
10. Liu, J., Psarakis, E., Stamos, I.: Automatic Kronecker product model based detection of repeated patterns in 2D urban images. In: The 2013 IEEE International Conference on Computer Vision, pp. 401–408 (2013)
11. Ma, Y.F., Lu, L., Zhang, H.J., Li, M.J.: A user attention model for video summarization. In: The Tenth ACM International Conference on Multimedia, pp. 533–542 (2002)
12. Peng, J., Zeng, D.D., Zhao, H., Wang, F.: Collaborative filtering in social tagging systems based on joint item-tag recommendations. In: The ACM International Conference on Information and Knowledge Management, pp. 809–818 (2010)
13. Qi, Q., Chen, Z., Liu, J., Hui, C., Wu, Q.: Using inferred tag ratings to improve user-based collaborative filtering. In: Annual ACM Symposium on Applied Computing, pp. 2008–2013 (2012)
14. Resnick, P., Iacovou, N., Suchak, M., Bergstrom, P., Riedl, J.: GroupLens: an open architecture for collaborative filtering of netnews. In: The International Conference on ACM Computer Supported Cooperative Work, pp. 175–186 (1994)
15. Rahman, M.S., Rahman, M.S., Chowdhury, S.U.I., Mahmood, A., Rahman, R.M.: A personalized music recommender service based on fuzzy inference system. In: The IEEE/ACIS 15th International Conference on Computer and Information Science (2016)
16. Singh, A.: Ukkonen’s Suffix Tree Construction. (2014) Retrieved from http://www.geeksforgeeks.org/ukkonenssuffixtreeconstructionpart6/
17. Su, J.H., Chang, W.Y., Tseng, V.S.: Personalized music recommendation by mining social media tags. In: The 17th International Conference on Knowledge-Based and Intelligent Information & Engineering Systems, pp. 291–300 (2013)
18. Su, J.H., Chang, W.Y., Tseng, V.S.: Effective social content-based collaborative filtering for music recommendation. Intell. Data Anal. 21(S1), S195–S216 (2017)
19. Sarwar, B.M., Karypis, G., Konstan, J., Riedl, J.: Item-based collaborative filtering recommendation algorithms. In: The International Conference on World Wide Web, pp. 285–295 (2001)
20. Ukkonen, E.: On-line construction of suffix trees. Algorithmica 14(3), 249–260 (1995)
21. Wang, M., Lu, L., Zhang, H.H.: Repeating pattern discovery from acoustic musical signals. In: The 2004 IEEE International Conference on Multimedia and Expo, vol. 3, pp. 2019–2022 (2004)
22. Xue, H.J., Dai, X.Y., Zhang, J., Huang, S., Chen, J.: Deep matrix factorization models for recommender systems. In: The Twenty-Sixth International Joint Conference on Artificial Intelligence (2017)
23. Xiao, R.G., Wang, Y.Y., Pan, H., Wu, F.: Automatic video summarization by spatio-temporal analysis and nontrivial repeating pattern detection. In: The 2008 IEEE Congress on Image and Signal Processing, pp. 555–559 (2008)
24. Zhao, P., Fang, T., Xiao, J., Zhang, H., Zhao, Q., Quan, L.: Rectilinear parsing of architecture in urban environment. In: The 2010 IEEE Computer Vision and Pattern Recognition, pp. 342–349 (2010)
25. https://www.csie.ntu.edu.tw/~cjlin/liblinear/. Accessed 5 Aug 2017
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.