A Two-Step Iterative Procedure for Clustering of Binary Sequences
Association Rules (AR) are a well-known data mining tool for detecting patterns of association in databases. The major drawback of knowledge extraction through AR mining is the huge number of rules produced when dealing with large amounts of data. Several proposals in the literature tackle this problem with different approaches. Within this framework, the present proposal aims to identify patterns of association in large binary data sets. We propose an iterative procedure that combines clustering and dimensionality reduction techniques: each iteration consists of a quantification of the starting binary attributes, followed by an agglomerative algorithm applied to the resulting quantitative variables. The objective is to find a quantification that emphasizes the presence of groups of co-occurring attributes in the data.
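The alternation between quantification and agglomeration described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure: the abstract does not specify the quantification or the agglomerative criterion, so the sketch assumes an SVD of the centered group-by-attribute matrix for the quantification and Ward's merging cost for the agglomeration. The function names (`quantify`, `agglomerate`, `two_step`), the parameters `k`, `dim` and `max_iter`, and the toy data are all hypothetical.

```python
import numpy as np

def quantify(Z, labels, k, dim):
    """Assumed quantification step: project the centered binary data on the
    leading right singular vectors of the group-by-attribute matrix."""
    Zc = Z - Z.mean(axis=0)                    # center each binary attribute
    G = np.eye(k)[labels]                      # n x k group indicator matrix
    sizes = np.maximum(G.sum(axis=0), 1)       # guard against empty groups
    F = (G.T @ Zc) / np.sqrt(sizes)[:, None]   # weighted group deviations
    _, _, Vt = np.linalg.svd(F, full_matrices=False)
    return Zc @ Vt[:dim].T                     # n x dim quantitative scores

def agglomerate(Y, k):
    """Naive agglomerative clustering with Ward's merging cost (assumed)."""
    clusters = [[i] for i in range(len(Y))]
    while len(clusters) > k:
        cents = [Y[c].mean(axis=0) for c in clusters]
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                na, nb = len(clusters[a]), len(clusters[b])
                cost = na * nb / (na + nb) * np.sum((cents[a] - cents[b]) ** 2)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        clusters[a] += clusters[b]             # merge the cheapest pair
        del clusters[b]
    labels = np.empty(len(Y), dtype=int)
    for g, members in enumerate(clusters):
        labels[members] = g
    return labels

def two_step(Z, k=2, dim=1, max_iter=20, seed=0):
    """Alternate quantification and agglomeration until the partition is stable."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, size=len(Z))   # random starting partition
    for _ in range(max_iter):
        Y = quantify(Z, labels, k, dim)
        new_labels = agglomerate(Y, k)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels

# Toy data: two groups of sequences, each with its own block of co-occurring attributes.
Z = np.zeros((40, 10))
Z[:20, :5] = 1
Z[20:, 5:] = 1
labels = two_step(Z, k=2, dim=1)
print(labels)
```

The pairwise search in `agglomerate` is cubic in the number of sequences, so it is only suitable for small illustrations; for large binary data an optimized hierarchical clustering routine would be needed. The loop is not guaranteed to converge, so `max_iter` caps the number of iterations.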