Abstract
The ever-growing amount of digital data stored in relational databases has created a need for new approaches to extracting useful information from these databases. One such approach, the DARA algorithm, transforms data stored in relational databases into a vector space representation using information retrieval theory. The DARA algorithm has been shown to produce improvements over other state-of-the-art approaches. However, DARA suffers from a major drawback when the cardinality of the attributes in the relations is very high, because the size of the vector space representation depends on the number of unique values of all attributes in the dataset. This issue can be addressed by reducing the number of features generated by the DARA transformation process, selecting only a subset of the relevant features for processing. Since relational data is transformed into a vector space representation (in the form of TF-IDF), only numerical values are used to represent each record. As a result, discretizing these numerical attributes may also reduce the dimensionality of the transformed dataset. When clustering is applied to these datasets, clustering results of varying dimensionality may be produced as the number of bins used to discretize the numerical attributes is varied. A consensus clustering can then be applied to these results to produce a single clustering that is a better fit, in some sense, than the individual clusterings. In this study, an ensemble DARA clustering approach is proposed that provides a mechanism to represent the consensus across multiple runs of a clustering algorithm on relational datasets.
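The consensus step described in the abstract can be illustrated with a small sketch. It is not the method of the paper: it assumes a simplified co-association (evidence accumulation) scheme in which two records are linked when they share a cluster in more than half of the base clusterings, and the consensus clusters are the resulting connected components. The function names, the `threshold` parameter, and the toy label vectors are all hypothetical.

```python
from itertools import combinations


def co_association(partitions):
    """Fraction of base clusterings in which each pair of records shares a cluster."""
    n = len(partitions[0])
    runs = len(partitions)
    counts = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1.0
    return [[c / runs for c in row] for row in counts]


def consensus_clusters(partitions, threshold=0.5):
    """Link records that co-cluster in more than `threshold` of the runs and
    return the connected components as consensus labels (assumed scheme)."""
    co = co_association(partitions)
    n = len(co)
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if co[i][j] > threshold:
            parent[find(j)] = find(i)

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]


# Hypothetical base clusterings of six records, standing in for runs that
# used different numbers of discretization bins.
base = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 2, 2],
    [0, 0, 0, 1, 1, 2],
]
consensus = consensus_clusters(base)
print(consensus)
```

Cluster labels are arbitrary within each run, so the sketch compares only whether two records share a label and never the label values themselves. A fixed majority threshold is one simple choice; other consensus functions, such as graph or hypergraph partitioning, could be substituted.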
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kheau, C.S., Alfred, R., Lau, H. (2013). eDARA: Ensembles DARA. In: Motoda, H., Wu, Z., Cao, L., Zaiane, O., Yao, M., Wang, W. (eds) Advanced Data Mining and Applications. ADMA 2013. Lecture Notes in Computer Science, vol 8347. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-53917-6_7
DOI: https://doi.org/10.1007/978-3-642-53917-6_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-53916-9
Online ISBN: 978-3-642-53917-6
eBook Packages: Computer Science (R0)