eDARA: Ensembles DARA

Kheau, Chung Seng; Alfred, Rayner; Lau, HuiKeng

doi:10.1007/978-3-642-53917-6_7

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8347)

Included in the following conference series:

International Conference on Advanced Data Mining and Applications

Abstract

The ever-growing amount of digital data stored in relational databases has created a need for new approaches to extracting useful information from these databases. One such approach, the DARA algorithm, is designed to transform data stored in relational databases into a vector space representation using information retrieval theory. The DARA algorithm has been shown to produce improvements over other state-of-the-art approaches. However, DARA suffers from a major drawback when the cardinality of the attributes in the relations is very high, because the size of the vector space representation depends on the number of unique values of all attributes in the dataset. This issue can be addressed by reducing the number of features generated by the DARA transformation process, selecting only a subset of the relevant features for processing. Since relational data is transformed into a vector space representation (in the form of TF-IDF), only numerical values are used to represent each record. As a result, discretizing these numerical attributes may also reduce the dimensionality of the transformed dataset. When clustering is applied to these datasets, varying the number of bins used to discretize the numerical attributes may produce clustering results of various dimensions. From these clustering results, a final consensus clustering can be derived that yields a single clustering result which is a better fit, in some sense, than the existing clusterings. In this study, an ensemble DARA clustering approach is proposed that provides a mechanism to represent the consensus across multiple runs of a clustering algorithm on relational datasets.
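The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the equal-width binning, the toy k-means base clusterer, the co-association consensus with a 0.5 threshold, and the sample data are all assumptions made for illustration. Each individual is represented by a bag of numeric values from a related table; the values are discretized into a chosen number of bins, weighted by TF-IDF, and clustered, and the partitions obtained for different bin counts are then merged into one consensus partition.

```python
import math
from collections import Counter

def tfidf_vectors(bags, n_bins, lo, hi):
    """Discretize each bag into n_bins equal-width bins, then weight bins by TF-IDF."""
    width = (hi - lo) / n_bins
    bin_of = lambda v: min(int((v - lo) / width), n_bins - 1)
    counts = [Counter(bin_of(v) for v in bag) for bag in bags]
    n = len(bags)
    df = Counter(b for c in counts for b in c)  # number of bags containing each bin
    return [[c[b] * math.log(n / df[b]) if df[b] else 0.0 for b in range(n_bins)]
            for c in counts]

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20):
    """Toy k-means; farthest-point initialization keeps the sketch deterministic."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist2(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def co_association(partitions):
    """Fraction of partitions in which each pair of individuals shares a cluster."""
    n, m = len(partitions[0]), len(partitions)
    return [[sum(p[i] == p[j] for p in partitions) / m for j in range(n)]
            for i in range(n)]

def consensus(partitions, threshold=0.5):
    """Link pairs co-clustered in more than `threshold` of the runs; return components."""
    co = co_association(partitions)
    n = len(co)
    labels, next_label = [-1] * n, 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and co[i][j] > threshold:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels

# Hypothetical data: one bag of numeric values per individual in the target table.
bags = [[1.0, 2.0, 1.5], [0.5, 2.5], [1.0, 3.0, 2.0],
        [8.0, 9.0], [7.5, 9.5, 8.5], [9.0, 7.0, 8.0]]

# One clustering run per bin count; the bin count sets the vector dimensionality.
partitions = [kmeans(tfidf_vectors(bags, b, 0.0, 10.0), k=2) for b in (2, 3, 4)]
final = consensus(partitions)
print(final)
```

The number of bins directly controls the length of each TF-IDF vector, which is the dimensionality effect the abstract refers to. The co-association step is only one of several possible consensus functions; it was chosen here because it needs no label alignment between runs.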

Author information

Authors and Affiliations

School of Engineering and Information Technology, Universiti Malaysia Sabah, Jalan UMS, 88400, Kota Kinabalu, Sabah, Malaysia
Chung Seng Kheau, Rayner Alfred & HuiKeng Lau

Editor information

Editors and Affiliations

US Air Force Office of Scientific Research, 106-0032, Tokyo, Japan
Hiroshi Motoda
School of Computer Science and Technology, Zhejiang University, 310027, Hangzhou, China
Zhaohui Wu
Faculty of Engineering and Information Technology, University of Technology, Chippendale, 2008, Sydney, NSW, Australia
Longbing Cao
Department of Computing Science, University of Alberta, Edmonton, T6G 2E8, Canada
Osmar Zaiane
College of Computer Science and Technology, Zhejiang University, Hangzhou, China
Min Yao
School of Computer Science, Fudan University, 200433, Shanghai, China
Wei Wang

About this paper

Cite this paper

Kheau, C.S., Alfred, R., Lau, H. (2013). eDARA: Ensembles DARA. In: Motoda, H., Wu, Z., Cao, L., Zaiane, O., Yao, M., Wang, W. (eds) Advanced Data Mining and Applications. ADMA 2013. Lecture Notes in Computer Science, vol 8347. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-53917-6_7

DOI: https://doi.org/10.1007/978-3-642-53917-6_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-53916-9
Online ISBN: 978-3-642-53917-6
eBook Packages: Computer Science, Computer Science (R0)
