eDARA: Ensembles DARA

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8347)

Abstract

The ever-growing amount of digital data stored in relational databases has created a need for new approaches to extracting useful information from these databases. One such approach, the DARA algorithm, transforms data stored in relational databases into a vector space representation using information retrieval theory. DARA has been shown to produce improvements over other state-of-the-art approaches. However, DARA suffers from a major drawback when the cardinality of attributes in relations is very high, because the size of the vector space representation depends on the number of unique values of all attributes in the dataset. This issue can be addressed by reducing the number of features generated by the DARA transformation process, selecting only a subset of relevant features to be processed. Since relational data is transformed into a vector space representation (in the form of TF-IDF), only numerical values are used to represent each record. As a result, discretizing these numerical attributes may also reduce the dimensionality of the transformed dataset. When clustering is applied to these datasets, clustering results of various dimensions may be produced as the number of bins used to discretize the numerical attributes is varied. From these clustering results, a consensus clustering can be derived to produce a single clustering result that is, in some sense, a better fit than the existing clusterings. In this study, an ensemble DARA (eDARA) clustering approach is proposed that provides a mechanism to represent the consensus across multiple runs of a clustering algorithm on relational datasets.
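The abstract outlines a pipeline with four steps:

1. Summarize the related-table rows of each target record as a TF-IDF vector.
2. Discretize the numeric attributes with a varying number of bins.
3. Cluster the data once per bin setting.
4. Combine the resulting clusterings into a single consensus.

The Python sketch below walks through that pipeline on hypothetical toy data. It is a minimal illustration under stated assumptions, not the authors' eDARA implementation. The following choices are illustrative assumptions and do not come from the paper:

- equal-width binning;
- the cosine-threshold "connected components" clusterer, used as a stand-in for whatever clustering algorithm the paper uses;
- the co-association majority vote, used as the consensus function;
- both 0.5 thresholds;
- all names (`related`, `bags_for`, `cluster`, `consensus`, ...).

```python
import math
from collections import Counter


def equal_width_bins(values, n_bins):
    """Equal-width discretization (assumed): map each value to a bin index 0..n_bins-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant column
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def tf_idf(bags):
    """One sparse TF-IDF vector (dict) per record: tf = count / bag size, idf = log(N / df)."""
    n = len(bags)
    df = Counter(term for bag in bags for term in set(bag))
    return [{t: (c / len(bag)) * math.log(n / df[t]) for t, c in Counter(bag).items()}
            for bag in bags]


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either vector is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def components(n, linked):
    """Union-find over all pairs (i, j) with linked(i, j); returns one label per item."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if linked(i, j):
                parent[find(i)] = find(j)
    roots = {}
    # Relabel roots as 0, 1, 2, ... in order of first appearance.
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


def cluster(vectors, threshold=0.5):
    """Hypothetical stand-in clusterer: components of the cosine >= threshold graph."""
    return components(len(vectors), lambda i, j: cosine(vectors[i], vectors[j]) >= threshold)


def consensus(partitions):
    """Co-association consensus: link i and j if a strict majority of partitions co-cluster them."""
    m = len(partitions)
    return components(len(partitions[0]),
                      lambda i, j: sum(p[i] == p[j] for p in partitions) / m > 0.5)


def bags_for(related, n_bins):
    """Discretize all related-table amounts, then build one bag of bin tokens per target record."""
    rows = [(rid, a) for rid, amounts in related.items() for a in amounts]
    labels = equal_width_bins([a for _, a in rows], n_bins)
    bags = {rid: [] for rid in related}
    for (rid, _), b in zip(rows, labels):
        bags[rid].append(f"amount=bin{b}")
    return list(bags.values())


# Hypothetical toy data: each target record owns several rows (amounts) in a related table.
related = {"r1": [10.0, 20.0], "r2": [22.0, 30.0], "r3": [60.0, 70.0], "r4": [72.0, 100.0]}

# One clustering per bin count, then a single consensus partition.
partitions = [cluster(tf_idf(bags_for(related, k))) for k in (2, 3, 5)]
final = consensus(partitions)
print(partitions)  # [[0, 0, 1, 1], [0, 0, 1, 2], [0, 1, 2, 3]]
print(final)       # [0, 0, 1, 2]
```

**How the runs differ:** the 2-bin run groups r3 with r4. The finer 3-bin and 5-bin runs split them, and the 5-bin run also separates r1 from r2.

**What the consensus keeps:** the majority vote keeps r1 and r2 together, since they are co-clustered in 2 of 3 runs. It leaves r3 and r4 apart, since they are co-clustered in only 1 of 3 runs.

Varying the bin count is what produces the diverse partitions that a consensus function then reconciles.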

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kheau, C.S., Alfred, R., Lau, H. (2013). eDARA: Ensembles DARA. In: Motoda, H., Wu, Z., Cao, L., Zaiane, O., Yao, M., Wang, W. (eds) Advanced Data Mining and Applications. ADMA 2013. Lecture Notes in Computer Science, vol 8347. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-53917-6_7

  • DOI: https://doi.org/10.1007/978-3-642-53917-6_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-53916-9

  • Online ISBN: 978-3-642-53917-6

  • eBook Packages: Computer Science, Computer Science (R0)