A Multi-objectives Genetic Algorithm Clustering Ensembles Based Approach to Summarize Relational Data

  • Rayner Alfred
  • Gabriel Jong Chiye
  • Yuto Lim
  • Chin Kim On
  • Joe Henry Obit
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 652)


In learning from relational data, the Dynamic Aggregation of Relational Attributes algorithm can transform a multi-relational database into a vector space representation, to which a traditional clustering algorithm can then be applied directly to summarize the relational data. However, the performance of the algorithm depends heavily on the quality of the clusters produced. A small change in the initialization of the clustering algorithm's parameters may adversely affect cluster quality. To optimize cluster quality, a Genetic Algorithm is used to find the combination of initializations that produces the best clusters. The proposed method searches for the best initialization with respect to the number of clusters, the proximity (distance) measure, the fitness function, and the classifier used for evaluation. In the results obtained, clustering with Euclidean distance performs better in the classification stage than clustering with Cosine similarity. Among the fitness functions used in the genetic algorithm, cluster entropy performs best, followed by the multi-objective fitness function. This is most probably because cluster entropy is an external measure that takes the class label into consideration when optimizing the structure of the clusters. In short, this paper shows how varying the initialization values influences predictive performance.
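
As an illustration of why an external measure such as cluster entropy can steer the search toward class-consistent clusters, the sketch below computes the commonly used size-weighted cluster entropy from cluster assignments and class labels. The abstract does not give the exact formula used in the paper, so the definition here, the function name `cluster_entropy`, and the toy labels are assumptions made for illustration only.

```python
import math
from collections import Counter, defaultdict


def cluster_entropy(cluster_ids, class_labels):
    """Size-weighted average of per-cluster class entropy (lower is purer).

    Assumed textbook definition, not necessarily the paper's exact formula:
    E = sum_j (n_j / n) * ( -sum_i p_ij * log2(p_ij) ),
    where p_ij is the fraction of cluster j that belongs to class i.
    """
    if len(cluster_ids) != len(class_labels):
        raise ValueError("inputs must have the same length")

    # Group the class labels by the cluster each record was assigned to.
    members = defaultdict(list)
    for cid, label in zip(cluster_ids, class_labels):
        members[cid].append(label)

    n = len(class_labels)
    total = 0.0
    for labels in members.values():
        size = len(labels)
        counts = Counter(labels)
        # Shannon entropy of the class distribution inside this cluster.
        h = -sum((c / size) * math.log2(c / size) for c in counts.values())
        total += (size / n) * h
    return total


# Hypothetical toy data: two clusterings of the same four labelled records.
pure = cluster_entropy([0, 0, 1, 1], ["a", "a", "b", "b"])
mixed = cluster_entropy([0, 0, 1, 1], ["a", "b", "a", "b"])
```

Under this definition a genetic algorithm would minimize the value: `pure` describes clusters that each contain a single class, while `mixed` describes clusters that split both classes evenly and therefore scores worse.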


Relational data mining, k-means, Clustering, Ensembles, Genetic algorithm, Multi-objectives



Copyright information

© Springer Nature Singapore Pte Ltd. 2016

Authors and Affiliations

  • Rayner Alfred (1)
  • Gabriel Jong Chiye (1)
  • Yuto Lim (2)
  • Chin Kim On (1)
  • Joe Henry Obit (1)
  1. Faculty of Computing and Informatics, Universiti Malaysia Sabah, Kota Kinabalu, Malaysia
  2. School of Information Science, Japan Advanced Institute of Science and Technology, Nomi, Japan
