Cluster Computing, Volume 22, Supplement 3, pp 7287–7311

Euclidean space based hierarchical clusterers combinations: an application to software clustering

  • Rashid Naseem
  • Mustafa Mat Deris
  • Onaiza Maqbool
  • Sara Shahzad


Hierarchical clustering groups similar entities on the basis of a similarity (or distance) measure and produces a tree-like structure called a dendrogram. A dendrogram represents clusters in a nested manner: at each step an entity either forms a new cluster or merges into an existing one. Because hierarchical clustering has many applications, researchers have worked to improve hierarchical clustering approaches. One approach that has received attention is combining clustering results: different hierarchical clustering algorithms produce different dendrograms, and combining them has produced more promising results than individual hierarchical clustering. This paper proposes a hierarchical clustering combination (HCC) approach that uses the different types of structural features present in the dendrogram. First, the dendrograms are represented in a (4+N)-dimensional Euclidean space (4+NDES), where 4 is the number of extracted features and the representation can be extended by N further features; this yields vector matrices. 4+NDES is a structural representation of the dendrogram that contains both the relative and the absolute features of the entities in the dendrogram. The vector matrices are then aggregated, and the distance between each pair of vectors is calculated using the Euclidean distance measure. The final hierarchy is obtained using a recovery tool such as an individual hierarchical clustering algorithm. 4+NDES-HCC utilizes the structural content of the dendrogram and has the flexibility to handle an increasing number of features. The proposed approach is tested on software clustering, which plays an important role in the maintenance of software systems. The experimental results and a comparative analysis with existing approaches indicate the effectiveness of HCC for software clustering.
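The pipeline outlined in the abstract (vector matrices, aggregation, pairwise Euclidean distances, recovery of a final hierarchy) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the feature values in `m1` and `m2` are made up, aggregation is taken to be an element-wise sum, and single linkage stands in for the recovery tool. The abstract does not say which four features are extracted, so the sketch starts from ready-made entity-by-feature matrices.

```python
import math
from itertools import combinations

def aggregate(matrices):
    """Element-wise sum of equally shaped entity-by-feature matrices (assumed rule)."""
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [[sum(m[i][j] for m in matrices) for j in range(cols)]
            for i in range(rows)]

def euclidean_distances(vectors):
    """Symmetric matrix of Euclidean distances between all pairs of row vectors."""
    n = len(vectors)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        dist[i][j] = dist[j][i] = math.dist(vectors[i], vectors[j])
    return dist

def single_linkage(dist):
    """Agglomerative single-linkage clustering (an assumed recovery tool).

    Returns the merges in order as (members_a, members_b, distance).
    """
    def gap(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(dist[i][j] for i in a for j in b)

    clusters = [frozenset([i]) for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: gap(*pair))
        merges.append((sorted(a), sorted(b), gap(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges

# Hypothetical vector matrices from two dendrograms over the same four
# entities; each row is one entity, each column one (made-up) feature.
m1 = [[1, 1, 0, 0], [1, 1, 0, 0], [3, 2, 1, 0], [3, 2, 1, 1]]
m2 = [[1, 0, 0, 0], [1, 0, 0, 1], [2, 2, 1, 0], [2, 2, 1, 1]]

combined = aggregate([m1, m2])
distances = euclidean_distances(combined)
merges = single_linkage(distances)

for members_a, members_b, d in merges:
    print(members_a, members_b, round(d, 3))
```

With this toy input, entities 0 and 1 merge first, entities 2 and 3 merge next, and the two pairs join last. Adding further features only widens the rows of each matrix; the distance and recovery steps stay unchanged, which is one way to read the flexibility claimed for the 4+N representation.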


Hierarchical clusterers combinations · Euclidean space · Software clustering



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2017

Authors and Affiliations

  1. Department of Computer Science, City University of Science and Information Technology, Peshawar, Pakistan
  2. Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia, Batu Pahat, Malaysia
  3. Department of Computer Science, Quaid-I-Azam University, Islamabad, Pakistan
  4. Department of Computer Science, University of Peshawar, Peshawar, Pakistan
