Information-Theoretic Remodularization of Object-Oriented Software Systems

  • Amarjeet Prajapati
  • Jitender Kumar Chhabra


Software remodularization reorganizes software entities into modules such that entities belonging to the same module are more similar to one another than to entities belonging to different modules. In recent years, the Search-Based Software Engineering (SBSE) approach has been increasingly applied to the software remodularization problem. Most previous studies remodularize a software system by optimizing structural coupling and cohesion metrics as objective functions. These metrics are defined in terms of the number of structural relationships rather than the patterns of those relationships. It has been observed that coupling and cohesion computed from patterns of relationships (i.e., information-theoretic measures) are more accurate than those computed from relationship counts. This paper proposes an information-theoretic software remodularization approach in which an entropy-based similarity measure is introduced as an objective function alongside four other objective functions: inter-module class change coupling, intra-module class change coupling, module size index (MSI), and module count index (MCI). The resulting formulation is optimized using a many-objective meta-heuristic algorithm. To evaluate the effectiveness of the proposed approach, seven object-oriented software systems were remodularized using the NSGA-III, MOEA/D, IBEA, and TAA algorithms. The results are compared with the existing multi-objective formulation of the remodularization problem in terms of authoritative software remodularization, non-extreme distribution, and stability. The experimental results suggest that the proposed approach can be a good alternative for improving the quality of software systems. The findings also suggest that the approach is better suited to generating remodularization solutions that are good from the perspective of both quality metrics and developers.
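The contrast between counting relationships and measuring their patterns can be illustrated with a minimal sketch. This is not the measure defined in the paper, which the abstract does not spell out. It assumes each class is described by the set of entities it depends on and scores a module by the summed binary Shannon entropy of those dependencies across its classes. The names `module_entropy`, `binary_entropy`, and the toy `deps` map are hypothetical.

```python
import math
from typing import Dict, List, Set


def binary_entropy(p: float) -> float:
    """Shannon entropy (in bits) of a yes/no event with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def module_entropy(module: List[str], deps: Dict[str, Set[str]]) -> float:
    """Illustrative pattern-based score (an assumption, not the paper's metric).

    For every dependency target used by any class in the module, take the
    fraction of the module's classes that use it and add up the binary
    entropies. A low value means the classes share the same dependency
    pattern; a high value means their patterns disagree.
    """
    if not module:
        return 0.0
    features = set().union(*(deps.get(c, set()) for c in module))
    total = 0.0
    for feature in features:
        users = sum(1 for c in module if feature in deps.get(c, set()))
        total += binary_entropy(users / len(module))
    return total


# Hypothetical toy system: class name -> set of entities it depends on.
deps = {
    "A": {"X", "Y"},
    "B": {"X", "Y"},
    "D": {"W", "V"},
}

# Both candidate modules contain two classes and four dependency relations,
# so a pure relationship count cannot tell them apart.
same_pattern = module_entropy(["A", "B"], deps)   # 0.0: identical patterns
mixed_pattern = module_entropy(["A", "D"], deps)  # 4.0: disjoint patterns
```

In this toy case a count-based metric treats the two modules as equivalent, while the entropy-based score separates the module whose classes share one dependency pattern from the module whose classes share none.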


Search-based software engineering; Software remodularization; Software entropy; Information-theoretic technique




Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Computer Engineering & IT, JIIT Noida, Noida, India
  2. Department of Computer Engineering, NIT Kurukshetra, Kurukshetra, India
