Automating extract class refactoring: an improved method and its evaluation

Empirical Software Engineering

Abstract

During software evolution, the internal structure of a system undergoes continuous modification. These changes push the source code away from its original design, often reducing its quality, including class cohesion. In this paper we propose a method for automating the Extract Class refactoring. The proposed approach analyzes structural and semantic relationships between the methods in a class to identify chains of strongly related methods. The identified method chains are used to define new classes with higher cohesion than the original class, while preserving the overall coupling between the new classes and the classes interacting with the original class. The approach was first assessed in an artificial scenario in order to calibrate its parameters; the same data was used to compare the new approach with previous work. It was then empirically evaluated on real Blobs from existing open source systems to assess how good and useful software engineers consider the proposed refactoring solutions and how well the proposed refactorings approximate those done by the original developers. We found that the new approach outperforms a previously proposed approach and that developers find the proposed solutions useful in guiding refactorings.
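
The idea of grouping strongly related methods into chains can be illustrated with a minimal sketch. The abstract does not say how chains are computed, so the sketch assumes one simple reading: keep only method pairs whose similarity reaches a threshold, then treat each connected group of methods as a chain. The function name `method_chains`, the similarity scores, the threshold value, and the method names are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def method_chains(similarity, threshold):
    """Group methods into chains (illustrative assumption, not the paper's algorithm).

    similarity: dict mapping (method_a, method_b) pairs to a score in [0, 1].
    threshold:  pairs scoring below this value are treated as unrelated.
    Returns a list of chains, each a sorted list of method names.
    """
    methods = sorted({m for pair in similarity for m in pair})

    # Keep only the strong relationships as undirected edges.
    neighbours = defaultdict(set)
    for (a, b), score in similarity.items():
        if score >= threshold:
            neighbours[a].add(b)
            neighbours[b].add(a)

    # Each connected component of the filtered graph is one chain.
    chains, seen = [], set()
    for start in methods:
        if start in seen:
            continue
        chain, stack = [], [start]
        seen.add(start)
        while stack:
            current = stack.pop()
            chain.append(current)
            for other in neighbours[current]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        chains.append(sorted(chain))
    return chains


# Hypothetical scores for a class that mixes file handling with UI drawing.
scores = {
    ("openFile", "readFile"): 0.8,
    ("readFile", "closeFile"): 0.7,
    ("drawMenu", "drawButton"): 0.9,
    ("closeFile", "drawMenu"): 0.1,
}
chains = method_chains(scores, threshold=0.4)
for chain in chains:
    print(chain)
```

With these made-up scores the file-handling methods and the drawing methods end up in separate chains, each a candidate for an extracted class. Lowering the threshold far enough merges everything back into a single chain, which is why such a parameter would need calibration.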

Notes

  1. These approaches do not support Extract Class refactoring.

  2. http://www.eclipse.org/.

  3. http://www.jetbrains.com/idea/features/refactoring.html.

  4. If a private field needs to be shared by two or more of the extracted classes, the implementation of the needed getter and/or setter methods is left to the developer.

  5. It is worth noting that while the general experimental design is the same, the Max Flow-Min Cut approach (Bavota et al. 2011) was evaluated on artificial Blobs created by merging only two classes, as it is only able to split a Blob into two classes.

  6. The complete results achieved with all possible combinations of parameters can be found in Bavota et al. (2012).

  7. The interested reader can find the interaction plots for all systems in our online appendix (Bavota et al. 2012).

  8. All students voluntarily took part in the study.

  9. In the number of methods we do not count the constructors (both pre- and post-refactoring) or any getter and setter methods that would be added after the refactoring. In this way the sum of the methods of the extracted classes is equal to the number of methods of the Blob class.

  10. The interested reader can find the results by the Max Flow-Min Cut approach for each Blob in our online appendix (Bavota et al. 2012).

  11. A fine-grained analysis of the scores assigned by the students is reported in our online appendix (Bavota et al. 2012).

  12. To avoid bias in the experiment, none of the authors was involved in this evaluation.

  13. None of the 50 students involved in the user study reported in Section 5 was involved in this experiment.

  14. This data was provided by the subjects when sending their results to us.

References

  • Abadi A, Ettinger R, Feldman YA (2009) Fine slicing for advanced method extraction. In: 3rd workshop on refactoring tools

  • Abdeen H, Ducasse S, Sahraoui HA, Alloui I (2009) Automatic package coupling and cycle minimization. In: Proceedings of the 16th working conference on reverse engineering. IEEE CS Press, Lille, pp 103–112

  • Anquetil N, Fourrier C, Lethbridge TC (1999) Experiments with clustering as a software remodularization method. In: Proceedings of the 6th working conference on reverse engineering. IEEE CS Press, Atlanta, GA, pp 235–255

  • Arisholm E, Sjoberg D (2004) Evaluating the effect of a delegated versus centralized control style on the maintainability of object-oriented software. IEEE Trans Softw Eng 30(8):521–534

  • Atkinson DC, King T (2005) Lightweight detection of program refactorings. In: Proceedings of the 12th Asia-Pacific software engineering conference. IEEE CS Press, Taipei, pp 663–670

  • Baeza-Yates R, Ribeiro-Neto B (1999) Modern information retrieval. Addison-Wesley

  • Basili VR, Briand L, Melo WL (1995) A validation of object-oriented design metrics as quality indicators. IEEE Trans Softw Eng 22(10):751–761

  • Bavota G, De Lucia A, Oliveto R (2011) Identifying extract class refactoring opportunities using structural and semantic cohesion measures. J Syst Softw 84:397–414

  • Bavota G, De Lucia A, Marcus A, Oliveto R (2010) A two-step technique for extract class refactoring. In: Proceedings of 25th IEEE international conference on automated software engineering, pp 151–154

  • Bavota G, De Lucia A, Marcus A, Oliveto R (2012) Automating extract class refactoring: an improved approach and its evaluation. Online appendix https://dl.dropbox.com/u/20652688/emseappendix.zip

  • Binkley AB, Schach SR (1998) Validation of the coupling dependency metric as a predictor of run-time failures and maintenance measures. In: Proceedings of the 20th international conference on software engineering. Kyoto, Japan, pp 452–455

  • Bodhuin T, Canfora G, Troiano L (2007) SORMASA: a tool for suggesting model refactoring actions by metrics-led genetic algorithm. In: Proceedings of 1st workshop on refactoring tools. Berlin, Germany, pp 23–24

  • Briand LC, Wuest J, Lounis H (1999a) Using coupling measurement for impact analysis in object-oriented systems. In: Proceedings of the 15th IEEE international conference on software maintenance. IEEE Press, Oxford, pp 475–482

  • Briand LC, Wüst J, Ikonomovski SV, Lounis H (1999b) Investigating quality factors in object-oriented designs: an industrial case study. In: Proceedings of the 21st international conference on software engineering. ACM Press, Los Angeles, CA, pp 345–354

  • Brown WJ, Malveau RC, Brown WH, McCormick III HW, Mowbray TJ (1998) Anti patterns: refactoring software, architectures, and projects in crisis, 1st edn. John Wiley and Sons

  • Canfora G, Cimitile A, De Lucia A, Di Lucca GA (2001) Decomposing legacy systems into objects: an eclectic approach. Inf Softw Technol 43(6):401–412

  • Casais E (1992) An incremental class reorganization approach. In: Proceedings of the 6th European conference on object-oriented programming. Utrecht, the Netherlands, pp 114–132

  • Chidamber SR, Kemerer CF (1994) A metrics suite for object oriented design. IEEE Trans Softw Eng 20(6):476–493

  • Christl A, Koschke R, Storey MA (2007) Automated clustering to support the reflexion method. Inf Softw Technol 49(3):255–274

  • Cohen J (1988) Statistical power analysis for the behavioral sciences, 2nd edn. Lawrence Erlbaum Associates

  • Conover WJ (1998) Practical nonparametric statistics, 3rd edn. Wiley

  • Cormen TH, Leiserson CE, Rivest RL, Stein C (2001) Introduction to algorithms, 2nd edn, chap 26 (maximum flow). MIT Press and McGraw-Hill

  • Deerwester S, Dumais ST, Furnas GW, Landauer TK, Harshman R (1990) Indexing by latent semantic analysis. J Am Soc Inf Sci 41(6):391–407

  • van Deursen A, Kuipers T (1999) Identifying objects using cluster and concept analysis. In: Proceedings of the 21st international conference on software engineering. ACM Press, Los Angeles, CA, pp 246–255

  • Du Bois B, Demeyer S, Verelst J (2004) Refactoring—improving coupling and cohesion of existing code. In: Proceedings of 11th working conference on reverse engineering. IEEE CS Press, Delft, pp 144–151

  • Fokaefs M, Tsantalis N, Chatzigeorgiou A, Sander J (2009) Decomposing object-oriented class modules using an agglomerative clustering technique. In: Proceedings of the 25th international conference on software maintenance. Edmonton, Canada, pp 93–101

  • Fowler M (1999) Refactoring: improving the design of existing code. Addison-Wesley

  • Girard JF, Koschke R (2000) A comparison of abstract data types and objects recovery techniques. Sci Comput Program 36(2–3):149–181

  • Gui G, Scott PD (2006) Coupling and cohesion measures for evaluation of component reusability. In: Proceedings of the 5th international workshop on mining software repositories. ACM Press, Shanghai, pp 18–21

  • Gyimóthy T, Ferenc R, Siket I (2005) Empirical validation of object-oriented metrics on open source software for fault prediction. IEEE Trans Softw Eng 31(10):897–910

  • Joshi P, Joshi RK (2009) Concept analysis for class cohesion. In: Proceedings of the 13th European conference on software maintenance and reengineering. Kaiserslautern, Germany, pp 237–240

  • Khomh F, Vaucher S, Guéhéneuc YG, Sahraoui H (2009) A bayesian approach for the detection of code and design smells. In: Proceedings of the 9th international conference on quality software. IEEE CS Press, Hong Kong, pp 305–314

  • Khomh F, Vaucher S, Guéhéneuc YG, Sahraoui H (2009) A bayesian approach for the detection of code and design smells. In: Proceedings of the 2009 ninth international conference on quality software. IEEE Computer Society, Washington, DC, pp 305–314

  • Koschke R, Canfora G, Czeranski J (2006) Revisiting the Delta IC approach to component recovery. Sci Comput Program 60(2):171–188

  • Kuhn A, Ducasse S, Gîrba T (2007) Semantic clustering: identifying topics in source code. Inf Softw Technol 49(3):230–243

  • Lee Y, Liang B, Wu S, Wang F (1995) Measuring the coupling and cohesion of an object-oriented program based on information flow. In: Proceedings of the international conference on software quality. Maribor, Slovenia, pp 81–90

  • Li W, Henry S (1993) Maintenance metrics for the object oriented paradigm. In: Proceedings of the first international software metrics symposium, pp 52–60

  • Liu Y, Poshyvanyk D, Ferenc R, Gyimóthy T, Chrisochoides N (2009) Modelling class cohesion as mixtures of latent topics. In: Proceedings of the 25th IEEE international conference on software maintenance. IEEE Press, Edmonton, pp 233–242

  • Maletic JI, Marcus A (2001) Supporting program comprehension using semantic and structural information. In: Proceedings of the 23rd international conference on software engineering. IEEE CS Press, Toronto, ON, pp 103–112

  • Marcus A, Poshyvanyk D, Ferenc R (2008) Using the conceptual cohesion of classes for fault prediction in object-oriented systems. IEEE Trans Softw Eng 34(2):287–300

  • Marinescu R (2004) Detection strategies: metrics-based rules for detecting design flaws. In: Proceedings of the 20th IEEE international conference on software maintenance. IEEE Computer Society, Washington, DC, pp 350–359

  • Maruyama K, Shima K (1999) Automatic method refactoring using weighted dependence graphs. In: Proceedings of 21st international conference on software engineering. ACM Press, Los Alamitos, CA, pp 236–245

  • Mens T, Tourwe T (2004) A survey of software refactoring. IEEE Trans Softw Eng 30(2):126–139

  • Moha N, Gueheneuc YG, Duchien L, Le Meur AF (2010) Decor: a method for the specification and detection of code and design smells. IEEE Trans Softw Eng 36(1):20–36

  • Moore I (1996) Automatic inheritance hierarchy restructuring and method refactoring. In: Proceedings of 11th ACM SIGPLAN conference on object-oriented programming, systems, languages, and applications. ACM Press, San Jose, CA, pp 235–250

  • O’Keeffe M, O’Cinneide M (2006) Search-based software maintenance. In: Proceedings of 10th European conference on software maintenance and reengineering. IEEE CS Press, Bari, pp 249–260

  • Olbrich S, Cruzes DS, Basili V, Zazworka N (2009) The evolution and impact of code smells: a case study of two open source systems. In: Proceedings of the 2009 3rd international symposium on empirical software engineering and measurement, ESEM ’09, pp 390–400

  • Oliveto R, Gethers M, Bavota G, Poshyvanyk D, De Lucia A (2011) Identifying method friendships to remove the feature envy bad smell (nier track). In: 33rd IEEE/ACM international conference on software engineering—NIER Track. ACM Press, Hawaii, USA, pp 820–823

  • Oppenheim AN (1992) Questionnaire design, interviewing and attitude measurement. Pinter Publishers

  • Poshyvanyk D, Marcus A, Ferenc R, Gyimóthy T (2009) Using information retrieval based coupling measures for impact analysis. Empir Software Eng 14(1):5–32

  • Praditwong K, Harman M, Yao X (2011) Software module clustering as a multi-objective search problem. IEEE Trans Softw Eng 37(2):264–282

  • Prete K, Rachatasumrit N, Sudan N, Kim M (2010) Template-based reconstruction of complex refactorings. In: 26th IEEE international conference on software maintenance (ICSM 2010). IEEE Computer Society, Timisoara, 12–18 September 2010, pp 1–10

  • Sartipi K, Kontogiannis K (2001) Component clustering based on maximal association. In: Proceedings of the 8th working conference on reverse engineering. Stuttgart, Germany, pp 103–114

  • Seng O, Bauer M, Biehl M, Pache G (2005) Search-based improvement of subsystem decompositions. In: Proceedings of the genetic and evolutionary computation conference. ACM Press, Washington, DC, pp 1045–1051

  • Seng O, Stammel J, Burkhart D (2006) Search-based determination of refactorings for improving the class structure of object-oriented systems. In: Proceedings of the genetic and evolutionary computation conference. Seattle, Washington, USA, pp 1909–1916

  • Simon F, Steinbrückner F, Lewerentz C (2001) Metrics based refactoring. In: Proceedings of the 5th European conference on software maintenance and reengineering. IEEE CS Press, Lisbon, pp 30–38

  • Stevens W, Myers G, Constantine L (1974) Structured design. IBM Syst J 13(2):115–139

  • Stewart KJ, Darcy DP, Daniel SL (2006) Opportunities and challenges applying functional data analysis to the study of open source software evolution. Stat Sci 21(2):167–178

  • Tahvildari L, Kontogiannis K (2003) A metric-based approach to enhance design quality through meta-pattern transformation. In: Proceedings of the 7th European conference on software maintenance and reengineering. Benevento, Italy, pp 183–192

  • Tonella P (2001) Concept analysis for module restructuring. IEEE Trans Softw Eng 27(4):351–363

  • Trifu A, Marinescu R (2005) Diagnosing design problems in object oriented systems. In: Proceedings of the 12th working conference on reverse engineering. IEEE Press, Pittsburgh, PA, pp 155–164

  • Tsantalis N, Chatzigeorgiou A (2009) Identification of move method refactoring opportunities. IEEE Trans Softw Eng 35(3):347–367

  • Wen Z, Tzerpos V (2004) An effectiveness measure for software clustering algorithms. In: Proceedings of the 12th IEEE international workshop on program comprehension, IWPC ’04. IEEE Computer Society, pp 194–203

  • Wiggerts TA (1997) Using clustering algorithms in legacy systems remodularization. In: Proceedings of the 4th working conference on reverse engineering. IEEE CS Press, Amsterdam, pp 33–43

  • WRT (2011) 2011 International Workshop on Refactoring Tools. http://refactoring.info/WRT11. Accessed 22 April 2013

Acknowledgements

We would like to thank all the students who participated in our studies. We would also like to thank the anonymous reviewers for their careful reading of our manuscript and high-quality feedback. Their detailed comments have helped us to substantially revise, extend, and improve the original version of this paper. Andrian Marcus was supported in part by grants from the US National Science Foundation (CCF-0845706 and CCF-1017263).

Author information

Correspondence to Rocco Oliveto.

Additional information

Communicated by: Arie van Deursen

Cite this article

Bavota, G., De Lucia, A., Marcus, A. et al. Automating extract class refactoring: an improved method and its evaluation. Empir Software Eng 19, 1617–1664 (2014). https://doi.org/10.1007/s10664-013-9256-x

  • DOI: https://doi.org/10.1007/s10664-013-9256-x