Code Smell Prediction Employing Machine Learning Meets Emerging Java Language Constructs

  • Chapter
Data-Centric Business and Applications

Abstract

Background: Defining a code smell is not a trivial task, and recognizing one tends to be highly subjective. Nevertheless, some code smell detection tools have been proposed. Other recent approaches lean towards machine learning (ML) techniques to overcome the disadvantages of automatic detection tools. Objectives: We aim to develop a research infrastructure and reproduce the process of code smell prediction proposed by Arcelli Fontana et al. We investigate the performance of ML algorithms on samples that include major modern Java language features. Constructs such as lambdas can shorten code, making the presence of a code smell less obvious to detect and thus posing a challenge to both existing code smell detection tools and ML algorithms. Method: We extend the study with a dataset consisting of 281 Java projects. To drive sample selection, we define metrics that account for lambdas and method references, derived using a custom JavaParser-based solution. Tagged samples containing the new constructs are used as input for the detection techniques. Results: Detection rules derived from the best-performing algorithms, such as J48 and JRip, incorporate the newly introduced metrics. Conclusions: The presence of certain new Java language constructs may hide a Long Method code smell or indicate a God Class. On the other hand, their absence or low number can suggest a Data Class.
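To illustrate why lambdas and method references can make a long method harder to spot, the following minimal sketch contrasts a loop-based method with an equivalent stream-based one. The class `LambdaShortening`, both method names, and the sample data are hypothetical and are not taken from the chapter or its dataset; the sketch only shows that the same behaviour can occupy noticeably fewer statements once these constructs are used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical example class, not part of the chapter's tooling.
public class LambdaShortening {

    // Pre-Java 8 style: explicit loop, branch, and accumulator list.
    static List<String> longNamesLoop(List<String> names, int minLength) {
        List<String> result = new ArrayList<>();
        for (String name : names) {
            if (name.length() >= minLength) {
                result.add(name.toUpperCase());
            }
        }
        return result;
    }

    // Same behaviour expressed with a lambda and a method reference.
    // The body is a single statement, so size-based metrics see less code.
    static List<String> longNamesStream(List<String> names, int minLength) {
        return names.stream()
                .filter(name -> name.length() >= minLength)
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("a", "bbbb", "ccccc");
        // Both calls print the same list.
        System.out.println(longNamesLoop(names, 4));
        System.out.println(longNamesStream(names, 4));
    }
}
```

Both methods return the same list for the same input, yet the second packs the filtering and mapping logic into one chained expression. A metric that counts lines or statements would rate it as smaller, which is one plausible reason metrics that count lambdas and method references separately could help a classifier.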

Notes

  1. Datasetssources.tar.gz for full datasets.

  2. https://www.kdnuggets.com/2015/06/top-20-r-machine-learning-packages.html, access: 2019-04-09.

  3. https://cran.r-project.org/web/packages/RWeka/index.html, access: 2019-04-09.

  4. http://topepo.github.io/caret/index.html, access: 2019-04-09.

  5. http://madeyski.e-informatyka.pl/download/GrodzickaEtAl19DataSet.zip.

  6. https://javaparser.org, access: 2019-06-10.

  7. https://projectlombok.org, access: 2019-06-12.

References

  1. Arcelli Fontana F, Mäntylä MV, Zanoni M, Marino A (2016) Comparing and experimenting machine learning techniques for code smell detection. Empir Softw Eng 21(3):1143–1191

  2. Fontana FA, Mariani E, Mornioli A, Sormani R, Tonello A (2011) An experience report on using code smells detection tools. In: 2011 IEEE fourth international conference on software testing, verification and validation workshops, pp 450–457. https://doi.org/10.1109/ICSTW.2011.12

  3. Fowler M (1999) Refactoring: improving the design of existing code. Addison-Wesley, Boston, MA, USA

  4. Grodzicka H, Ziobrowski A, Łakomiak Z, Kawa M, Madeyski L (2019) Appendix to the paper “Code smell prediction employing machine learning meets emerging Java language constructs”. http://madeyski.e-informatyka.pl/download/GrodzickaEtAl19.pdf

  5. Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH (2009) The WEKA data mining software: an update. SIGKDD Explor Newsl 11(1):10–18. https://doi.org/10.1145/1656274.1656278

  6. Hornik K, Buchta C, Zeileis A (2009) Open-source machine learning: R meets Weka. Comput Stat 24(2):225–232. https://doi.org/10.1007/s00180-008-0119-7

  7. Hsu CW, Chang CC, Lin CJ (2003) A practical guide to support vector classification. Technical report, Department of Computer Science, National Taiwan University. http://www.csie.ntu.edu.tw/~cjlin/papers.html

  8. Madeyski L, Kitchenham B (2019) Reproducer: reproduce statistical analyses and meta-analyses. http://madeyski.e-informatyka.pl/reproducible-research/. R package version 0.3.0 (http://CRAN.R-project.org/package=reproducer)

  9. Palomba F (2015) Textual analysis for code smell detection. IEEE Int Conf Softw Eng 37(16):769–771

  10. Palomba F, Bavota G, Penta MD, Oliveto R, Poshyvanyk D, Lucia AD (2015) Mining version histories for detecting code smells. IEEE Trans Softw Eng 41(5):462–489

  11. Palomba F, Nucci DD, Tufano M, Bavota G, Oliveto R, Poshyvanyk D, De Lucia A (2015) Landfill: an open dataset of code smells with public evaluation. In: Proceedings of the 12th working conference on mining software repositories, MSR ’15. IEEE Press, Piscataway, NJ, USA, pp 482–485

  12. Sharma T (2017) Designite: a customizable tool for smell mining in C# repositories. SATToSE41

  13. Tempero E, Anslow C, Dietrich J, Han T, Li J, Lumpe M, Melton H, Noble J (2010) Qualitas corpus: a curated collection of Java code for empirical studies. In: 2010 Asia pacific software engineering conference (APSEC2010), pp 336–345. https://doi.org/10.1109/APSEC.2010.46

  14. Witten IH, Frank E (2005) Data mining: practical machine learning tools and techniques, 2nd edn. Morgan Kaufmann, San Francisco

Acknowledgements

This work was conducted as part of research and development project POIR.01.01.01-00-0792/16, supported by the National Centre for Research and Development (NCBiR). We would like to thank Tomasz Lewowski, Tomasz Korzeniowski, Marek Skrajnowski, and the entire team from code quest sp. z o.o. for tagging code smells and for all their comments and feedback from a real-world software engineering environment.

Author information

Corresponding author

Correspondence to Lech Madeyski.

Appendix: Reproduction classifier comparison

The following tables compare our reproduction results with those of Arcelli Fontana et al. [1] (Tables 17, 18, and 19).

Table 17 RWeka results for Data Class (grey) compared with Arcelli Fontana’s (white)
Table 18 RWeka results for Feature Envy (grey) compared with Arcelli Fontana’s (white)
Table 19 RWeka results for God Class (grey) compared with Arcelli Fontana’s (white)

Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Grodzicka, H., Ziobrowski, A., Łakomiak, Z., Kawa, M., Madeyski, L. (2020). Code Smell Prediction Employing Machine Learning Meets Emerging Java Language Constructs. In: Poniszewska-Marańda, A., Kryvinska, N., Jarząbek, S., Madeyski, L. (eds) Data-Centric Business and Applications. Lecture Notes on Data Engineering and Communications Technologies, vol 40. Springer, Cham. https://doi.org/10.1007/978-3-030-34706-2_8
