Code Smell Prediction Employing Machine Learning Meets Emerging Java Language Constructs

Grodzicka, Hanna; Ziobrowski, Arkadiusz; Łakomiak, Zofia; Kawa, Michał; Madeyski, Lech

doi:10.1007/978-3-030-34706-2_8


Part of the book series: Lecture Notes on Data Engineering and Communications Technologies (LNDECT, volume 40)


Abstract

Background: Defining code smells is not a trivial task, and their recognition tends to be highly subjective. Nevertheless, several code smell detection tools have been proposed. More recent approaches turn to machine learning (ML) techniques to overcome the disadvantages of automatic detection tools. Objectives: We aim to develop a research infrastructure and reproduce the code smell prediction process proposed by Arcelli Fontana et al. We investigate the performance of ML algorithms on samples that include major modern Java language features. Features such as lambdas can shorten code, which makes the presence of a code smell less obvious and thus poses a challenge to both existing code smell detection tools and ML algorithms. Method: We extend the study with a dataset of 281 Java projects. To drive sample selection, we define metrics covering lambdas and method references, computed with a custom JavaParser-based solution. Tagged samples containing the new constructs are used as input to the detection techniques. Results: Detection rules derived from the best-performing algorithms, such as J48 and JRip, incorporate the newly introduced metrics. Conclusions: The presence of certain new Java language constructs may hide the Long Method code smell or indicate a God Class. Conversely, their absence or low number can suggest a Data Class.
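To make the idea of "metrics covering lambdas and method references" concrete, here is a minimal sketch that counts both constructs in a Java source string. It is not the authors' tool: the abstract states that their solution is JavaParser-based, whereas this sketch swaps in the JDK's own javac Tree API (com.sun.source) so that it runs with no third-party dependency. The class name LambdaMetrics, the Counts record, and the per-file aggregation are hypothetical illustrations; the paper's actual metrics may be defined per method or per class.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.LambdaExpressionTree;
import com.sun.source.tree.MemberReferenceTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;

import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

public class LambdaMetrics {

    // Hypothetical per-file metrics: number of lambdas and of method references.
    record Counts(int lambdas, int methodRefs) {}

    // Wraps a source string as an in-memory compilation unit.
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String name, String code) {
            super(URI.create("string:///" + name + ".java"), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    // Parses the source (syntax only, no type checking) and counts both constructs.
    static Counts count(String source) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler(); // requires a JDK, not a bare JRE
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, null, null, null, List.of(new StringSource("Sample", source)));
        int[] lambdas = {0};
        int[] refs = {0};
        try {
            for (CompilationUnitTree unit : task.parse()) {
                new TreeScanner<Void, Void>() {
                    @Override
                    public Void visitLambdaExpression(LambdaExpressionTree node, Void p) {
                        lambdas[0]++;
                        return super.visitLambdaExpression(node, p); // keep scanning nested lambdas
                    }

                    @Override
                    public Void visitMemberReference(MemberReferenceTree node, Void p) {
                        refs[0]++;
                        return super.visitMemberReference(node, p);
                    }
                }.scan(unit, null);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new Counts(lambdas[0], refs[0]);
    }

    public static void main(String[] args) {
        String sample = """
                import java.util.List;
                class Sample {
                    List<String> names(List<String> xs) {
                        return xs.stream()
                                 .filter(s -> !s.isEmpty())
                                 .map(String::trim)
                                 .toList();
                    }
                }
                """;
        Counts c = count(sample);
        System.out.println("lambdas=" + c.lambdas() + ", methodRefs=" + c.methodRefs());
    }
}
```

Run on a JDK 17 or later, main prints "lambdas=1, methodRefs=1". Because the sketch only parses and never type-checks, it counts constructs even in code that would not compile. That makes it suitable for mining large project sets, but it also means it cannot tell which functional interface a lambda targets.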


Notes

1. Datasets: sources.tar.gz for full datasets.
2. https://www.kdnuggets.com/2015/06/top-20-r-machine-learning-packages.html, access: 2019-04-09.
3. https://cran.r-project.org/web/packages/RWeka/index.html, access: 2019-04-09.
4. http://topepo.github.io/caret/index.html, access: 2019-04-09.
5. http://madeyski.e-informatyka.pl/download/GrodzickaEtAl19DataSet.zip.
6. https://javaparser.org, access: 2019-06-10.
7. https://projectlombok.org, access: 2019-06-12.

References

Arcelli Fontana F, Mäntylä MV, Zanoni M, Marino A (2016) Comparing and experimenting machine learning techniques for code smell detection. Empir Softw Eng 21(3):1143–1191
Fontana FA, Mariani E, Mornioli A, Sormani R, Tonello A (2011) An experience report on using code smells detection tools. In: 2011 IEEE fourth international conference on software testing, verification and validation workshops, pp 450–457. https://doi.org/10.1109/ICSTW.2011.12
Fowler M (1999) Refactoring: improving the design of existing code. Addison-Wesley, Boston, MA, USA
Grodzicka H, Ziobrowski A, Łakomiak Z, Kawa M, Madeyski L (2019) Appendix to the paper “Code smell prediction employing machine learning meets emerging Java language constructs”. http://madeyski.e-informatyka.pl/download/GrodzickaEtAl19.pdf
Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH (2009) The weka data mining software: an update. SIGKDD Explor Newsl 11(1):10–18. https://doi.org/10.1145/1656274.1656278
Hornik K, Buchta C, Zeileis A (2009) Open-source machine learning: R meets Weka. Comput Stat 24(2):225–232. https://doi.org/10.1007/s00180-008-0119-7
Hsu CW, Chang CC, Lin CJ (2003) A practical guide to support vector classification. Technical report, Department of computer science, National Taiwan University. http://www.csie.ntu.edu.tw/~cjlin/papers.html
Madeyski L, Kitchenham B (2019) Reproducer: reproduce statistical analyses and meta-analyses. http://madeyski.e-informatyka.pl/reproducible-research/. R package version 0.3.0 (http://CRAN.R-project.org/package=reproducer)
Palomba F (2015) Textual analysis for code smell detection. IEEE Int Conf Softw Eng 37(16):769–771
Palomba F, Bavota G, Penta MD, Oliveto R, Poshyvanyk D, Lucia AD (2015) Mining version histories for detecting code smells. IEEE Trans Softw Eng 41(5):462–489
Palomba F, Nucci DD, Tufano M, Bavota G, Oliveto R, Poshyvanyk D, De Lucia A (2015) Landfill: an open dataset of code smells with public evaluation. In: Proceedings of the 12th working conference on mining software repositories, MSR ’15. IEEE Press, Piscataway, NJ, USA, pp 482–485
Sharma T (2017) Designite: a customizable tool for smell mining in c# repositories. SATToSE41
Tempero E, Anslow C, Dietrich J, Han T, Li J, Lumpe M, Melton H, Noble J (2010) Qualitas corpus: a curated collection of java code for empirical studies. In: 2010 Asia pacific software engineering conference (APSEC2010), pp 336–345. https://doi.org/10.1109/APSEC.2010.46
Witten IH, Frank E (2005) Data mining: practical machine learning tools and techniques, 2nd edn. Morgan Kaufmann, San Francisco


Acknowledgements

This work has been conducted as a part of research and development project POIR.01.01.01-00-0792/16 supported by the National Centre for Research and Development (NCBiR). We would like to thank Tomasz Lewowski, Tomasz Korzeniowski, Marek Skrajnowski and the entire team from code quest sp. z o.o. for tagging code smells and for all of the comments and feedback from the real-world software engineering environment.

Author information

Authors and Affiliations

Faculty of Computer Science and Management, Wroclaw University of Science and Technology, Wroclaw, Poland
Hanna Grodzicka, Arkadiusz Ziobrowski, Zofia Łakomiak, Michał Kawa & Lech Madeyski


Corresponding author

Correspondence to Lech Madeyski.

Editor information

Editors and Affiliations

Institute of Information Technology, Lodz University of Technology, Łódź, Poland
Aneta Poniszewska-Marańda
Department of e-Business, Faculty of Business, Economics and Statistics, University of Vienna, Vienna, Wien, Austria
Natalia Kryvinska
Faculty of Computer Science, Bialystok University of Technology, Białystok, Poland
Stanisław Jarząbek
Faculty of Computer Science and Management, Wrocław University of Science and Technology, Wrocław, Poland
Lech Madeyski

Appendix: Reproduction classifier comparison

The following tables compare our reproduction results with those of Arcelli Fontana et al. [1] (Tables 17, 18, 19).

Table 17 RWeka results for Data Class (grey) compared with Arcelli Fontana’s (white)


Table 18 RWeka results for Feature Envy (grey) compared with Arcelli Fontana’s (white)


Table 19 RWeka results for God Class (grey) compared with Arcelli Fontana’s (white)



About this chapter

Cite this chapter

Grodzicka, H., Ziobrowski, A., Łakomiak, Z., Kawa, M., Madeyski, L. (2020). Code Smell Prediction Employing Machine Learning Meets Emerging Java Language Constructs. In: Poniszewska-Marańda, A., Kryvinska, N., Jarząbek, S., Madeyski, L. (eds) Data-Centric Business and Applications. Lecture Notes on Data Engineering and Communications Technologies, vol 40. Springer, Cham. https://doi.org/10.1007/978-3-030-34706-2_8


DOI: https://doi.org/10.1007/978-3-030-34706-2_8
Published: 15 December 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-34705-5
Online ISBN: 978-3-030-34706-2
eBook Packages: Intelligent Technologies and Robotics (R0)
