Enabling Empirical Research: A Corpus of Large-Scale Python Systems

Omari, Safwan; Martinez, Gina

doi:10.1007/978-3-030-32523-7_49

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1070)

Included in the following conference series:

Proceedings of the Future Technologies Conference

Abstract

The Python programming language has been gaining traction in industry over the past few years in virtually all application domains. Python is known for its high-calibre, passionate community of developers. Empirical research on Python systems has the potential to promote a healthy environment in which claims and beliefs held by the community are supported by data. To facilitate such research, a corpus of 132 open source Python projects has been identified, and basic information as well as quality and complexity metrics have been collected and organized into CSV files. Collectively, the projects contain 36,635 Python modules, 59,532 classes, 253,954 methods and 84,892 functions. Projects in the corpus span various application domains, including Web/APIs, Scientific Computing, Security and more.
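
To make the class, method and function counts above concrete, here is a minimal sketch of how such totals could be gathered for one module with Python's standard `ast` module. It is not the authors' tooling, which the abstract does not describe. The helper name `count_definitions` and the counting rules (a function defined directly in a class body is a method, everything else is a function) are assumptions made for illustration.

```python
import ast


def count_definitions(source: str) -> dict:
    """Count classes, methods and functions in one module's source text.

    Assumed rule (hypothetical, not taken from the paper): a def nested
    directly in a class body is a method; any other def is a function.
    """
    counts = {"classes": 0, "methods": 0, "functions": 0}

    def visit(node: ast.AST, in_class: bool) -> None:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.ClassDef):
                counts["classes"] += 1
                visit(child, True)
            elif isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                counts["methods" if in_class else "functions"] += 1
                # Definitions nested inside a function body are not methods.
                visit(child, False)
            else:
                visit(child, in_class)

    visit(ast.parse(source), False)
    return counts


sample = '''
class A:
    def m(self):
        def helper():
            pass

    async def n(self):
        pass


def f():
    pass
'''

result = count_definitions(sample)
print(result)
```

Summing these per-module dictionaries over every `.py` file in a project would give project-level totals that could then be written out as one CSV row per project.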

Author information

Authors and Affiliations

Department of Computer and Mathematical Sciences, Lewis University, Romeoville, IL, USA
Safwan Omari & Gina Martinez

Corresponding author

Correspondence to Safwan Omari.

Editor information

Editors and Affiliations

Faculty of Science and Engineering, Saga University, Saga, Japan
Kohei Arai
The Science and Information (SAI) Organization, Bradford, West Yorkshire, UK
Rahul Bhatia
The Science and Information (SAI) Organization, Bradford, West Yorkshire, UK
Supriya Kapoor

About this paper

Cite this paper

Omari, S., Martinez, G. (2020). Enabling Empirical Research: A Corpus of Large-Scale Python Systems. In: Arai, K., Bhatia, R., Kapoor, S. (eds) Proceedings of the Future Technologies Conference (FTC) 2019. FTC 2019. Advances in Intelligent Systems and Computing, vol 1070. Springer, Cham. https://doi.org/10.1007/978-3-030-32523-7_49

DOI: https://doi.org/10.1007/978-3-030-32523-7_49
Published: 10 October 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-32522-0
Online ISBN: 978-3-030-32523-7
eBook Packages: Intelligent Technologies and Robotics (R0)