
Enabling Empirical Research: A Corpus of Large-Scale Python Systems

  • Conference paper
  • First Online:
Proceedings of the Future Technologies Conference (FTC) 2019 (FTC 2019)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1070)


Abstract

The Python programming language has been gaining traction in industry over the past few years in virtually all application domains. Python is known for its high-calibre and passionate community of developers. Empirical research on Python systems has the potential to promote a healthy environment in which claims and beliefs held by the community are supported by data. To facilitate such research, a corpus of 132 open source Python projects has been identified, and basic information as well as quality and complexity metrics have been collected and organized into CSV files. Collectively, the corpus consists of 36,635 Python modules, 59,532 classes, 253,954 methods and 84,892 functions. Projects in the corpus span various application domains, including Web/APIs, Scientific Computing, Security and more.
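The abstract does not state how modules, classes, methods and functions were counted. The following is a minimal sketch of one plausible approach using Python's standard-library `ast` module; it is not the authors' tooling. It assumes that each parseable `.py` file is one module, that a "method" is any `def`/`async def` whose nearest enclosing scope is a class body, and that every other `def`/`async def` is a "function". The helper names (`count_definitions`, `summarize_project`, `to_csv`) and the CSV column names are hypothetical.

```python
import ast
import csv
import io
import sys
from collections import Counter
from pathlib import Path

# Hypothetical CSV layout; the corpus' actual column names may differ.
FIELDS = ["project", "modules", "classes", "methods", "functions"]


def count_definitions(source: str) -> Counter:
    """Count classes, methods and functions in one module's source text.

    Assumption: a "method" is a def/async def whose nearest enclosing
    scope is a class body; every other def/async def is a "function".
    """
    tree = ast.parse(source)
    counts = Counter(modules=1)

    def visit(node: ast.AST, in_class_body: bool) -> None:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.ClassDef):
                counts["classes"] += 1
                visit(child, in_class_body=True)
            elif isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                counts["methods" if in_class_body else "functions"] += 1
                # Definitions nested inside a function body count as functions.
                visit(child, in_class_body=False)
            else:
                # Compound statements (if/try/with, ...) keep the current scope.
                visit(child, in_class_body)

    visit(tree, in_class_body=False)
    return counts


def summarize_project(root: Path) -> Counter:
    """Sum the counts over every .py file under root (one module per file)."""
    total = Counter()
    for path in sorted(root.rglob("*.py")):
        try:
            total += count_definitions(path.read_text(encoding="utf-8"))
        except (SyntaxError, ValueError):
            # Skip files that do not parse as Python 3 (e.g. Python 2-only
            # modules); UnicodeDecodeError is a subclass of ValueError.
            continue
    return total


def to_csv(rows: dict[str, Counter]) -> str:
    """Render one CSV row of counts per project."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    for project, counts in rows.items():
        writer.writerow({"project": project, **{f: counts[f] for f in FIELDS[1:]}})
    return buffer.getvalue()


if __name__ == "__main__":
    # Usage: python count_defs.py path/to/project [more/projects ...]
    projects = {Path(p).name: summarize_project(Path(p)) for p in sys.argv[1:]}
    print(to_csv(projects), end="")
```

Walking the syntax tree rather than matching `def`/`class` with regular expressions keeps nested definitions in the right category. Note that skipping files that fail to parse under Python 3 would undercount projects that still ship Python 2 code, so a real corpus tool would need a policy for such files.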



Author information

Corresponding author: Safwan Omari.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Omari, S., Martinez, G. (2020). Enabling Empirical Research: A Corpus of Large-Scale Python Systems. In: Arai, K., Bhatia, R., Kapoor, S. (eds) Proceedings of the Future Technologies Conference (FTC) 2019. FTC 2019. Advances in Intelligent Systems and Computing, vol 1070. Springer, Cham. https://doi.org/10.1007/978-3-030-32523-7_49

