Model-Based Mining of Source Code Repositories

  • Conference paper
System Analysis and Modeling: Models and Reusability (SAM 2014)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 8769)

Abstract

The Mining Software Repositories (MSR) field analyzes the rich data available in source code repositories (SCRs) to uncover interesting and actionable information about the evolution of software systems. Two major obstacles in MSR are the heterogeneity of software projects and the amount of data that must be processed. Model-driven software engineering (MDSE) can handle heterogeneity through abstraction, its core strength, but only recent efforts to adopt NoSQL databases for persisting and processing very large models have made MDSE a feasible approach for MSR. This paper is a work-in-progress report on srcrepo, a model-based MSR system. Srcrepo uses the NoSQL-based EMF model persistence layer EMF-Fragments and Eclipse’s MoDisco reverse engineering framework to create EMF models of whole SCRs that comprise all code of all revisions at the abstract syntax tree (AST) level. An OCL-like language then provides an accessible way to gather information, such as software metrics, from these SCR models.
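
To make the idea of querying a repository model for metrics more concrete, the following is a minimal sketch in Python. It is not srcrepo's API, the MoDisco metamodel, or EMF-Fragments: the classes `RepositoryModel`, `Revision`, `ClassDecl`, and `MethodDecl` and the function `methods_per_class` are hypothetical names chosen for illustration, and the model is a small in-memory object graph rather than a persisted EMF model. It only shows the general shape of the approach, assuming each revision holds an AST-like structure that can be traversed with collection-style queries.

```python
from dataclasses import dataclass, field


# Hypothetical, heavily simplified stand-ins for AST-level model elements.
@dataclass
class MethodDecl:
    name: str


@dataclass
class ClassDecl:
    name: str
    methods: list = field(default_factory=list)


@dataclass
class Revision:
    rev_id: str
    classes: list = field(default_factory=list)


@dataclass
class RepositoryModel:
    revisions: list = field(default_factory=list)


def methods_per_class(repo):
    """For each revision, map class name -> number of declared methods."""
    return {
        rev.rev_id: {c.name: len(c.methods) for c in rev.classes}
        for rev in repo.revisions
    }


# A toy repository with two revisions (illustrative data only).
repo = RepositoryModel([
    Revision("r1", [ClassDecl("Parser", [MethodDecl("parse")])]),
    Revision("r2", [
        ClassDecl("Parser", [MethodDecl("parse"), MethodDecl("reset")]),
        ClassDecl("Lexer", [MethodDecl("next")]),
    ]),
])

metrics = methods_per_class(repo)
```

Because every revision is part of the same model, a single traversal yields the metric across the whole history; a real system would have to run such queries over a model far too large to hold in memory, which is where a fragmenting persistence layer comes in.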

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Scheidgen, M., Fischer, J. (2014). Model-Based Mining of Source Code Repositories. In: Amyot, D., Fonseca i Casas, P., Mussbacher, G. (eds) System Analysis and Modeling: Models and Reusability. SAM 2014. Lecture Notes in Computer Science, vol 8769. Springer, Cham. https://doi.org/10.1007/978-3-319-11743-0_17

  • DOI: https://doi.org/10.1007/978-3-319-11743-0_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11742-3

  • Online ISBN: 978-3-319-11743-0

  • eBook Packages: Computer Science, Computer Science (R0)
