Abstract
The Mining Software Repositories (MSR) field analyzes the rich data available in source code repositories (SCRs) to uncover interesting and actionable information about the evolution of software systems. Two major obstacles in MSR are the heterogeneity of software projects and the volume of data that must be processed. Model-driven software engineering (MDSE) addresses heterogeneity through abstraction, its core strength, but only recent efforts to adopt NoSQL databases for persisting and processing very large models have made MDSE a feasible approach for MSR. This paper is a work-in-progress report on srcrepo, a model-based MSR system. Srcrepo uses the NoSQL-based EMF model persistence layer EMF-Fragments and Eclipse’s MoDisco reverse engineering framework to create EMF models of whole SCRs that comprise all code of all revisions at the abstract syntax tree (AST) level. An OCL-like language then provides an accessible way to gather information, such as software metrics, from these SCR models.
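To make the last step of the abstract more concrete, the following is a minimal sketch of how an OCL-style query (select, collect, sum) might gather a simple metric from a model that holds every revision at AST level. It is not srcrepo's API: the classes `Method`, `ClassDecl`, `Revision`, and `Coll`, and the two metric functions, are hypothetical names chosen for illustration, and the model lives in plain Python objects rather than in EMF-Fragments.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


# Hypothetical, heavily simplified stand-ins for an AST-level repository model.
@dataclass
class Method:
    name: str
    statements: int  # number of statement nodes in the method body


@dataclass
class ClassDecl:
    name: str
    methods: List[Method] = field(default_factory=list)


@dataclass
class Revision:
    rev_id: str
    classes: List[ClassDecl] = field(default_factory=list)


class Coll:
    """Small collection wrapper offering OCL-flavoured operations."""

    def __init__(self, items: Iterable):
        self.items = list(items)

    def select(self, pred: Callable) -> "Coll":
        # Keep only the elements for which the predicate holds.
        return Coll(x for x in self.items if pred(x))

    def collect(self, fn: Callable) -> "Coll":
        # Map each element to a value.
        return Coll(fn(x) for x in self.items)

    def collect_all(self, fn: Callable) -> "Coll":
        # Map each element to a collection and flatten the result.
        return Coll(y for x in self.items for y in fn(x))

    def sum(self):
        return sum(self.items)

    def size(self) -> int:
        return len(self.items)


def methods_per_revision(repo: List[Revision]) -> Dict[str, int]:
    """Count all methods declared in each revision."""
    return {
        rev.rev_id: Coll(rev.classes).collect_all(lambda c: c.methods).size()
        for rev in repo
    }


def statements_in_large_methods(repo: List[Revision], threshold: int) -> Dict[str, int]:
    """Sum the statements of methods with at least `threshold` statements."""
    return {
        rev.rev_id: Coll(rev.classes)
        .collect_all(lambda c: c.methods)
        .select(lambda m: m.statements >= threshold)
        .collect(lambda m: m.statements)
        .sum()
        for rev in repo
    }


# Toy repository with two revisions (illustrative data only).
repo = [
    Revision("r1", [ClassDecl("A", [Method("foo", 3), Method("bar", 5)])]),
    Revision("r2", [
        ClassDecl("A", [Method("foo", 3), Method("bar", 7), Method("baz", 2)]),
        ClassDecl("B", [Method("run", 4)]),
    ]),
]
```

Calling `methods_per_revision(repo)` yields one value per revision, which is the shape of result a metric-over-history query needs. A real system would additionally have to load model fragments lazily, since the full model of all revisions is assumed not to fit in memory.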
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Scheidgen, M., Fischer, J. (2014). Model-Based Mining of Source Code Repositories. In: Amyot, D., Fonseca i Casas, P., Mussbacher, G. (eds) System Analysis and Modeling: Models and Reusability. SAM 2014. Lecture Notes in Computer Science, vol 8769. Springer, Cham. https://doi.org/10.1007/978-3-319-11743-0_17
DOI: https://doi.org/10.1007/978-3-319-11743-0_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11742-3
Online ISBN: 978-3-319-11743-0
eBook Packages: Computer Science (R0)