Visual querying and analysis of large software repositories

Empirical Software Engineering

Abstract

We present a software framework for mining software repositories. Our extensible framework enables the integration of data extraction from repositories with data analysis and interactive visualization. We demonstrate the applicability of the framework by presenting several case studies performed on industry-size software repositories. In each study we use the framework to give answers to one or several software engineering questions addressing a specific project. Next, we validate the answers by comparing them with existing project documentation, by interviewing domain experts and by detailed analyses of the source code. The results show that our framework can be used both for supporting case studies on mining software repository techniques and for building end-user tools for software maintenance support.

Notes

  1. The entire project contains more than 850 versions, but we were only interested in analyzing the subperiod of its evolution that covered these versions.

  2. The CVSgrab tool produces full-color visualizations. These have been converted to grayscale for printing purposes.

  3. The mediator makes it possible to couple CVSgrab visualizations with both CVS and Subversion repository data.

References

  • Ball T, Kim JM, Porter AA, Siy HP (1997) If your version control system could talk.... In: Proc. ICSE ’97 workshop on process modeling and empirical studies of software engineering

  • Bennett K, Burd E, Kemerer C, Lehman MM, Lee M, Madachy R, Mair C, Sjoberg D, Slaughter S (1999) Empirical studies of evolving systems. Empir Softw Eng 4(4):370–380

  • Bieman JM, Andrews AA, Yang HJ (2003) Understanding change-proneness in OO software through visualization. In: IWPC’03: Proc. intl. workshop on program comprehension. IEEE CS Press, pp 44–53

  • Burch M, Diehl S, Weißgerber P (2005) Visual data mining in software archives. In: SoftVis ’05: Proc. ACM symposium on software visualization. ACM Press, pp 37–46

  • Collberg C, Kobourov S, Nagra J, Pitts J, Wampler K (2003) A system for graph-based visualization of the evolution of software. In: SoftVis’03: Proc. ACM symposium on software visualization. ACM Press, pp 77–86

  • Cubranic D, Murphy GC, Singer J, Booth KS (2005) Hipikat: a project memory for software development. IEEE Trans Softw Eng 31(6):446–465

  • Eick SG, Steffen JL, Sumner EE (1992) SeeSoft—a tool for visualizing line oriented software statistics. IEEE Trans Softw Eng 18(11):957–968

  • Everitt E, Landau S, Leese M (2001) Cluster analysis. Arnold Publishers, Inc

  • Fischer M, Pinzger M, Gall H (2003) Populating a release history database from version control and bug tracking systems. In: ICSM’03: Proc. intl. conference on software maintenance. IEEE CS Press, pp 23–32

  • Froehlich J, Dourish P (2004) Unifying artifacts and activities in a visual tool for distributed software development teams. In: ICSE’04: Proc. intl. conference on software engineering. IEEE CS Press, pp 387–396

  • Gall H, Jazayeri M, Krajewski J (2003) CVS release history data for detecting logical couplings. In: IWPSE’03: Proc. intl. workshop on principles of software evolution. IEEE CS Press, pp 13–23

  • German D, Mockus A (2003) Automating the measurement of open source projects. In: Proc. ICSE’03 workshop on open source software engineering, pp 63–38

  • German D, Hindle A, Jordan N (2004) Visualizing the evolution of software using SoftChange. In: ICSEKE’04: Proc. 16th intl. conference on software engineering and knowledge engineering, pp 336–341

  • Greenwood RM, Warboys B, Harrison R, Henderson P (1998) An empirical study of the evolution of a software system. In: ASE’98: Proc. 13th conference on automated software engineering. IEEE CS Press, pp 293–296

  • Lanza M (2001) The evolution matrix: recovering software evolution using software visualization techniques. In: IWPSE’01: Proc. intl. workshop on principles of software evolution. ACM Press, pp 37–42

  • Lopez-Fernandez L, Robles G, Gonzalez-Barahona JM (2004) Applying social network analysis to the information in CVS repositories. In: MSR’04: Proc. intl. workshop on mining software repositories. IEEE CS Press

  • Microsoft Inc (2007) Age of empires game. www.microsoft.com/games/empires

  • Voinea L, Telea A (2006a) CVSgrab: mining the history of large software projects. In: EuroVis’06: Proc. eurographics/IEEE-VGTC symposium on visualization. IEEE CS Press, pp 187–194

  • Voinea L, Telea A (2006b) How do changes in buggy Mozilla files propagate? In: SoftVis ’06: Proc. ACM symposium on software visualization. ACM Press, pp 147–148

  • Voinea L, Telea A (2006c) Mining software repositories with CVSgrab. In: MSR ’06: Proc. intl. workshop on mining software repositories. ACM Press, pp 167–168

  • Voinea L, Telea A (2006d) Multiscale and multivariate visualizations of software evolution. In: SoftVis ’06: Proceedings of the 2006 ACM symposium on software visualization. ACM Press, pp 115–124

  • Voinea L, Telea A (2007) Visual data mining and analysis of software repositories. Comput Graph 31(3):410–428

  • Voinea L, Telea A, van Wijk JJ (2005) Visualization of code evolution. In: SoftVis’05: Proc. ACM symposium on software visualization. ACM Press, pp 47–56

  • Wu J, Spitzer C, Hassan A, Holt R (2004a) Evolution spectrographs: visualizing punctuated change in software evolution. In: IWPSE’04: Proc. intl. workshop on principles of software evolution. IEEE CS Press, pp 57–66

  • Wu X, Murray A, Storey MA, Lintern R (2004b) A reverse engineering approach to support software maintenance: version control knowledge extraction. In: WCRE ’04: Proceedings of the 11th working conference on reverse engineering (WCRE’04). IEEE Computer Society, Washington, DC, USA, pp 90–99

  • Ying ATT, Murphy GC, Ng R, Chu-Carroll MC (2004) Predicting source code changes by mining revision history. IEEE Trans Softw Eng 30(9):574–586

  • Zimmermann T, Weißgerber P (2004) Preprocessing CVS data for fine-grained analysis. In: MSR’04: Proc. intl. workshop on mining software repositories

  • Zimmermann T, Weißgerber P, Diehl S, Zeller A (2004) Mining version histories to guide software changes. In: ICSE ’04: Proc. intl. conference on software engineering. IEEE CS Press, pp 563–572

Author information

Correspondence to Lucian Voinea.

Additional information

Editors: Prof. Hassan, Prof. Diehl and Prof. Gall

About this article

Cite this article

Voinea, L., Telea, A. Visual querying and analysis of large software repositories. Empir Software Eng 14, 316–340 (2009). https://doi.org/10.1007/s10664-008-9068-6

  • DOI: https://doi.org/10.1007/s10664-008-9068-6
