Empirical Software Engineering, Volume 14, Issue 3, pp 316–340

Visual querying and analysis of large software repositories

  • Lucian Voinea
  • Alexandru Telea


We present an extensible software framework for mining software repositories. The framework integrates data extraction from repositories with data analysis and interactive visualization. We demonstrate its applicability through several case studies performed on industry-size software repositories. In each study we use the framework to answer one or more software engineering questions about a specific project. We then validate the answers by comparing them with existing project documentation, by interviewing domain experts, and by analyzing the source code in detail. The results show that our framework can be used both to support case studies on techniques for mining software repositories and to build end-user tools for software maintenance support.


Keywords: Software visualization, Evolution visualization, Repository mining



Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  1. SolidSource BV, Eindhoven, The Netherlands
  2. Institute of Mathematics and Computer Science, University of Groningen, Groningen, The Netherlands
