Evolution of Apache Open Source Software
Our modern infrastructure relies increasingly on computation and computers, and with this has come a rise in the prevalence and complexity of computer programs. Current software systems (composed of interacting collections of programs, functions, classes, etc.) implement a tremendous range of functionality, from simple mathematical operations to intricate control systems. Software systems are inherently extendable and tend to gain new functionality over time. Modern computers and programming languages are Turing complete and are thus capable of implementing any computable function, no matter how complex. The interdependencies between the elements of a software system form a network; we therefore believe that software systems can provide useful prototypical examples of how to build complex networked systems that require minimal maintenance, are robust to bugs, and yet are readily extendable. Thus we ask: what makes for good design in software systems?
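To make the network view concrete, the following is a minimal sketch, assuming a call graph is represented as a mapping from each function to the functions it calls (one directed edge caller → callee). The function names in `CALL_GRAPH` and the helpers `degrees` and `degree_distribution` are hypothetical illustrations, not the Apache call graph data analyzed in this work. The sketch computes in- and out-degrees and tallies the degree distribution, the kind of summary that random graph models of software are typically compared against.

```python
from collections import Counter

# Hypothetical toy call graph: each key is a function and each value lists
# the functions it calls, i.e. directed edges caller -> callee.
CALL_GRAPH = {
    "main": ["parse_args", "load_config", "run"],
    "run": ["load_config", "process", "log"],
    "process": ["log", "validate"],
    "parse_args": ["log"],
    "load_config": ["validate"],
    "validate": [],
    "log": [],
}


def degrees(graph):
    """Return (out_degree, in_degree) dicts for a directed call graph.

    Duplicate calls from the same caller to the same callee count once.
    """
    nodes = set(graph)
    for callees in graph.values():
        nodes.update(callees)
    out_deg = {n: len(set(graph.get(n, []))) for n in nodes}
    in_deg = {n: 0 for n in nodes}
    for callees in graph.values():
        for callee in set(callees):
            in_deg[callee] += 1
    return out_deg, in_deg


def degree_distribution(deg):
    """Map each degree k to the number of nodes with that degree, sorted by k."""
    return dict(sorted(Counter(deg.values()).items()))


out_deg, in_deg = degrees(CALL_GRAPH)
print(degree_distribution(out_deg))  # {0: 2, 1: 2, 2: 1, 3: 2}
print(degree_distribution(in_deg))   # {0: 1, 1: 3, 2: 2, 3: 1}
```

In this toy graph, `log` is called from three places while `main` is called from none; in a real system, heavily reused utility functions tend to show up as such high in-degree nodes.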
We are particularly interested in open source software (OSS), that is, software whose source code is freely available for download and modification. A typical OSS project is a collaborative effort by volunteers, with no central authority assigning development tasks. Instead, individuals or self-organized teams of developers fix bugs and maintain and extend the code. In OSS, modularity is essential [1; 2], and, remarkably, the software resulting from an OSS process can rival or even surpass the quality of commercial software [3; 4].
Keywords: Akaike Information Criterion; Random Graph; Degree Distribution; Open Source Software; Call Graph
We are indebted to Christian Bird for supplying the call graph data which is central to our analysis and to Premkumar Devanbu for many useful discussions. This work was funded in part by the National Science Foundation under Grant No. IIS-0613949.
- 1. E. S. Raymond. The Cathedral & the Bazaar. O'Reilly and Associates, Sebastopol, CA, 1999.
- 2. T. O'Reilly. Lessons from open source software development. Communications of the ACM, 42(4), 1999.
- 3. P. Ball. Openness makes software better sooner. Nature, June 25, 2003.
- 4. D. Challet and Y. Le Du. Microscopic model of software bug dynamics: Closed source versus open source. International Journal of Reliability, Quality and Safety Engineering, 12(6), 2005.
- 5. M. Fowler. Refactoring: Improving the Design of Existing Code. Addison-Wesley, Reading, MA, 1999.
- 8. Software Maintenance Costs and references therein, http://www.cs.jyu.fi/~koskinen/smcosts.htm.
- 11. A. MacCormack, J. Rusnak, and C. Y. Baldwin. Exploring the structure of complex software designs: An empirical study of open source and proprietary code. Management Science, 52(7), 2006.
- 12. Z. M. Saul, V. Filkov, P. T. Devanbu, and C. Bird. Recommending random walks. In Proceedings ESEC/SIGSOFT FSE, pages 15–24, 2007.
- 15. B. Bollobás. The evolution of sparse graphs. In Graph Theory and Combinatorics, pages 35–57. Academic Press, New York, 1984.
- 17. S. Valverde and R. V. Solé. Hierarchical small worlds in software architecture. In Dynamics of Continuous, Discrete and Impulsive Systems, Series B: Applications and Algorithms, volume 14, pages 1–11, 2007.
- 18. G. Baxter, M. Frean, J. Noble, M. Rickerby, H. Smith, M. Visser, H. Melton, and E. Tempero. Understanding the shape of Java software. In OOPSLA '06: Proceedings of the 21st Annual ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages, and Applications, pages 397–412, New York, NY, USA, 2006. ACM.
- 20. D. Sharman and A. Yassine. Characterizing complex product architectures. Systems Engineering Journal, 7(1), 2004.
- 22. P. Erdős and A. Rényi. On random graphs. Publicationes Mathematicae, 6:290–297, 1959.
- 23. P. Erdős and A. Rényi. On the evolution of random graphs. Publ. Math. Inst. Hungar. Acad. Sci., 5(17), 1960.
- 26. T. A. B. Snijders. Markov chain Monte Carlo estimation of exponential random graph models. Journal of Social Structure, 3(2), 2002.
- 28. D. Kaplan. The Sage Handbook of Quantitative Methodology for the Social Sciences. Sage Publications Inc., London, 2004.
- 30. The R Project for Statistical Computing, http://www.r-project.org.
- 31. M. S. Handcock, D. R. Hunter, C. T. Butts, S. M. Goodreau, and M. Morris. statnet: An R package for the statistical modeling of social networks, 2003. http://www.csde.washington.edu/statnet.
- 32. T. A. B. Snijders, P. E. Pattison, G. L. Robins, and M. S. Handcock. New specifications for exponential random graph models. Sociological Methodology, pages 99–153, 2006.
- 33. S. Konishi and G. Kitagawa. Information Criteria and Statistical Modeling. Springer, New York, 2008.
- 36. D. R. Hunter and M. S. Handcock. Inference in curved exponential family models for networks. Technical report, Penn State Department of Statistics, 2004. Available from http://www.stat.psu.edu/reports/2004/.