Evolution of Apache Open Source Software

  • Haoran Wen
  • Raissa M. D’Souza
  • Zachary M. Saul
  • Vladimir Filkov
Part of the Modeling and Simulation in Science, Engineering and Technology book series (MSSET)

Our modern infrastructure relies increasingly on computation and computers. Accompanying this is a rise in the prevalence and complexity of computer programs. Current software systems (composed of an interacting collection of programs, functions, classes, etc.) implement a tremendous range of functionality, from simple mathematical operations to intricate control systems. Software systems are inherently extendable and tend to gain new functionality over time. Modern computers and programming languages are Turing complete and, thus, capable of implementing any computable function no matter how complex. The interdependencies between the elements of a software system form a network, and, therefore, we believe software systems can provide useful prototypic examples of how to build complex networked systems that require minimal maintenance, are robust to bugs, and yet are readily extendable. Thus we ask: What makes for good design in software systems?

We are particularly interested in open source software (OSS), that is, software with source code that is freely available for download and modification. A typical OSS project is a collaborative effort by volunteers, with no central authority assigning development tasks. Instead, individuals or self-organized teams of developers fix bugs and maintain and extend the code. In OSS, modularity is essential [1; 2], and, remarkably, the software resulting from an OSS process can rival or even surpass the quality of commercial software [3; 4].


Keywords: Akaike Information Criterion; Random Graph; Degree Distribution; Open Source Software; Call Graph



We are indebted to Christian Bird for supplying the call graph data which is central to our analysis and to Premkumar Devanbu for many useful discussions. This work was funded in part by the National Science Foundation under Grant No. IIS-0613949.


  1. E. S. Raymond. The Cathedral & the Bazaar. O'Reilly and Associates, Sebastopol, CA, 1999.
  2. T. O'Reilly. Lessons from open source software development. Communications of the ACM, 42(4), 1999.
  3. P. Ball. Openness makes software better sooner. Nature, June 25, 2003.
  4. D. Challet and Y. Le Du. Microscopic model of software bug dynamics: Closed source versus open source. International Journal of Reliability, Quality and Safety Engineering, 12(6), 2005.
  5. M. Fowler. Refactoring: Improving the Design of Existing Code. Addison-Wesley, Reading, MA, 1999.
  6. A. A. Gorshenev and Yu. M. Pis'mak. Punctuated equilibrium in software evolution. Phys. Rev. E, 70(6):067103, 2004.
  7.
  8. Software Maintenance Costs and references therein, ∼koskinen/smcosts.htm.
  9. S. Valverde, R. Ferrer Cancho, and R. V. Solé. Scale-free networks from optimal design. Europhys. Lett., 60(4):512–517, 2002.
  10. C. R. Myers. Software systems as complex networks: Structure, function, and evolvability of software collaboration graphs. Phys. Rev. E, 68:046116, 2003.
  11. A. MacCormack, J. Rusnak, and C. Y. Baldwin. Exploring the structure of complex software designs: An empirical study of open source and proprietary code. Management Science, 52(7), 2006.
  12. Z. M. Saul, V. Filkov, P. T. Devanbu, and C. Bird. Recommending random walks. In Proceedings ESEC/SIGSOFT FSE, pages 15–24, 2007.
  13.
  14. S. B. Seidman. Network structure and minimum degree. Social Networks, 5:269–287, 1983.
  15. B. Bollobás. The evolution of sparse graphs. In Graph Theory and Combinatorics, pages 35–57. Academic Press, New York, 1984.
  16.
  17. S. Valverde and R. V. Solé. Hierarchical small worlds in software architecture. In Dynamics of Continuous Discrete and Impulsive Systems: Series B; Applications and Algorithms, volume 14, pages 1–11, 2007.
  18. G. Baxter, M. Frean, J. Noble, M. Rickerby, H. Smith, M. Visser, H. Melton, and E. Tempero. Understanding the shape of Java software. In OOPSLA '06: Proceedings of the 21st Annual ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages, and Applications, pages 397–412, New York, NY, USA, 2006. ACM.
  19. J. N. Warfield. Binary matrices in system modeling. IEEE Transactions on Systems, Man, and Cybernetics, 3:441–449, 1973.
  20. D. Sharman and A. Yassine. Characterizing complex product architectures. Systems Engineering Journal, 7(1), 2004.
  21.
  22. P. Erdős and A. Rényi. On random graphs. Publicationes Mathematicae, 6:290–297, 1959.
  23. P. Erdős and A. Rényi. On the evolution of random graphs. Publ. Math. Inst. Hungar. Acad. Sci., 5(17), 1960.
  24. M. Molloy and B. Reed. A critical point for random graphs with a given degree sequence. Random Struct. Alg., 6:161–179, 1995.
  25. M. E. J. Newman, S. H. Strogatz, and D. J. Watts. Random graphs with arbitrary degree distributions and their applications. Phys. Rev. E, 64:026118, 2001.
  26. T. A. B. Snijders. Markov chain Monte Carlo estimation of exponential random graph models. Journal of Social Structure, 3(2), 2002.
  27. C. J. Anderson, S. Wasserman, and B. Crouch. A p* primer: Logit models for social networks. Social Networks, 21:37–66, 1999.
  28. D. Kaplan. The Sage Handbook of Quantitative Methodology for the Social Sciences. Sage Publications Inc., London, 2004.
  29. C. Infante-Rivard, C. R. Weinberg, and M. Guiguet. Xenobiotic-metabolizing genes and small-for-gestational-age births: Interaction with maternal smoking. Epidemiology, 17(1):38–46, 2006.
  30. The R Project for Statistical Computing,
  31. M. S. Handcock, D. R. Hunter, C. T. Butts, S. M. Goodreau, and M. Morris. statnet: An R package for the statistical modeling of social networks, 2003.
  32. T. A. B. Snijders, P. E. Pattison, G. L. Robins, and M. S. Handcock. New specifications for exponential random graph models. Sociological Methodology, pages 99–153, 2006.
  33. S. Konishi and G. Kitagawa. Information Criteria and Statistical Modeling. Springer, New York, 2008.
  34. R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon. Network motifs: Simple building blocks of complex networks. Science, 298:824–827, 2002.
  35. S. Valverde and R. V. Solé. Network motifs in computational graphs: A case study in software architecture. Phys. Rev. E, 72:026107, 2005.
  36. D. R. Hunter and M. S. Handcock. Inference in curved exponential family models for networks. Technical report, Penn State Department of Statistics, 2004. Available from

Copyright information

© Birkhäuser Boston, a part of Springer Science+Business Media, LLC 2009

Authors and Affiliations

  1. University of California, Davis, CA, USA
