The Babel of Software Development: Linguistic Diversity in Open Source

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8238)

Abstract

Open source software (OSS) development communities are typically highly specialised on the one hand and experience high turnover on the other. The combination of specialisation and turnover can render parts of a system implemented in a certain programming language unmaintainable if knowledge of that language disappears together with the retiring developers.

Inspired by measures of linguistic diversity from the study of natural languages, we propose a method to quantify the risk of not having maintainers for code implemented in a certain programming language. To illustrate our approach, we studied risks associated with different languages in Emacs, and found examples of low risk due to high popularity (e.g., C, Emacs Lisp); low risk due to similarity with popular languages (e.g., C++, Java, Python); or high risk due to both low popularity and low similarity with popular languages (e.g., Lex). Our results show that methods from the social sciences can be successfully applied in the study of information systems, and open numerous avenues for future research.
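As an illustration of the kind of linguistic diversity measure the abstract alludes to, the following is a minimal sketch of Greenberg's (1956) monolingual diversity index, A = 1 - Σ p_i², applied to programming languages. It is not the authors' risk measure, which the abstract does not spell out. The function name `greenberg_diversity` and the developer data are hypothetical, and the sketch assumes each developer "speaks" exactly one language.

```python
from collections import Counter


def greenberg_diversity(developer_languages):
    """Return Greenberg's index A = 1 - sum(p_i ** 2).

    A is the probability that two developers drawn at random (with
    replacement) do not share a language, assuming every developer
    knows exactly one language. 0 means everyone shares one language.
    """
    counts = Counter(developer_languages)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one developer is required")
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


# Hypothetical data: the single language attributed to each developer.
developers = ["C", "C", "Emacs Lisp", "Emacs Lisp", "Emacs Lisp", "Lex"]
diversity = greenberg_diversity(developers)
print(f"Diversity index: {diversity:.3f}")
```

In this toy community only one of six developers knows Lex, so the departure of a single person would leave Lex code without a maintainer. That is the intuition behind treating low popularity as a risk factor. Accounting for similarity between languages, as the abstract describes for C++, Java, and Python, would require a weighted variant that this sketch does not attempt.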

Copyright information

© 2013 Springer International Publishing Switzerland

About this paper

Cite this paper

Vasilescu, B., Serebrenik, A., van den Brand, M.G.J. (2013). The Babel of Software Development: Linguistic Diversity in Open Source. In: Jatowt, A., et al. Social Informatics. SocInfo 2013. Lecture Notes in Computer Science, vol 8238. Springer, Cham. https://doi.org/10.1007/978-3-319-03260-3_34

  • DOI: https://doi.org/10.1007/978-3-319-03260-3_34

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-03259-7

  • Online ISBN: 978-3-319-03260-3

  • eBook Packages: Computer Science, Computer Science (R0)
