Empirical Software Engineering, Volume 12, Issue 4, pp 359–388

Quantifying identifier quality: an analysis of trends

  • Dawn Lawrie
  • Henry Feild
  • David Binkley


Identifiers, which represent the defined concepts in a program, account for, by some measures, almost three quarters of source code. The makeup of identifiers plays a key role in how well they communicate these defined concepts. This paper presents an empirical study of identifier quality based on almost 50 million lines of code, covering thirty years, four programming languages, and both open source and proprietary code. For the purposes of the study, identifier quality is conservatively defined as the possibility of constructing the identifier out of dictionary words or known abbreviations. Four hypotheses related to identifier quality are considered using linear mixed effects regression models. For example, the first hypothesis is that modern programs include higher quality identifiers than older ones. In this case, the results show that better programming practices are producing higher quality identifiers. The results also confirm some commonly held beliefs, such as proprietary code having more acronyms than open source code.
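The quality definition above (an identifier is acceptable if it can be built from dictionary words or known abbreviations) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the word lists `DICTIONARY` and `ABBREVIATIONS`, the splitting rules in `split_hard_words`, and the function names are hypothetical and are not taken from the study, whose actual dictionary, abbreviation list, and splitting procedure are not described in the abstract.

```python
import re

# Hypothetical, tiny word lists for illustration only; the study's actual
# dictionary and abbreviation list are not given in the abstract.
DICTIONARY = {"count", "file", "name", "index", "total", "line"}
ABBREVIATIONS = {"num", "idx", "str", "ptr", "len"}


def split_hard_words(identifier):
    """Split an identifier on underscores, digits, and camelCase boundaries."""
    words = []
    for part in re.split(r"[_\d]+", identifier):
        # An all-caps run (e.g. "HTTP") or an optionally capitalized word.
        words.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", part))
    return [w.lower() for w in words]


def can_construct(word, vocabulary):
    """Return True if `word` is a concatenation of vocabulary entries."""
    # reachable[i] is True when the prefix word[:i] can be fully segmented.
    reachable = [True] + [False] * len(word)
    for end in range(1, len(word) + 1):
        for start in range(end):
            if reachable[start] and word[start:end] in vocabulary:
                reachable[end] = True
                break
    return reachable[len(word)]


def is_constructible(identifier):
    """Assumed quality test: every part is made of known words or abbreviations."""
    vocabulary = DICTIONARY | ABBREVIATIONS
    words = split_hard_words(identifier)
    return bool(words) and all(can_construct(w, vocabulary) for w in words)
```

With these toy word lists, `is_constructible("fileName")` and `is_constructible("linecount")` return `True`, while `is_constructible("xq7z")` returns `False`. The dynamic-programming segmentation is one plausible way to test "constructible from known words"; it is not claimed to be the procedure the authors used.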


Software quality characterizations, Program analysis, Source code





Copyright information

© Springer Science+Business Media, LLC 2006

Authors and Affiliations

  1. Loyola College in Maryland, Baltimore, USA
