Lexical Analysis

  • Torben Ægidius Mogensen
Part of the Undergraduate Topics in Computer Science book series (UTICS)


A lexical analyser, also called a lexer or scanner, takes a string of individual letters as input and divides it into word-like entities called tokens. It also filters out whatever separates the tokens (the so-called white-space), i.e., lay-out characters (spaces, newlines, etc.) and comments. Specifications for lexical analysis are traditionally written using regular expressions, an algebraic notation for describing sets of strings. The lexers generated from such specifications belong to a class of extremely simple programs called finite automata. This chapter describes regular expressions and finite automata, their properties, and how regular expressions can be converted to finite automata. Finally, it discusses some practical aspects of lexer generators.
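As a rough illustration of the idea, the sketch below divides an input string into tokens using regular expressions and drops lay-out characters and comments. It is not taken from the chapter: the token types (`NUM`, `ID`, `OP`), the `#` comment syntax, and the function name `tokenize` are all assumptions made for the example. Note also that Python's `re` module is a backtracking matcher that picks the first alternative that matches, not a generated finite automaton with a longest-match rule, so the order of the entries in the specification matters here.

```python
import re

# Hypothetical token specification: (token type, regular expression).
# These names and patterns are illustrative assumptions, not from the chapter.
TOKEN_SPEC = [
    ("NUM",     r"[0-9]+"),                  # integer constants
    ("ID",      r"[A-Za-z_][A-Za-z0-9_]*"),  # identifiers
    ("OP",      r"[-+*/=()]"),               # single-character operators
    ("SKIP",    r"[ \t\n]+"),                # lay-out characters
    ("COMMENT", r"#[^\n]*"),                 # comment running to end of line
]

# Combine all patterns into one regular expression with named groups.
MASTER = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (token type, lexeme) pairs for the input string."""
    pos = 0
    tokens = []
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"illegal character {text[pos]!r} at position {pos}")
        # White-space and comments separate tokens but are not tokens themselves.
        if m.lastgroup not in ("SKIP", "COMMENT"):
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


print(tokenize("x = 42 + y1 # sum\n"))
```

Each pattern matches at least one character, so the loop always advances. A lexer generator would instead compile a specification like this into a deterministic finite automaton ahead of time.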


Keywords: Regular Expression, Token Type, Regular Language, Finite Automaton, Deterministic Finite Automaton



Copyright information

© Springer-Verlag London Limited 2011

Authors and Affiliations

  1. Department of Computer Science, University of Copenhagen, Copenhagen, Denmark
