A lexical analyser, also called a lexer or scanner, takes a string of individual characters as input and divides it into word-like entities called tokens. It also filters out whatever separates the tokens (the so-called white-space), i.e., layout characters (spaces, newlines, etc.) and comments. Specifications for lexical analysis are traditionally written using regular expressions, an algebraic notation for describing sets of strings. The lexers generated from such specifications belong to a class of extremely simple programs called finite automata. This chapter describes regular expressions and finite automata, their properties, and how regular expressions can be converted to finite automata. Finally, we discuss some practical aspects of lexer generators.
Keywords: Regular Expression, Token Type, Regular Language, Finite Automaton, Deterministic Finite Automaton
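As an illustration of the lexer's job described in the abstract, the following is a minimal sketch of a hand-written lexer driven by regular expressions. The token names (`COMMENT`, `SPACE`, `NUM`, `ID`, `OP`), the `#` comment syntax, and the function `tokenize` are assumptions chosen for this example and do not come from the chapter. Note also that Python's `re` module is a backtracking matcher that picks the first alternative that matches, whereas a generated lexer would typically run a deterministic finite automaton and prefer the longest match; the sketch only shows the input/output behaviour.

```python
import re

# Hypothetical token classes for illustration only. Order matters here:
# Python tries the alternatives left to right and takes the first that matches.
TOKEN_SPEC = [
    ("COMMENT", r"#[^\n]*"),                # comment runs to the end of the line
    ("SPACE",   r"[ \t\r\n]+"),             # layout characters
    ("NUM",     r"[0-9]+"),                 # integer literals
    ("ID",      r"[A-Za-z_][A-Za-z0-9_]*"), # identifiers
    ("OP",      r"[-+*/=()]"),              # single-character operators
]

# Combine all token classes into one regular expression with named groups.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Split text into (token_type, lexeme) pairs, dropping white-space and comments."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        # Layout and comments separate tokens but are not tokens themselves.
        if match.lastgroup not in ("SPACE", "COMMENT"):
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

For example, `tokenize("x = 42 + y # sum\n")` yields five tokens: the identifier `x`, the operator `=`, the number `42`, the operator `+`, and the identifier `y`. The spaces, the comment, and the trailing newline are filtered out.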