Abstract
Lexical analysis has many applications beyond the first phase of compilation in programming language processing. We argue that extended regular expressions combined with the ability to extract submatch information significantly increase the expressiveness of lexer specifications. We show that such an expressive lexical analysis can be done efficiently using some novel automata-based methods. The approach has been implemented in an ML lexer tool which is compatible with ocamllex. Experimental results confirm that our approach is competitive with respect to existing ML lexer tools.
Chapter PDF
Similar content being viewed by others
Keywords
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
References
Aho, A.V., Lam, M.S., Sethi, R., Ullman, J.D.: Compilers: Principles, Techniques, and Tools, 2nd edn. Addison-Wesley Longman Publishing Co., Inc., Boston (2006)
Antimirov, V.M.: Partial derivatives of regular expressions and finite automaton constructions. Theoretical Computer Science 155(2), 291–319 (1996)
Brzozowski, J.A.: Derivatives of regular expressions. J. ACM 11(4), 481–494 (1964)
Caron, P., Champarnaud, J.-M., Mignot, L.: Partial derivatives of an extended regular expression. In: Dediu, A.-H., Inenaga, S., MartÃn-Vide, C. (eds.) LATA 2011. LNCS, vol. 6638, pp. 179–191. Springer, Heidelberg (2011)
Russ Cox. re2 – an efficient, principled regular expression library, http://code.google.com/p/re2/
Cox, R.: Regular expression matching can be simple and fast (but is slow in java, perl, php, python, ruby,...) (2007), http://swtch.com/~rsc/regexp/regexp1.html
Laurikari, V.: NFAs with tagged transitions, their conversion to deterministic automata and application to regular expressions. In: SPIRE, pp. 181–187 (2000)
ocamllex, http://caml.inria.fr/pub/docs/manual-ocaml-4.00/manual026.html
Owens, S., Reppy, J., Turon, A.: Regular-expression derivatives reexamined. Journal of Functional Programming 19(2), 173–190 (2009)
PCRE - Perl Compatible Regular Expressions, http://www.pcre.org/
re2ml: Code-based replacement for ocamllex without submatching support, https://github.com/pippijn/re2ml
Standard ML of New Jersey, http://www.smlnj.org/
Sulzmann, M., Lu, K.Z.M.: Regular expression sub-matching using partial derivatives. In: Proc. of PPDP 2012, pp. 79–90. ACM (2012)
Thompson, K.: Programming techniques: Regular expression search algorithm. Commun. ACM 11(6), 419–422 (1968)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sulzmann, M., van Steenhoven, P. (2014). A Flexible and Efficient ML Lexer Tool Based on Extended Regular Expression Submatching. In: Cohen, A. (eds) Compiler Construction. CC 2014. Lecture Notes in Computer Science, vol 8409. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54807-9_10
Download citation
DOI: https://doi.org/10.1007/978-3-642-54807-9_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-54806-2
Online ISBN: 978-3-642-54807-9
eBook Packages: Computer ScienceComputer Science (R0)