Skip to main content

Certified Derivative-Based Parsing of Regular Expressions

  • Conference paper
  • First Online:

Part of the book series: Lecture Notes in Computer Science ((LNPSE,volume 9889))

Abstract

We describe the formalization of a certified algorithm for regular expression parsing based on Brzozowski derivatives, in the dependently typed language Idris. The formalized algorithm produces a proof that an input string matches a given regular expression or a proof that no matching exists. A tool for regular expression based search in the style of the well known GNU grep has been developed with the certified algorithm, and practical experiments were conducted with this tool.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Notes

  1. 1.

    Associativity, Commutativity and Idempotence with Unit elements axioms for REs [7].

  2. 2.

    In Haskell, types are classified using kinds [28] instead of universe levels. The kind of types is denoted by \(\star \) and type operators have functional kinds: \(\kappa \rightarrow \kappa '\), where \(\kappa \) and \(\kappa '\) are kinds. As an example, in Haskell, type Bool has kind \(\star \) and the list type constructor has kind \(\star \rightarrow \star \).

  3. 3.

    Also known as Curry-Howard “isomorphism” [30].

  4. 4.

    Readers who know type theory probably have noticed that this equality encoding corresponds to the so-called heterogeneous equality [23], which is used in the Idris Prelude. Detailed discussions about equality in type theory can be found in [32].

References

  1. Almeida, J.B., Moreira, N., Pereira, D., de Sousa, S.M.: Partial derivative automata formalized in Coq. In: Domaratzki, M., Salomaa, K. (eds.) CIAA 2010. LNCS, vol. 6482, pp. 59–68. Springer, Heidelberg (2011)

    Chapter  Google Scholar 

  2. Antimirov, V.: Partial derivatives of regular expressions and finite automaton constructions. Theor. Comput. Sci. 155(2), 291–319 (1996)

    Article  MathSciNet  MATH  Google Scholar 

  3. Bernardy, J.-P., Jansson, P.: Certified context-free parsing: a formalisation of valiant’s algorithm in agda. CoRR, abs/1601.07724 (2016)

    Google Scholar 

  4. Bertot, Y., Castran, P.: Interactive Theorem Proving and Program Development: Coq’Art The Calculus of Inductive Constructions, 1st edn. Springer Publishing Company, Incorporated, Heidelberg (2010)

    Google Scholar 

  5. Brady, E.: Idris, a general-purpose dependently typed programming language: design and implementation. J. Funct. Program. 23, 552–593 (2013)

    Article  MathSciNet  MATH  Google Scholar 

  6. Brady, E.: Programming and reasoning with algebraic effects and dependent types. In: Proceedings of the 18th ACM SIGPLAN International Conference on Functional Programming, ICFP 2013, pp. 133–144. ACM, New York (2013)

    Google Scholar 

  7. Brzozowski, J.A.: Derivatives of regular expressions. J. ACM 11(4), 481–494 (1964)

    Article  MathSciNet  MATH  Google Scholar 

  8. Danielsson, N.A.: Total parser combinators. SIGPLAN Not. 45(9), 285–296 (2010)

    Article  MATH  Google Scholar 

  9. Felleisen, M.D., Barski, C., Van Horn, D.: Realm of Racket: Learn to Program, One Game at a Time! Eight Students of Northeastern University, San Francisco (2013)

    Google Scholar 

  10. Firsov, D., Uustalu, T.: Certified parsing of regular languages. In: Gonthier, G., Norrish, M. (eds.) CPP 2013. LNCS, vol. 8307, pp. 98–113. Springer, Heidelberg (2013)

    Chapter  Google Scholar 

  11. Firsov, D., Uustalu, T.: Certified CYK parsing of context-free languages. J. Log. Algebr. Meth. Program. 83(5–6), 459–468 (2014)

    Article  MathSciNet  MATH  Google Scholar 

  12. Firsov, D., Uustalu, T.: Certified CYK parsing of context-free languages. J. Log. Algebr. Methods Program. 83(5–6), 459–468 (2014). The 24th Nordic Workshop on Programming Theory (NWPT 2012)

    Article  MathSciNet  MATH  Google Scholar 

  13. Fischer, S., Huch, F., Wilke, T.: A play on regular expressions: Functional pearl. In: Proceedings of the 15th ACM SIGPLAN International Conference on Functional Programming, ICFP 2010, pp. 357–368. ACM, New York (2010)

    Google Scholar 

  14. Frisch, A., Cardelli, L.: Greedy regular expression matching. In: Díaz, J., Karhumäki, J., Lepistö, A., Sannella, D. (eds.) ICALP 2004. LNCS, vol. 3142, pp. 618–629. Springer, Heidelberg (2004)

    Chapter  Google Scholar 

  15. GNU Grep home page. https://www.gnu.org/software/grep/

  16. Hopcroft, J.E., Motwani, R., Rotwani, U., Ullman, J.D.: Introduction to Automata Theory Languages and Computability, 2nd edn. Addison-Wesley Longman Publishing Co., Inc., Boston (2000)

    Google Scholar 

  17. The Idris Tutorial. http://docs.idris-lang.org/en/latest/tutorial/

  18. Jourdan, J.-H., Pottier, F., Leroy, X.: Validating LR(1) parsers. In: Seidl, H. (ed.) Programming Languages and Systems. LNCS, vol. 7211, pp. 397–416. Springer, Heidelberg (2012)

    Chapter  Google Scholar 

  19. Lesk, M.E., Schmidt, E.: Lex: a lexical analyzer generator. In: Unix, vol. ii. pp. 375–387. W. B. Saunders Company, Philadelphia, PA, USA (1990)

    Google Scholar 

  20. The Lightyear Idris Parsing Combinator Library. https://github.com/ziman/lightyear/

  21. Lopes, R., Ribeiro, R., Camarão, C.: Certified derivative based parsing of regular expressions — on-linerepository (2016). https://github.com/raulfpl/idrisregexp

  22. Macedo, H.D., Oliveira, J.N.: Typing linear algebra: a biproduct-oriented approach. CoRR, abs/1312.4818 (2013)

    Google Scholar 

  23. McBride, C.: Dependently Typed Functional Programs and their Proofs. Ph.D. thesis, Department of Informatics, University of Edinburgh (1999)

    Google Scholar 

  24. McBride, C., McKinna, J.: The view from the left. J. Funct. Program. 14(1), 69–111 (2004)

    Article  MathSciNet  MATH  Google Scholar 

  25. Might, M., Darais, D., Spiewak, D.: Parsing with derivatives: a functional pearl. SIGPLAN Not. 46(9), 189–195 (2011)

    Article  MATH  Google Scholar 

  26. Norell, U.: Dependently typed programming in agda. In: Proceedings of the 4th International Workshop on Types in Language Design and Implementation, TLDI 2009, pp. 1–2. ACM, New York (2009)

    Google Scholar 

  27. Owens, S., Reppy, J., Turon, A.: Regular-expression derivatives re-examined. J. Funct. Program. 19(2), 173–190 (2009)

    Article  MathSciNet  MATH  Google Scholar 

  28. Pierce, B.C.: Types and Programming Languages, 1st edn. The MIT Press, Cambridge (2002)

    MATH  Google Scholar 

  29. Google Regular Expression Library - re2. https://github.com/google/re2

  30. Sørensen, M.H., Urzyczyn, P.: Lectures on the Curry-Howard Isomorphism. Studies in Logic and the Foundations of Mathematics, vol. 149. Elsevier Science Inc., New York (2006)

    MATH  Google Scholar 

  31. Sulzmann, M., Lu, K.Z.M.: POSIX regular expression parsing with derivatives. In: Codish, M., Sumii, E. (eds.) FLOPS 2014. LNCS, vol. 8475, pp. 203–220. Springer, Heidelberg (2014)

    Chapter  Google Scholar 

  32. The Univalent Foundatiosn Program. Homotopy Type Theory: Univalent Foundations of Mathematics (2013). http://homotopytypetheory.org/book/

  33. Valiant, L.G.: General context-free recognition in less than cubic time. J. Comput. Syst. Sci. 10(2), 308–315 (1975)

    Article  MathSciNet  MATH  Google Scholar 

Download references

Acknowledgements

The first author thanks Fundação de Amparo a Pesquisa de Minas Gerais (FAPEMIG) for financial support.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Rodrigo Ribeiro .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Lopes, R., Ribeiro, R., Camarão, C. (2016). Certified Derivative-Based Parsing of Regular Expressions. In: Castor, F., Liu, Y. (eds) Programming Languages. SBLP 2016. Lecture Notes in Computer Science(), vol 9889. Springer, Cham. https://doi.org/10.1007/978-3-319-45279-1_7

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-45279-1_7

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-45278-4

  • Online ISBN: 978-3-319-45279-1

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics