Simplifying Regular Expressions

A Quantitative Perspective
  • Hermann Gruber
  • Stefan Gulan
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6031)


We consider the efficient simplification of regular expressions and suggest a quantitative comparison of heuristics for simplifying regular expressions. To this end, we propose a new normal form for regular expressions, which outperforms previous heuristics while still being computable in linear time. This allows us to determine an exact bound for the relation between the two prevalent measures of regular expression size: alphabetic width and reverse Polish notation length. In addition, we show that every regular expression of alphabetic width n can be converted into a nondeterministic finite automaton with ε-transitions of size at most \(4\frac{2}{5}n+1\), and prove this bound to be optimal. This answers a question posed by Ilie and Yu, who had obtained lower and upper bounds of \(4n-1\) and \(9n-\frac{1}{2}\), respectively [15]. For reverse Polish notation length as input size measure, an optimal bound was recently determined by Gulan and Fernau [14]. We prove that, under mild restrictions, their construction is also optimal when taking alphabetic width as input size measure.
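
The two size measures named in the abstract can be illustrated with a small sketch. It assumes the usual definitions: alphabetic width counts occurrences of alphabet symbols, and reverse Polish notation length counts all nodes of the syntax tree. The classes `Sym`, `Eps`, `Alt`, `Cat`, `Star` and the helper `nfa_size_bound` are hypothetical names chosen for illustration and are not taken from the paper, whose operator set may differ. The helper only evaluates the stated bound \(4\frac{2}{5}n+1\); it does not construct an automaton.

```python
from dataclasses import dataclass
from fractions import Fraction


# Minimal regular-expression syntax tree (illustrative assumption only).
@dataclass(frozen=True)
class Sym:
    char: str          # one occurrence of an alphabet symbol


@dataclass(frozen=True)
class Eps:
    pass               # the empty word


@dataclass(frozen=True)
class Alt:
    left: object       # union of two subexpressions
    right: object


@dataclass(frozen=True)
class Cat:
    left: object       # concatenation of two subexpressions
    right: object


@dataclass(frozen=True)
class Star:
    inner: object      # Kleene star


def alphabetic_width(r) -> int:
    """Number of occurrences of alphabet symbols in r."""
    if isinstance(r, Sym):
        return 1
    if isinstance(r, Eps):
        return 0
    if isinstance(r, Star):
        return alphabetic_width(r.inner)
    return alphabetic_width(r.left) + alphabetic_width(r.right)


def rpn_length(r) -> int:
    """Number of nodes in the syntax tree of r (symbols plus operators)."""
    if isinstance(r, (Sym, Eps)):
        return 1
    if isinstance(r, Star):
        return 1 + rpn_length(r.inner)
    return 1 + rpn_length(r.left) + rpn_length(r.right)


def nfa_size_bound(n: int) -> Fraction:
    """Evaluate the bound (4 + 2/5) * n + 1 from the abstract exactly."""
    return Fraction(22, 5) * n + 1


# (a + ε)* b
example = Cat(Star(Alt(Sym("a"), Eps())), Sym("b"))
print(alphabetic_width(example))  # 2
print(rpn_length(example))        # 6
print(nfa_size_bound(5))          # 23
```

The example shows why the two measures can diverge: operators and occurrences of ε contribute to reverse Polish notation length but not to alphabetic width, which is the kind of redundancy a simplifying normal form aims to limit.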


Keywords: Normal Form; Regular Expression; Unary Operator; Regular Language; Empty Word




References

  1. Aceto, L., Fokkink, W., Ingólfsdóttir, A.: On a question of A. Salomaa: the equational theory of regular expressions over a singleton alphabet is not finitely axiomatizable. Theoretical Computer Science 209(1), 163–178 (1998)
  2. Baader, F., Nipkow, T.: Term Rewriting and All That. Cambridge University Press, Cambridge (1998)
  3. Bille, P., Thorup, M.: Faster regular expression matching. In: ICALP 2009. LNCS, vol. 5555, pp. 171–182. Springer, Heidelberg (2009)
  4. Brüggemann-Klein, A.: Regular expressions into finite automata. Theoretical Computer Science 120(2), 197–213 (1993)
  5. Caron, P., Champarnaud, J.M., Mignot, L.: Multi-tilde operators and their Glushkov automata. In: Dediu, A.H., Ionescu, A.M., Martín-Vide, C. (eds.) LATA 2009. LNCS, vol. 5457, pp. 290–301. Springer, Heidelberg (2009)
  6. Champarnaud, J.M., Ouardi, F., Ziadi, D.: Normalized expressions and finite automata. International Journal of Algebra and Computation 17(1), 141–154 (2007)
  7. Conway, J.H.: Regular Algebra and Finite Machines. Chapman and Hall, Boca Raton (1971)
  8. Ellul, K., Krawetz, B., Shallit, J., Wang, M.: Regular expressions: New results and open problems. Journal of Automata, Languages and Combinatorics 10(4), 407–437 (2005)
  9. Frishert, M., Cleophas, L.G., Watson, B.W.: The effect of rewriting regular expression on their accepting automata. In: Ibarra, O.H., Dang, Z. (eds.) CIAA 2003. LNCS, vol. 2759, pp. 304–305. Springer, Heidelberg (2003)
  10. Gelade, W., Martens, W., Neven, F.: Optimizing schema languages for XML: Numerical constraints and interleaving. SIAM Journal on Computing 38(5), 2021–2043 (2009)
  11. Gelade, W., Neven, F.: Succinctness of the complement and intersection of regular expressions. In: Symposium on Theoretical Aspects of Computer Science. Number 08001 in Dagstuhl Seminar Proceedings, pp. 325–336 (2008)
  12. Gruber, H., Holzer, M.: Finite automata, digraph connectivity, and regular expression size. In: Aceto, L., Damgård, I., Goldberg, L.A., Halldórsson, M.M., Ingólfsdóttir, A., Walukiewicz, I. (eds.) ICALP 2008, Part II. LNCS, vol. 5126, pp. 39–50. Springer, Heidelberg (2008)
  13. Gruber, H., Johannsen, J.: Optimal lower bounds on regular expression size using communication complexity. In: Amadio, R.M. (ed.) FOSSACS 2008. LNCS, vol. 4962, pp. 273–286. Springer, Heidelberg (2008)
  14. Gulan, S., Fernau, H.: An optimal construction of finite automata from regular expressions. In: FSTTCS 2008. Number 08004 in Dagstuhl Seminar Proceedings, pp. 211–222 (2008)
  15. Ilie, L., Yu, S.: Follow automata. Information and Computation 186(1), 140–162 (2003)
  16. Lee, J., Shallit, J.: Enumerating regular expressions and their languages. In: Domaratzki, M., Okhotin, A., Salomaa, K., Yu, S. (eds.) CIAA 2004. LNCS, vol. 3317, pp. 2–22. Springer, Heidelberg (2005)
  17. Meyer, A.R., Stockmeyer, L.J.: The equivalence problem for regular expressions with squaring requires exponential space. In: FOCS 1972, pp. 125–129. IEEE Computer Society, Los Alamitos (1972)
  18. Newman, M.: On theories with a combinatorial definition of “equivalence”. Annals of Mathematics 43(2), 223–243 (1942)
  19. Thompson, K.: Regular expression search algorithm. Communications of the ACM 11(6), 419–422 (1968)
  20. Wood, D.: Theory of Computation. John Wiley & Sons, Inc., Chichester (1987)

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Hermann Gruber (1)
  • Stefan Gulan (2)
  1. Institut für Informatik, Universität Gießen, Gießen, Germany
  2. Fachbereich IV - Informatik, Universität Trier, Trier, Germany
