A Benchmark Production Tool for Regular Expressions

  • Angelo Borsotti
  • Luca Breveglieri
  • Stefano Crespi Reghizzi
  • Angelo Morzenti
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11601)


We describe a new tool, named REgen, that generates regular expressions (REs) to be used as test cases, and that also generates synthetic benchmarks for exercising and measuring the performance of RE-based software libraries and applications. Each group of REs is randomly generated and satisfies a user-specified set of constraints, such as length, nesting depth, operator arity, repetition depth, and syntax tree balancing. Features not covered by these parameters are chosen by the tool. An RE group may include REs that are ambiguous, or that define the same regular language but differ in their syntactic structure. A benchmark is a collection of RE groups with a user-specified size and distribution, together with a representative sample of texts for each RE in the collection. We present two generation algorithms, one for RE groups and one for benchmarks. Experimental results are reported for a large benchmark that we used to compare the performance of different RE parsing algorithms. The tool REgen and the RE benchmark are publicly available and fill a gap in supporting tools for the development and evaluation of RE applications.
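
To make the idea of constraint-driven random generation concrete, the following Python fragment is a minimal sketch, not the REgen algorithm itself. It assumes only two of the constraints named above (maximum nesting depth and maximum operator arity) and a two-letter alphabet. All names in it (random_re, to_string, sample_text, depth_of) and the 0.3 leaf probability are hypothetical choices made for illustration.

```python
import random
import re


def random_re(rng, depth, max_arity, alphabet="ab"):
    """Return a random RE syntax tree as nested tuples.

    Illustrative only: 'depth' bounds the nesting depth and 'max_arity'
    bounds the number of operands of concatenation and union.
    """
    # Stop at a leaf when the depth budget is used up, or at random.
    if depth == 0 or rng.random() < 0.3:
        return ("sym", rng.choice(alphabet))
    op = rng.choice(["cat", "alt", "star"])
    if op == "star":
        return ("star", random_re(rng, depth - 1, max_arity, alphabet))
    arity = rng.randint(2, max_arity)
    children = [random_re(rng, depth - 1, max_arity, alphabet) for _ in range(arity)]
    return (op, *children)


def to_string(node):
    """Render a syntax tree as a fully parenthesized RE string."""
    kind = node[0]
    if kind == "sym":
        return node[1]
    if kind == "star":
        return "(" + to_string(node[1]) + ")*"
    separator = "" if kind == "cat" else "|"
    return "(" + separator.join(to_string(child) for child in node[1:]) + ")"


def sample_text(rng, node, max_reps=3):
    """Draw one random text from the language of the tree."""
    kind = node[0]
    if kind == "sym":
        return node[1]
    if kind == "star":
        reps = rng.randint(0, max_reps)
        return "".join(sample_text(rng, node[1], max_reps) for _ in range(reps))
    if kind == "cat":
        return "".join(sample_text(rng, child, max_reps) for child in node[1:])
    # Union: pick one alternative at random.
    return sample_text(rng, rng.choice(node[1:]), max_reps)


def depth_of(node):
    """Nesting depth of a tree (a single symbol has depth 0)."""
    if node[0] == "sym":
        return 0
    return 1 + max(depth_of(child) for child in node[1:])


if __name__ == "__main__":
    rng = random.Random(2019)
    tree = random_re(rng, depth=3, max_arity=3)
    print(to_string(tree), repr(sample_text(rng, tree)))
```

A group of REs in this sketch would simply be a list of such trees drawn with the same parameters, and a benchmark entry would pair each RE string with a few sampled texts. Further constraints from the abstract, such as length or tree balancing, are deliberately left out here.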


Keywords: Regular expression generation; Benchmark for regular expressions; Regular expression tool



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Angelo Borsotti (1)
  • Luca Breveglieri (1)
  • Stefano Crespi Reghizzi (1, 2)
  • Angelo Morzenti (1)

  1. Politecnico di Milano, Milan, Italy
  2. CNR-IEIIT, Milan, Italy
