A Benchmark Production Tool for Regular Expressions

  • Angelo Borsotti
  • Luca Breveglieri
  • Stefano Crespi Reghizzi
  • Angelo Morzenti
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11601)

Abstract

We describe a new tool, named REgen, that generates regular expressions (REs) for use as test cases, and that also generates synthetic benchmarks for exercising and measuring the performance of RE-based software libraries and applications. Each group of REs is randomly generated and satisfies a user-specified set of constraints, such as length, nesting depth, operator arity, repetition depth, and syntax tree balancing. Features not covered by these parameters are chosen by the tool. An RE group may include REs that are ambiguous, or that define the same regular language but differ in their syntactic structure. A benchmark is a collection of RE groups with a user-specified numerosity and distribution, together with a representative sample of texts for each RE in the collection. We present two generation algorithms, one for RE groups and one for benchmarks. Experimental results are reported for a large benchmark that we used to compare the performance of different RE parsing algorithms. The tool REgen and the RE benchmark are publicly available and fill a gap in supporting tools for the development and evaluation of RE applications.
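
The idea of generating a group of random REs under user-specified constraints can be illustrated with a minimal Python sketch. It is not REgen's algorithm, which the abstract does not detail. The function names (random_regex, nesting_depth, generate_group), the parameters (max_depth, max_arity, max_length, seed), the two-letter alphabet, and the rejection-sampling strategy are all assumptions made for illustration.

```python
import random
import re


def random_regex(rng, max_depth, max_arity=3, alphabet="ab"):
    """Return a random RE string whose parenthesis nesting is <= max_depth.

    Hypothetical helper: each operator adds exactly one level of nesting.
    """
    # Stop at depth 0, or early with fixed probability, with a terminal symbol.
    if max_depth == 0 or rng.random() < 0.3:
        return rng.choice(alphabet)
    op = rng.choice(["concat", "union", "star"])
    if op == "star":
        inner = random_regex(rng, max_depth - 1, max_arity, alphabet)
        return "(" + inner + ")*"
    # Concatenation and union take between 2 and max_arity operands.
    arity = rng.randint(2, max_arity)
    parts = [
        random_regex(rng, max_depth - 1, max_arity, alphabet)
        for _ in range(arity)
    ]
    separator = "" if op == "concat" else "|"
    return "(" + separator.join(parts) + ")"


def nesting_depth(regex):
    """Return the maximum parenthesis nesting depth of an RE string."""
    depth = deepest = 0
    for ch in regex:
        if ch == "(":
            depth += 1
            deepest = max(deepest, depth)
        elif ch == ")":
            depth -= 1
    return deepest


def generate_group(size, max_depth, max_length, seed=0):
    """Collect `size` random REs that satisfy the length constraint.

    Assumption: simple rejection sampling; candidates that are too long
    are discarded and regenerated.
    """
    if max_length < 1:
        raise ValueError("max_length must be at least 1")
    rng = random.Random(seed)
    group = []
    while len(group) < size:
        candidate = random_regex(rng, max_depth)
        if len(candidate) <= max_length:
            group.append(candidate)
    return group


if __name__ == "__main__":
    for regex in generate_group(size=5, max_depth=3, max_length=40, seed=1):
        re.compile(regex)  # each generated RE is valid Python regex syntax
        print(regex, "depth:", nesting_depth(regex))
```

Fixing the seed makes a generated group reproducible, which is useful when the same REs must be fed to several libraries under comparison. Constraints such as repetition depth or syntax tree balancing would need additional bookkeeping that this sketch omits.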

Keywords

Regular expression generation · Benchmark for regular expressions · Regular expression tool

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Angelo Borsotti (1)
  • Luca Breveglieri (1), corresponding author
  • Stefano Crespi Reghizzi (1, 2)
  • Angelo Morzenti (1)

  1. Politecnico di Milano, Milan, Italy
  2. CNR-IEIIT, Milan, Italy
