Reporting exact and approximate regular expression matches

  • Eugene W. Myers
  • Paulo Oliva
  • Katia Guimarães
Session II
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1448)


While much work has been done on determining whether a document, or a line of a document, contains an exact or approximate match to a regular expression, less effort has gone into formulating and determining what to report as “the match” once such a “hit” is detected. For exact regular expression pattern matching, we give algorithms for finding a longest match, for finding all symbols involved in some match, and for finding optimal submatches to tagged parts of a pattern. For approximate regular expression matching, we develop notions of what constitutes a significant match and give algorithms for finding such matches, as well as algorithms for finding a longest match and all symbols in a match.
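To make the reporting question concrete, the following is a minimal sketch of what "a longest match" means for exact matching. It is not the authors' algorithm: it is a brute-force illustration that tries substrings with Python's `re.fullmatch`, and the function name `longest_match` is a hypothetical helper introduced only for this example.

```python
import re


def longest_match(pattern, text):
    """Return (start, end) of a longest substring of `text` that the whole
    pattern matches, or None if no substring matches.

    Illustrative brute force only (hypothetical helper, not the paper's
    method): it may call fullmatch on O(n^2) substrings. Ties are broken
    in favour of the leftmost start.
    """
    compiled = re.compile(pattern)
    best = None
    n = len(text)
    for start in range(n + 1):
        # Try the longest candidate first for this start position.
        for end in range(n, start - 1, -1):
            # Stop once candidates can no longer beat the current best.
            if best is not None and end - start <= best[1] - best[0]:
                break
            if compiled.fullmatch(text, start, end):
                best = (start, end)
                break
    return best


if __name__ == "__main__":
    pattern, text = r"a|ab|abc+", "xxabccy"
    # A backtracking search reports the first alternative that succeeds.
    print(re.search(pattern, text).span())
    # The longest-match report covers the whole run "abcc".
    print(longest_match(pattern, text))
```

The contrast between the two printed spans shows why the choice of what to report matters: both are valid matches of the same pattern in the same text, yet they cover different symbols.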


Regular Expression, Edit Graph, Approximate Match, Space Algorithm, Longest Match
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




Copyright information

© Springer-Verlag Berlin Heidelberg 1998

Authors and Affiliations

  • Eugene W. Myers (1)
  • Paulo Oliva (2)
  • Katia Guimarães (2)
  1. Dept. of Computer Science, University of Arizona, Tucson
  2. Dept. of Informatics, Federal University of Pernambuco, Recife, Brazil
