Unambiguity of extended regular expressions in SGML document grammars

  • Conference paper
Algorithms—ESA '93 (ESA 1993)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 726)

Abstract

In the Standard Generalized Markup Language (SGML), document types are defined by context-free grammars in an extended Backus-Naur form. The right-hand side of a production is called a content model. Content models are extended regular expressions that have to be unambiguous in the sense that “an element ... that occurs in the document instance must be able to satisfy only one primitive content token without looking ahead in the document instance.” In this paper, we present a linear-time algorithm that decides whether a given content model is unambiguous.

A similar result was previously obtained, not for content models, but for the smaller class of standard regular expressions. It relies on the fact that the languages of marked regular expressions are local, a property that no longer holds for content models containing the new &-operator. New techniques are therefore needed for content models.
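To make the standard-regular-expression case concrete, here is a minimal sketch of an unambiguity test based on marked positions. It assumes the usual first/follow position sets of the position (Glushkov) construction: an expression is treated as unambiguous when no two distinct positions carrying the same symbol compete, either at the start or after any given position. The nested-tuple encoding and the names `analyse` and `is_unambiguous` are illustrative assumptions, not taken from the paper. The sketch does not handle the &-operator and makes no attempt at the linear running time the paper claims.

```python
from itertools import count


def analyse(expr):
    """Mark every symbol occurrence with a position; return (symbol, first, follow).

    expr is a nested tuple (hypothetical encoding):
      ("sym", "a"), ("cat", e1, e2), ("alt", e1, e2), ("star", e), ("opt", e)
    """
    ids = count()
    symbol = {}  # position -> symbol it carries
    follow = {}  # position -> positions that may come directly after it

    def walk(e):
        """Return (nullable, first positions, last positions) of e."""
        op = e[0]
        if op == "sym":
            p = next(ids)
            symbol[p] = e[1]
            follow[p] = set()
            return False, {p}, {p}
        if op in ("star", "opt"):
            _, first, last = walk(e[1])
            if op == "star":
                # Iteration: a last position may be followed by any first position.
                for p in last:
                    follow[p] |= first
            return True, first, last
        n1, f1, l1 = walk(e[1])
        n2, f2, l2 = walk(e[2])
        if op == "alt":
            return n1 or n2, f1 | f2, l1 | l2
        if op == "cat":
            # Concatenation: last positions of e1 are followed by first positions of e2.
            for p in l1:
                follow[p] |= f2
            first = f1 | (f2 if n1 else set())
            last = l2 | (l1 if n2 else set())
            return n1 and n2, first, last
        raise ValueError(f"unknown operator: {op!r}")

    _, first, _ = walk(expr)
    return symbol, first, follow


def is_unambiguous(expr):
    """True if no lookahead is needed to match a symbol to a single position."""
    symbol, first, follow = analyse(expr)

    def clash(positions):
        # Two distinct positions with the same symbol compete for one input symbol.
        seen = set()
        for p in positions:
            if symbol[p] in seen:
                return True
            seen.add(symbol[p])
        return False

    return not clash(first) and not any(clash(f) for f in follow.values())


a, b, c = ("sym", "a"), ("sym", "b"), ("sym", "c")
print(is_unambiguous(("cat", a, ("alt", b, c))))               # (a, (b | c))      -> True
print(is_unambiguous(("alt", ("cat", a, b), ("cat", a, c))))   # ((a, b) | (a, c)) -> False
```

The second expression fails because an initial `a` could satisfy either of two positions, which is the kind of one-symbol lookahead conflict the quoted SGML requirement rules out. The follow sets here are built eagerly, so the check can take quadratic time on large expressions.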

Besides solving an interesting problem in formal language theory, our results are relevant for developers of SGML systems. In fact, our definitions are causing changes to the revised edition of the SGML standard, and the algorithm to test content models for unambiguity has been implemented in an SGML parser.

References

  1. D. Barron. Why use SGML? Electronic Publishing —Origination, Dissemination and Design, 2(1):3–24, April 1989.

  2. A. Brüggemann-Klein. Regular expressions into finite automata. In I. Simon, editor, Latin '92, pages 87–98, Berlin, 1992. Springer-Verlag. Lecture Notes in Computer Science 583.

  3. A. Brüggemann-Klein. Regular expressions into finite automata, 1992. To appear in Theoretical Computer Science.

  4. A. Brüggemann-Klein. Formal models in document processing. Habilitationsschrift. Submitted to the Faculty of Mathematics at the University of Freiburg, 1993.

  5. A. Brüggemann-Klein and D. Wood. Deterministic regular languages. In A. Finkel and M. Jantzen, editors, STACS 92, pages 173–184, Berlin, 1992. Springer-Verlag. Lecture Notes in Computer Science 577.

  6. J. Clark, 1992. Source code for SGMLS. Available by anonymous ftp from ftp.uu.net and sgml1.ex.ac.uk.

  7. C. F. Goldfarb. The SGML Handbook. Clarendon Press, Oxford, 1990.

  8. ISO 8879: Information processing—Text and office systems—Standard Generalized Markup Language (SGML), October 1986. International Organization for Standardization.

  9. J.-E. Pin. Local languages and the Berry-Sethi algorithm. Unpublished Manuscript, 1992.

Editor information

Thomas Lengauer

Copyright information

© 1993 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Brüggemann-Klein, A. (1993). Unambiguity of extended regular expressions in SGML document grammars. In: Lengauer, T. (eds) Algorithms—ESA '93. ESA 1993. Lecture Notes in Computer Science, vol 726. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-57273-2_45

  • DOI: https://doi.org/10.1007/3-540-57273-2_45

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-57273-2

  • Online ISBN: 978-3-540-48032-7

  • eBook Packages: Springer Book Archive