Abstract
In the Standard Generalized Markup Language (SGML), document types are defined by context-free grammars in an extended Backus-Naur form. The right-hand side of a production is called a content model. Content models are extended regular expressions that have to be unambiguous in the sense that “an element ... that occurs in the document instance must be able to satisfy only one primitive content token without looking ahead in the document instance.” In this paper, we present a linear-time algorithm that decides whether a given content model is unambiguous.
A similar result was previously obtained, not for content models, but for the smaller class of standard regular expressions. That result relies on the fact that the languages of marked regular expressions are local, a property that no longer holds for content models containing the new &-operator. New techniques are therefore needed for content models.
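To make the lookahead-free condition concrete for the smaller class of standard regular expressions, here is a minimal Python sketch. It is not the linear-time algorithm of this paper and does not handle the &-operator. It marks each symbol occurrence with a position, computes the usual first and follow sets, and reports ambiguity when two distinct positions carrying the same symbol compete in the same set. The tuple encoding of expressions and the names `analyze`, `has_competing_positions`, and `is_one_unambiguous` are illustrative assumptions, not taken from the paper.

```python
from itertools import count

# Hypothetical encoding (an assumption for this sketch):
#   ("sym", "a")              a single element name
#   ("cat", e1, e2)           e1 followed by e2   (SGML: e1, e2)
#   ("alt", e1, e2)           e1 or e2            (SGML: e1 | e2)
#   ("star", e), ("plus", e), ("opt", e)          (SGML: e*, e+, e?)


def analyze(expr, counter, symbol_of, follow):
    """Return (nullable, first, last) for expr.

    Every symbol occurrence gets a fresh position number (the "marking").
    symbol_of maps positions to symbols; follow maps each position to the
    set of positions that may come directly after it.
    """
    op = expr[0]
    if op == "sym":
        pos = next(counter)
        symbol_of[pos] = expr[1]
        follow[pos] = set()
        return False, {pos}, {pos}
    if op in ("star", "plus", "opt"):
        nullable, first, last = analyze(expr[1], counter, symbol_of, follow)
        if op != "opt":
            # Iteration: the end of one pass may be followed by the start
            # of the next pass.
            for p in last:
                follow[p] |= first
        return nullable or op != "plus", first, last
    if op not in ("cat", "alt"):
        raise ValueError(f"unknown operator: {op!r}")
    n1, f1, l1 = analyze(expr[1], counter, symbol_of, follow)
    n2, f2, l2 = analyze(expr[2], counter, symbol_of, follow)
    if op == "alt":
        return n1 or n2, f1 | f2, l1 | l2
    # Concatenation: the end of e1 may be followed by the start of e2.
    for p in l1:
        follow[p] |= f2
    first = f1 | f2 if n1 else f1
    last = l1 | l2 if n2 else l2
    return n1 and n2, first, last


def has_competing_positions(positions, symbol_of):
    """True if two distinct positions in the set carry the same symbol."""
    seen = set()
    for p in positions:
        if symbol_of[p] in seen:
            return True
        seen.add(symbol_of[p])
    return False


def is_one_unambiguous(expr):
    """Check the condition for a standard regular expression (no &)."""
    symbol_of, follow = {}, {}
    _, first, _ = analyze(expr, count(1), symbol_of, follow)
    candidate_sets = [first, *follow.values()]
    return not any(has_competing_positions(s, symbol_of) for s in candidate_sets)
```

For example, the content model (a, b?, b) is rejected by this check because after a, an incoming b could satisfy either of the two b tokens, whereas (a, b, b?) is accepted. This simple version compares positions set by set and makes no attempt at linear running time.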
Besides solving an interesting problem in formal language theory, our results are relevant to developers of SGML systems. In fact, our definitions are prompting changes in the revised edition of the SGML standard, and the algorithm for testing content models for unambiguity has been implemented in an SGML parser.
Copyright information
© 1993 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Brüggemann-Klein, A. (1993). Unambiguity of extended regular expressions in SGML document grammars. In: Lengauer, T. (ed.) Algorithms—ESA '93. ESA 1993. Lecture Notes in Computer Science, vol 726. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-57273-2_45
DOI: https://doi.org/10.1007/3-540-57273-2_45
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-57273-2
Online ISBN: 978-3-540-48032-7
eBook Packages: Springer Book Archive