Encyclopedia of Database Systems

2018 Edition
Editors: Ling Liu, M. Tamer Özsu

Document Representations (Inclusive Native and Relational)

  • Ethan V. Munson
Reference work entry
DOI: https://doi.org/10.1007/978-1-4614-8265-9_138

Synonyms

Documents; Markup languages; Page representations; Semi-structured data

Definition

Native document representations are file formats designed for documents. They can be roughly divided into three types: page-oriented, stream-oriented, and tree-structured. Hybrid types can also be found. Within each type, document representations range from the simple to the complex. All native representations assume an implicit order of the document’s information, reflecting the linear reading order of conventional documents. The most important document representation is the Extensible Markup Language (XML), which is tree-structured and can have any level of complexity. It is seeing widespread use on the Web and in business and is also popular for non-document applications.
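As a minimal sketch of what "tree-structured" with an "implicit order" means in practice, the following Python fragment parses a small XML document and lists its elements in document order (a pre-order walk of the tree). The sample document, its element names, and the helper `document_order` are assumptions made for illustration only; they are not taken from the entry.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the element names are illustrative only.
SAMPLE = """<article>
  <title>Document Representations</title>
  <section>
    <heading>Definition</heading>
    <para>Native representations are file formats.</para>
    <para>XML is tree-structured.</para>
  </section>
</article>"""


def document_order(xml_text):
    """Return (depth, tag, text) triples in document (pre-order) order."""
    root = ET.fromstring(xml_text)
    rows = []

    def visit(node, depth):
        # Record the element itself before descending into its children.
        rows.append((depth, node.tag, (node.text or "").strip()))
        for child in node:  # children are iterated in source order
            visit(child, depth + 1)

    visit(root, 0)
    return rows


if __name__ == "__main__":
    # Print an indented outline that mirrors the linear reading order.
    for depth, tag, text in document_order(SAMPLE):
        print("  " * depth + tag + (": " + text if text else ""))
```

The depth value records the tree structure, while the position of each triple in the returned list records the reading order, so the two `para` elements remain distinguishable even though they share a tag and a parent.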

Relational databases use a variety of document representations that map to a native representation. Page-oriented and stream-oriented documents are best stored in a coarse-grained manner and do not appear to have stimulated...


Recommended Reading

  1. Adobe Systems Incorporated. PDF reference. 6th edn. 2006.
  2. Boag S, Chamberlin D, Fernández MF, Florescu D, Robie J, Siméon J. XQuery 1.0: an XML query language. Tokyo: World Wide Web Consortium (W3C); 2007.
  3. Bray T, Paoli J, Sperberg-McQueen CM, Maler E, Yergeau F. Extensible markup language (XML) 1.0. World Wide Web Consortium (W3C). 4th edn. 2006.
  4. Draper D. Mapping between XML and relational data. In: XQuery from the experts: a guide to the W3C XML query language. Chap. 6. Addison Wesley; 2003.
  5. Fallside DC, Walmsley P. XML schema part 0: primer. World Wide Web Consortium (W3C). 2nd edn. 2004.
  6. Furuta R, Scofield J, Shaw A. Document formatting systems: survey, concepts, and issues. ACM Comput Surv. 1982;14(3):417–72.
  7. Goldfarb CF, editor. Information processing – text and office systems – Standard Generalized Markup Language (SGML), International Standard ISO 8879. Geneva: International Organization for Standardization; 1986.
  8. Kay M. XSL transformations (XSLT) version 2.0. World Wide Web Consortium (W3C). 2007.
  9. Knuth DE, Plass MF. Breaking paragraphs into lines. Softw Pract Exp. 1981;11(11):1119–84.
  10. Microsoft Office Word. 2007 Rich Text Format (RTF) specification. 2007. Version 1.9. Downloaded from microsoft.com, November 2007.
  11. OASIS. Open document format for office applications (OpenDocument) v1.1. 2007. http://docs.oasis-open.org/office/v1.1/OS/.
  12. Shanmugasundaram J, Shekita E, Barr R, Carey M, Lindsay B, Pirahesh H, Reinwald B. Efficiently publishing relational data as XML documents. VLDB J. 2001;10(2–3).
  13. Simske SJ, Baggs SC. Digital capture for automated scanner workflows. In: Proceedings of the 4th ACM Symposium on Document Engineering; 2004. p. 171–7.
  14. Tatarinov I, Viglas SD, Beyer K, Shanmugasundaram J, Shekita E, Zhang C. Storing and querying ordered XML using a relational database system. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 2002. p. 204–15.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Department of EECS, University of Wisconsin-Milwaukee, Milwaukee, USA

Section editors and affiliations

  • Frank Tompa
  1. David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Canada