Document Representations (Inclusive Native and Relational)
Documents; Markup languages; Page representations; Semi-structured data
Native document representations are file formats designed for documents. They can be roughly divided into three types: page-oriented, stream-oriented, and tree-structured. Hybrid types can also be found. Within each type, document representations range from the simple to the complex. All native representations assume an implicit order of the document’s information, reflecting the linear reading order of conventional documents. The most important document representation is the Extensible Markup Language (XML), which is tree-structured and can have any level of complexity. It is seeing widespread use on the Web and in business and is also popular for non-document applications.
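The implicit order of a tree-structured representation can be illustrated with a short sketch. The sample document, its element names, and the function `document_order` are assumptions made for illustration only. The sketch uses Python's standard `xml.etree.ElementTree` module to visit elements in document order (a pre-order traversal), which corresponds to the linear reading order described above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the element names are illustrative only.
SAMPLE = (
    "<article>"
    "<title>Notes</title>"
    "<para>First.</para>"
    "<para>Second.</para>"
    "</article>"
)


def document_order(xml_text):
    """Return (depth, tag, text) triples in document (pre-order) order."""
    root = ET.fromstring(xml_text)
    visited = []

    def visit(node, depth):
        # Record the element before its children: this is document order.
        visited.append((depth, node.tag, (node.text or "").strip()))
        for child in node:  # children are iterated in their stored order
            visit(child, depth + 1)

    visit(root, 0)
    return visited


if __name__ == "__main__":
    for depth, tag, text in document_order(SAMPLE):
        print("  " * depth + tag, text)
```

The sketch ignores attributes and text that follows a child element (mixed content), both of which a full treatment of XML would need to handle.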
Relational databases use a variety of document representations that map to a native representation. Page-oriented and stream-oriented documents are best stored in a coarse-grained manner and do not appear to have stimulated...
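One way a relational representation might map to a tree-structured native representation is sketched below. This is an illustrative assumption, not a method taken from this entry: the table name `node`, its columns, and the functions `shred` and `children_in_order` are hypothetical. Each element becomes one row, and a `pos` column records its position among its siblings so that the implicit document order can be recovered by a query.

```python
import sqlite3
import xml.etree.ElementTree as ET


def shred(xml_text, conn):
    """Store each XML element as one row, keeping its sibling position."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS node ("
        "id INTEGER PRIMARY KEY, parent INTEGER, pos INTEGER, "
        "tag TEXT, text TEXT)"
    )
    counter = 0  # ids start at 1; assumes the table is initially empty

    def insert(elem, parent_id, pos):
        nonlocal counter
        counter += 1
        node_id = counter
        conn.execute(
            "INSERT INTO node VALUES (?, ?, ?, ?, ?)",
            (node_id, parent_id, pos, elem.tag, (elem.text or "").strip()),
        )
        # enumerate() preserves the order of children in the source document
        for i, child in enumerate(elem):
            insert(child, node_id, i)

    insert(ET.fromstring(xml_text), None, 0)


def children_in_order(conn, parent_id):
    """Return (tag, text) for the children of a node, in document order."""
    rows = conn.execute(
        "SELECT tag, text FROM node WHERE parent = ? ORDER BY pos",
        (parent_id,),
    )
    return rows.fetchall()
```

Used with an in-memory database, `shred(xml_text, sqlite3.connect(":memory:"))` gives the root element id 1, and `children_in_order(conn, 1)` returns its children in their original order. The sketch omits attributes and mixed content.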