A document field is a part of a document, or of the document metadata, in which the text has a particular function. A document field can contain free or preformatted text. Depending on its function, each field has its own characteristics, typical length, and term distribution.
Textual documents have an implicit structure that aids the understanding of the text. Long textual documents are usually organized into chapters, sections, and paragraphs, and each of these can have a concise description in the form of a title. In hypertext documents, explicit links between documents (hyperlinks) are often associated with anchor text. Newswire documents also carry metadata such as the date or the name of the author. Efforts to standardize metadata about documents have resulted in projects such as the Dublin Core Metadata Initiative.
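To make the idea of fields with different lengths and term distributions concrete, the following is a minimal sketch that represents one document as a mapping from field names to text and computes, for each field, its length in tokens and a maximum-likelihood term distribution. The field names (`title`, `body`, `anchor`, `author`, `date`), their contents, and the simple lowercase tokenizer are illustrative assumptions, not a standard schema.

```python
import re
from collections import Counter

# Hypothetical fielded document: field names and contents are illustrative only.
doc = {
    "title": "Title language model for information retrieval",
    "body": "Language models estimate term probabilities from the text of a document. "
            "The title is short, while the body is long and covers many terms.",
    "anchor": "title language model",
    "author": "Jin",
    "date": "2002",
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def field_stats(document):
    """Return, per field, its token length, term frequencies, and term distribution."""
    stats = {}
    for name, text in document.items():
        tokens = tokenize(text)
        tf = Counter(tokens)
        length = len(tokens)
        # Maximum-likelihood estimate of P(term | field): tf / field length.
        dist = {term: count / length for term, count in tf.items()} if length else {}
        stats[name] = {"length": length, "tf": tf, "dist": dist}
    return stats

stats = field_stats(doc)
for name, s in stats.items():
    print(name, s["length"], s["tf"].most_common(2))
```

Even in this toy example the short `title` and `anchor` fields concentrate their probability mass on a few terms, while the longer `body` spreads it across many terms, including frequent function words such as "the".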
Fields are also being used to represent the annotations of text with semantic and syntactic information. For example, the semantic...
- 1.Dublin Core Metadata Initiative. Retrieved 15 Apr 2008. http://dublincore.org/
- 2.Jin R, Hauptmann A, Zhai C. Title language model for information retrieval. In: Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval; 2002. p. 42–8.
- 3.Zaragoza H, Rode H, Mika P, Atserias J, Ciaramita M, Attardi G. Ranking very many typed entities on Wikipedia. In: Proceedings of the 16th ACM International Conference on Information and Knowledge Management; 2007. p. 1015–8.