Text Indexing Techniques
Text indexing is the processing of a text to extract statistics considered important for representing the information it contains and/or to allow fast search over its content. Text indexing operations can be performed not only on natural language texts but on virtually any type of textual information, such as the source code of computer programs, DNA or protein databases, and textual data stored in traditional database systems.
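As a rough illustration of the two goals named in this definition, the following Python sketch extracts simple statistics (per-document term frequencies) and builds a structure that supports fast lookup (a mapping from each term to the documents that contain it, commonly called an inverted index). The definition does not prescribe a particular technique, so the inverted index is an assumption made for illustration, and the function names `tokenize`, `build_index`, and `search`, as well as the sample documents, are hypothetical.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Index a dict mapping document id -> text.

    Returns (term_counts, postings):
      term_counts[doc_id] is a Counter of term frequencies (the statistics);
      postings[term] is the sorted list of ids of documents containing term.
    """
    term_counts = {}
    postings = defaultdict(set)
    for doc_id, text in documents.items():
        counts = Counter(tokenize(text))
        term_counts[doc_id] = counts
        for term in counts:
            postings[term].add(doc_id)
    return term_counts, {term: sorted(ids) for term, ids in postings.items()}


def search(postings, query):
    """Return the ids of documents that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return []
    result = set(postings.get(terms[0], []))
    for term in terms[1:]:
        result &= set(postings.get(term, []))
    return sorted(result)


# Hypothetical sample collection, for illustration only.
docs = {
    1: "Text indexing allows fast search on text.",
    2: "DNA databases store textual data.",
    3: "Source code is text too.",
}
term_counts, postings = build_index(docs)
```

With the index built once, a call such as `search(postings, "text search")` only intersects the precomputed document lists for each query term instead of rescanning every document, which is the sense in which indexing allows fast search. This sketch matches whole words only, so "textual" does not match a query for "text".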
Efforts to index electronic texts appear in the literature from the earliest days of computing. For example, descriptions of electronic information search systems able to index and search text date back to the early 1950s.
In a seminal 1968 book, Gerard Salton laid out the basis for modern information retrieval systems, including a description of a model for indexing texts that remains widely adopted today, known as the vector space model. Other successful models for indexing...
- 2. Baeza-Yates R, Ribeiro-Neto B. Modern information retrieval. 2nd ed. Reading: Addison Wesley; 2011.
- 4. Manber U, Wu S. Glimpse: a tool to search through entire file systems. In: Proceedings of the USENIX Winter 1994 Technical Conference; 1994. p. 23–32.
- 5. Salton G. Automatic information organization and retrieval. New York: McGraw-Hill; 1968.