Synonyms
Indexing; Inverted indexes
Definition
A core element of modern information retrieval systems is the document index. The index is a set of data structures constructed from a source document collection so that an information retrieval system can respond to search queries quickly and efficiently. Index creation typically involves reading and processing the source document collection, parsing the text of each document, and extracting the features needed to retrieve and rank that document in response to a user query. Indexing systems also often use dimension reduction, compression, and related techniques to drastically reduce the storage footprint of the source collection in its indexed form. Document indexes are frequently stored in file structures that are conducive to rapid retrieval and ranking in response to a query.
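The steps described above (parse each document, extract retrieval features, and shrink the stored form) can be illustrated with a small in-memory inverted index. This is only a sketch under stated assumptions, not the method of any particular system: the function names `tokenize`, `build_inverted_index`, and `gap_encode` are hypothetical, the sample documents are invented, term frequency stands in for the "necessary features", and gap encoding of document identifiers is used as one common example of a compression-friendly representation that the entry itself does not name.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (assumed parsing rule)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to a list of (doc_id, term_frequency) postings.

    Documents are visited in ascending doc_id order, so every posting
    list ends up sorted by doc_id.
    """
    index = defaultdict(list)
    for doc_id in sorted(docs):
        counts = Counter(tokenize(docs[doc_id]))
        for term, tf in counts.items():
            index[term].append((doc_id, tf))
    return dict(index)


def gap_encode(doc_ids):
    """Replace sorted doc ids with the differences between neighbours.

    Small gaps need fewer bits than raw ids once a variable-length
    code is applied (the bit-level coding step is omitted here).
    """
    gaps, prev = [], 0
    for doc_id in doc_ids:
        gaps.append(doc_id - prev)
        prev = doc_id
    return gaps


# Invented sample collection, for illustration only.
docs = {
    1: "Inverted indexes support fast retrieval",
    2: "Compression reduces index size",
    3: "Fast retrieval needs a compact index",
}
index = build_inverted_index(docs)
```

With this sample collection, looking up `index["retrieval"]` yields the postings for documents 1 and 3, which a ranking function could then score using the stored term frequencies. A production index would typically write the posting lists to disk in a compressed binary layout rather than hold Python lists in memory.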
Historical Background
As...
Recommended Reading
Grossman D, Frieder O. Information retrieval: algorithms and heuristics. 2nd ed. Dordrecht: Springer; 2004.
The size of the World Wide Web: http://www.worldwidewebsize.com. Retrieved Mar 2008.
Witten IH, Moffat A, Bell TC. Managing gigabytes: compressing and indexing documents and images. 2nd ed. San Francisco: Morgan Kaufmann; 1999.
Zobel J, Moffat A. Inverted files for text search engines. ACM Comput Surv. 2007;38(2):6.
Copyright information
© 2018 Springer Science+Business Media, LLC, part of Springer Nature
Cite this entry
Beitzel, S.M., Jensen, E.C., Frieder, O. (2018). Index Creation and File Structures. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_944
DOI: https://doi.org/10.1007/978-1-4614-8265-9_944
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-8266-6
Online ISBN: 978-1-4614-8265-9
eBook Packages: Computer Science; Reference Module Computer Science and Engineering