Indexing the Web
The process of collecting, parsing, and storing data to provide fast and accurate retrieval of content available on the web. The result of this process is a structure called an index, which maps the collected data (for instance, words, phrases, concepts, or sound fragments) to the web locations where content associated with that data can be found (for instance, pages containing those words, phrases, or concepts, or music containing the sound fragments). Depending on the data collected, several indices may be created. The process can be manual or automatic. Manually generated indices include web directories, back-of-book-style indices, and metadata. Automatically generated indices are normally associated with the infrastructure of search engines.
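The word-to-location mapping described above is commonly realized as an inverted index. A minimal sketch, assuming simple whitespace tokenization (the page texts and URLs here are purely illustrative):

```python
from collections import defaultdict

def build_index(pages):
    """Map each word to the set of URLs whose page text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

# Illustrative corpus: two hypothetical pages.
pages = {
    "http://example.com/a": "web indexing maps words to pages",
    "http://example.com/b": "search engines build an index of the web",
}
index = build_index(pages)
# Looking up a word returns every location where it occurs, e.g.
# index["web"] contains both URLs above.
```

A production index would additionally normalize terms (stemming, stop-word removal) and store positional or frequency information to support phrase queries and ranking.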
One of the first efforts to index web content was made by an MIT student, Matthew Gray, who created a program to estimate the size of the web. This program, called World Wide Web...