Abstract
Despite intensive research on linguistic techniques in information retrieval, few large-scale search engines take full advantage of them. This paper presents the integration of various linguistic techniques into one of the largest search engines on the Internet. The techniques include language identification, offensive content filtering, phrasing and anti-phrasing, normalization, and clustering. We outline some of the challenges of Internet search and discuss our experiences with these techniques.
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gulla, J. A., Auran, P. G., Risvik, K. M. (2002). Linguistics in Large-Scale Web Search. In: Andersson, B., Bergholtz, M., Johannesson, P. (eds) Natural Language Processing and Information Systems. NLDB 2002. Lecture Notes in Computer Science, vol 2553. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36271-1_21
DOI: https://doi.org/10.1007/3-540-36271-1_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-00307-6
Online ISBN: 978-3-540-36271-5
eBook Packages: Springer Book Archive