Usage Data in Web Search: Benefits and Limitations

Baeza-Yates, Ricardo; Maarek, Yoelle

doi:10.1007/978-3-642-31235-9_33

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7338)

Included in the following conference series:

International Conference on Scientific and Statistical Database Management

Abstract

Web search, which has its roots in the mature field of information retrieval, has evolved tremendously over the last 20 years. The field underwent its first revolution when it began to deal with huge numbers of Web pages. A second major step came when engines started to consider the structure of the Web graph, and link analysis became a differentiator in both crawling and ranking. Finally, a more discreet but no less critical step was made when search engines started to monitor and mine the numerous (mostly implicit) signals that users provide while interacting with the search engine. We focus here on this third “revolution” of large-scale usage data. We detail the different forms it takes and illustrate its benefits through a review of some successful search features that would not have been possible without it. We also discuss its limitations and how, in some cases, it even conflicts with natural user aspirations such as personalization and privacy. We conclude by discussing how some of these conflicts can be circumvented by using adequate aggregation principles to create “ad hoc” crowds.

Author information

Authors and Affiliations

Yahoo! Research, Barcelona, Spain
Ricardo Baeza-Yates
Yahoo! Research, Haifa, Israel
Yoelle Maarek

Editor information

Editors and Affiliations

Computer Science, EPFL IC SIN-GE, Ecole Polytechnique Federale de Lausanne, Batiment BC, Station 14, 1015, Lausanne, Switzerland
Anastasia Ailamaki
Department of Computer Science, Gonzaga University, 502 E. Boone Avenue, 99258-0026, Spokane, WA, USA
Shawn Bowers

Cite this paper

Baeza-Yates, R., Maarek, Y. (2012). Usage Data in Web Search: Benefits and Limitations. In: Ailamaki, A., Bowers, S. (eds) Scientific and Statistical Database Management. SSDBM 2012. Lecture Notes in Computer Science, vol 7338. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31235-9_33

DOI: https://doi.org/10.1007/978-3-642-31235-9_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31234-2
Online ISBN: 978-3-642-31235-9
eBook Packages: Computer Science, Computer Science (R0)
