On-Line Multi-Threaded Processing of Web User-Clicks on Multi-Core Processors

Bonacic, Carolina; Garcia, Carlos; Marin, Mauricio; Prieto, Manuel; Tirado, Francisco

doi:10.1007/978-3-642-19328-6_22

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6449)

Abstract

Real-time search, a setting in which Web search engines are able to include among their query results documents published on the Web in the very recent past, is clear evidence that many of the off-line computations performed so far on conventional search engines need to be moved to the on-line arena. This is a demanding case for parallel computing, since it is necessary to cope efficiently with thousands of concurrent read and write operations per unit time, all requiring latencies within a fraction of a second. To our knowledge, the computations that capture user preferences through clicks on query result pages and include this feature in the document ranking process are currently performed off-line. This is done by pre-processing very large logs containing millions of queries submitted by actual users over time scales of days, weeks or even months. The outcome is score data for the set of documents indexed by the search engine that were selected by users in the past. This paper studies the efficiency of this process in the on-line setting by evaluating a set of strategies for concurrent read/write operations executed on a multi-threaded multi-core architecture. The benefit of efficient on-line processing of user clicks is that it makes it feasible to include user preferences in document ranking in real time as well.
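
To make the concurrent read/write setting concrete, the sketch below shows one possible way to let several threads record clicks and read click scores at the same time. It is not taken from the paper, and the abstract does not say which strategies the authors evaluate; the lock-striping scheme, the class name `StripedClickStore`, the helper `simulate`, and the use of a plain click count as the score are all assumptions made for illustration.

```python
import threading
from collections import defaultdict


class StripedClickStore:
    """Hypothetical click-score table guarded by a fixed set of lock stripes.

    Each (query, doc_id) pair maps to one stripe, so threads touching
    different stripes do not block one another.
    """

    def __init__(self, num_stripes=16):
        self._locks = [threading.Lock() for _ in range(num_stripes)]
        self._scores = [defaultdict(int) for _ in range(num_stripes)]

    def _stripe(self, query, doc_id):
        # Pick the stripe that owns this (query, document) pair.
        return hash((query, doc_id)) % len(self._locks)

    def record_click(self, query, doc_id):
        # Write path: count one more click for this pair.
        i = self._stripe(query, doc_id)
        with self._locks[i]:
            self._scores[i][(query, doc_id)] += 1

    def score(self, query, doc_id):
        # Read path: return the current click count (0 if never clicked).
        i = self._stripe(query, doc_id)
        with self._locks[i]:
            return self._scores[i].get((query, doc_id), 0)


def simulate(num_threads=4, clicks_per_thread=1000):
    """Run writer/reader threads against one shared store and return it."""
    store = StripedClickStore()

    def worker():
        for n in range(clicks_per_thread):
            store.record_click("news", n % 10)  # assumed query and doc ids
            store.score("news", n % 10)         # interleaved read

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return store
```

Calling `simulate()` and then summing `store.score("news", d)` over the ten document ids gives the total number of recorded clicks. Striping is only one point in the design space: a single global lock is simpler but serializes every operation, while lock-free or per-thread buffering schemes trade simplicity for throughput.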

Author information

Authors and Affiliations

Dept. Arquitectura de Computadores y Automatica, Universidad Complutense de Madrid, Spain
Carolina Bonacic, Carlos Garcia, Manuel Prieto & Francisco Tirado
Yahoo! Research Latin America, Universidad de Santiago de Chile, Chile
Mauricio Marin

Editor information

Editors and Affiliations

Faculdade de Engenharia da Universidade do Porto, Rua Dr. Roberto Frias s/n, 4200-465, Porto, Portugal
José M. Laginha M. Palma
INP (ENSEEIHT) IRIT, University of Toulouse, rue Charles-Camichel, CEDEX 7, 31071, Toulouse, France
Michel Daydé
Lawrence Berkeley National Laboratory, Berkeley, USA
Osni Marques
Faculty of Engineering, University of Porto, Rua Dr. Roberto Frias, s/n, 4200-465, Porto, Portugal
João Correia Lopes

About this paper

Cite this paper

Bonacic, C., Garcia, C., Marin, M., Prieto, M., Tirado, F. (2011). On-Line Multi-Threaded Processing of Web User-Clicks on Multi-Core Processors. In: Palma, J.M.L.M., Daydé, M., Marques, O., Lopes, J.C. (eds) High Performance Computing for Computational Science – VECPAR 2010. VECPAR 2010. Lecture Notes in Computer Science, vol 6449. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19328-6_22

DOI: https://doi.org/10.1007/978-3-642-19328-6_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19327-9
Online ISBN: 978-3-642-19328-6
eBook Packages: Computer Science, Computer Science (R0)