Abstract
Web search engines achieve efficient query processing by partitioning and replicating the index data structure that supports it. Current practice simply partitions and replicates the text collection across the cluster processors and then builds an index data structure on each processor. This paper proposes a different approach: constructing an index data structure that explicitly accounts for the fact that the data is partitioned and replicated. The result is a so-called 3D indexing strategy that outperforms current approaches. Performance is further improved by an application caching scheme designed to hold the most frequently issued queries.
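To make the idea of an index laid out along three axes more concrete, the following is a minimal sketch and not the authors' implementation. It assumes, beyond what the abstract states, that the three axes are document partition, term partition, and replica, and that a query is routed to a single replica. The names `Grid3D`, `processors_for_query`, and `ResultCache` are hypothetical, and the cache uses a plain LRU policy as a stand-in for a scheme that retains the most frequently issued queries.

```python
from collections import OrderedDict
from itertools import product
import zlib


def stable_hash(text: str) -> int:
    """Deterministic hash (Python's built-in hash() is salted per process)."""
    return zlib.crc32(text.encode("utf-8"))


class Grid3D:
    """Hypothetical layout: doc_parts x term_parts x replicas processors."""

    def __init__(self, doc_parts: int, term_parts: int, replicas: int):
        self.doc_parts = doc_parts
        self.term_parts = term_parts
        self.replicas = replicas

    def processor_id(self, doc_part: int, term_part: int, replica: int) -> int:
        """Flatten a 3D coordinate into a single processor number."""
        return (replica * self.doc_parts + doc_part) * self.term_parts + term_part

    def processors_for_query(self, terms, query_id: int):
        """Pick one replica (assumed round-robin on query_id), then visit
        every document partition in the term-partition columns that hold
        the query's terms."""
        replica = query_id % self.replicas
        columns = sorted({stable_hash(t) % self.term_parts for t in terms})
        return [self.processor_id(d, c, replica)
                for d, c in product(range(self.doc_parts), columns)]


class ResultCache:
    """Small LRU cache standing in for the application-level query cache."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, query: str):
        """Return cached results, or None on a miss."""
        if query not in self.entries:
            return None
        self.entries.move_to_end(query)  # mark as most recently used
        return self.entries[query]

    def put(self, query: str, results):
        """Store results, evicting the least recently used entry if full."""
        self.entries[query] = results
        self.entries.move_to_end(query)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)
```

In this sketch a query touches only `doc_parts` times the number of distinct term columns processors within one replica, rather than every processor in the cluster, and a cache hit avoids touching the index at all.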
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Feuerstein, E., Gil-Costa, V., Marin, M., Tolosa, G., Baeza-Yates, R. (2012). 3D Inverted Index with Cache Sharing for Web Search Engines. In: Kaklamanis, C., Papatheodorou, T., Spirakis, P.G. (eds) Euro-Par 2012 Parallel Processing. Euro-Par 2012. Lecture Notes in Computer Science, vol 7484. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32820-6_28
DOI: https://doi.org/10.1007/978-3-642-32820-6_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32819-0
Online ISBN: 978-3-642-32820-6
eBook Packages: Computer Science, Computer Science (R0)