Abstract
Top-K query processing is one of the most fundamental and performance-critical components of Web search engines. A number of techniques, such as dynamic pruning, have been proposed to reduce query processing time on the CPU. However, it has become increasingly difficult to further improve the efficiency of top-K query processing without hurting its effectiveness. Meanwhile, the Graphics Processing Unit (GPU), a powerful computing accelerator found on almost every computer today, is barely tapped in Web search engines. The biggest challenge in accelerating top-K query processing on the GPU is that the parallel nature of the GPU execution model prevents many CPU top-K query processing optimizations from being ported directly to the GPU. A GPU with hundreds of cores is ideal for applications with massive parallelism, which is not readily available in existing CPU-oriented top-K query implementations.
This paper exploits the computational power of the GPU for top-K query processing. In particular, we propose a new domain-specific parallelization framework that uses the GPU to parallelize top-K query processing. The proposed framework is general enough to support both disjunctive and conjunctive query processing modes. Experiments on TREC collections show that our GPU top-K query processing framework reduces query processing time by a factor of 7 relative to state-of-the-art dynamic pruning methods in the disjunctive mode, and by a factor of 6 in the conjunctive mode. Our results also show that the framework is faster than a previously known GPU baseline method. In particular, it is more scalable and efficient than the CPU and GPU baselines when K is large.
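For readers unfamiliar with the two query processing modes named above, the following is a minimal sequential sketch in Python. It is not the paper's GPU framework and applies no dynamic pruning. It only illustrates what a top-K query returns in each mode: the disjunctive mode ranks every document that contains at least one query term, while the conjunctive mode ranks only documents that contain all of them. The function name `top_k`, the toy `INDEX`, and its precomputed per-term scores are assumptions made for illustration.

```python
import heapq
from collections import defaultdict

# Hypothetical toy inverted index: term -> list of (doc_id, per-term score).
# The scores are made up; a real system would compute them with a ranking
# function such as BM25.
INDEX = {
    "gpu": [(1, 2.0), (2, 1.0), (4, 0.5)],
    "search": [(2, 1.5), (3, 3.0), (4, 0.5)],
}


def top_k(index, query_terms, k, mode="disjunctive"):
    """Exhaustively score documents, then return the K highest-scoring ones.

    Returns a list of (doc_id, score) pairs, best first.
    """
    terms = set(query_terms)
    scores = defaultdict(float)  # doc_id -> accumulated score
    matched = defaultdict(int)   # doc_id -> number of query terms it contains

    # Term-at-a-time accumulation over each posting list.
    for term in terms:
        for doc_id, score in index.get(term, []):
            scores[doc_id] += score
            matched[doc_id] += 1

    if mode == "conjunctive":
        # Keep only documents that contain every query term.
        candidates = [d for d in scores if matched[d] == len(terms)]
    else:
        # Disjunctive: any document containing at least one query term.
        candidates = list(scores)

    # Select the K best; ties are broken in favor of the smaller doc_id.
    best = heapq.nlargest(k, candidates, key=lambda d: (scores[d], -d))
    return [(d, scores[d]) for d in best]


if __name__ == "__main__":
    print(top_k(INDEX, ["gpu", "search"], k=2))
    print(top_k(INDEX, ["gpu", "search"], k=2, mode="conjunctive"))
```

In this sketch each posting contributes independently to a per-document accumulator, and the final step is a selection over all candidates. Those two steps are where a data-parallel implementation would look for parallelism, although how the paper's framework actually organizes the work is not described in this abstract.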
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Zhang, Y., Fang, H., Li, X. (2019). Scalable Top-K Query Processing Using Graphics Processing Unit. In: Rauchwerger, L. (eds) Languages and Compilers for Parallel Computing. LCPC 2017. Lecture Notes in Computer Science, vol 11403. Springer, Cham. https://doi.org/10.1007/978-3-030-35225-7_16
DOI: https://doi.org/10.1007/978-3-030-35225-7_16
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-35224-0
Online ISBN: 978-3-030-35225-7
eBook Packages: Computer Science, Computer Science (R0)