Skip to main content

A Fast and Flexible Sorting Algorithm with CUDA

  • Conference paper
Algorithms and Architectures for Parallel Processing (ICA3PP 2009)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 5574))

Abstract

In this paper, we propose a fast and flexible sorting algorithm with CUDA. The proposed algorithm is much more practical than the previous GPU-based sorting algorithms, as it is able to handle the sorting of elements represented by integers, floats and structures. Meanwhile, our algorithm is optimized for the modern GPU architecture to obtain high performance. We use different strategies for sorting disorderly list and nearly-sorted list to make it adaptive. Extensive experiments demon- strate our algorithm has higher performance than previous GPU-based sorting algorithms and can support real-time applications.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 84.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Purcell, T.J., Donner, C., Cammarano, M., Jensen, H.W., Hanrahan, P.: Photon Mapping on Programmable Graphics Hardware. In: Proceedings of the ACM Siggraph Eurographics Symposium on Graphics Hardware (2003)

    Google Scholar 

  2. Kapasi, U.J., Dally, W.J., Rixner, S., Mattson, P.R., Owens, J.D., Khailany, B.: Efficient Conditional Operations for Data-parallel Architectures. In: Proceedings of the 33rd annual ACM/IEEE International Symposium on Microarchitecture, pp. 159–170 (2000)

    Google Scholar 

  3. Kipfer, P., Segal, M., Westermann, R.: UberFlow: A GPU-based Particle Engine. In: Proceedings of the ACM Siggraph/Eurographics Conference on Graphics Hardware, pp. 115–122 (2004)

    Google Scholar 

  4. Greβ, A., Zachmann, G.: GPU-ABiSort: Optimal Parallel Sorting on Stream Architectures. In: Proceedings of the 20th IEEE International Parallel and Distributed Processing Symposium (2006)

    Google Scholar 

  5. Bilardi, G., Nicolau, A.: Adaptive Bitonic Sorting. An Optimal Parallel Algorithm for Shared Memory Machines. SIAM Journal on Computing 18(2), 216–228 (1989)

    Article  MathSciNet  MATH  Google Scholar 

  6. Govindaraju, N.K., Raghuvanshi, N., Manocha, D.: Fast and Approximate Stream Mining of Quantiles and Frequencies Using Graphics Processors. In: Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data, pp. 611–622 (2005)

    Google Scholar 

  7. NVIDIA Corporation. NVIDIA CUDA Programming Guide (2008)

    Google Scholar 

  8. Sintorn, E., Assarsson, U.: Fast Parallel GPU-Sorting Using a Hybrid Algorithm. In: Workshop on General Purpose Processing on Graphics Processing Units (2007)

    Google Scholar 

  9. Cederman, D., Tsigas, P.: A Practical Quicksort Algorithm for Graphics Processors. Technical Report 2008-01, Computer Science and Engineering Chalmers University of Technology (2008)

    Google Scholar 

  10. Harris, M., Satish, N.: Designing Efficient Sorting Algorithms for Manycore GPUs. NVIDIA Technical Report (2008)

    Google Scholar 

  11. Harris, M.: Optimizing Parallel Reduction in CUDA. NVIDIA Developer Technology (2008)

    Google Scholar 

  12. Blelloch, E., Greg Plaxton, C., Leiserson, C.E., Smith, S.J., Maggs, B.M., Zagha, M.: An Experimental Analysis of Parallel Sorting Algorithms (1998)

    Google Scholar 

  13. Harris, M., Sengupta, S., Owens, J.D.: Parallel Prefix Sum (Scan) with CUDA. In: Nguyen, H. (ed.) GPU Gems 3, Addison-Wesley, Reading (2007)

    Google Scholar 

  14. Bilardi, G., Nicolau, A.: Adaptive bitonic sorting: an optimal parallel algorithm for shared-memory machines. SIAM J. Comput. 18(2), 216–228 (1989)

    Article  MathSciNet  MATH  Google Scholar 

  15. Kider, J.T.: GPU as a Parallel Machine: Sorting on the GPU, Lecture of University of Pennsylvania (2005)

    Google Scholar 

  16. Knuth, D.: Section 5.2.4: Sorting by merging. In: The Art of Computer Programming, Sorting and Searching, vol. 3, pp. 158–168 (1998) ISBN 0-201-89685-0

    Google Scholar 

  17. Harris, M.: Parallel Prefix Sum(Scan) with CUDA (2008)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chen, S., Qin, J., Xie, Y., Zhao, J., Heng, PA. (2009). A Fast and Flexible Sorting Algorithm with CUDA. In: Hua, A., Chang, SL. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2009. Lecture Notes in Computer Science, vol 5574. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03095-6_28

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-03095-6_28

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-03094-9

  • Online ISBN: 978-3-642-03095-6

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics