Engineering a Multi-core Radix Sort

  • Jan Wassenberg
  • Peter Sanders
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6853)

Abstract

We present a fast radix sorting algorithm that builds upon a microarchitecture-aware variant of counting sort. Taking advantage of virtual memory and making use of write-combining yields a per-pass throughput corresponding to at least 89% of the system’s peak memory bandwidth. Our implementation outperforms Intel’s recently published radix sort by a factor of 1.64. It also compares favorably to the reported performance of an algorithm for Fermi GPUs when data-transfer overhead is included. These results indicate that scalar, bandwidth-sensitive sorting algorithms remain competitive on current architectures. Various other memory-intensive applications can benefit from the techniques described herein.
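For readers unfamiliar with the building block the abstract names, the following is a minimal sketch of a textbook least-significant-digit radix sort in which each pass is a stable counting sort. It is not the authors' implementation: the 8-bit digit width, the 32-bit key size, and the function names `counting_sort_pass` and `radix_sort_u32` are assumptions made for illustration, and the virtual-memory and write-combining techniques mentioned in the abstract are not modelled here.

```python
from typing import List

RADIX_BITS = 8                 # assumed digit width; the abstract does not state one
BUCKETS = 1 << RADIX_BITS      # 256 buckets per pass
MASK = BUCKETS - 1


def counting_sort_pass(keys: List[int], shift: int) -> List[int]:
    """Stable counting sort of `keys` by the digit at bit offset `shift`."""
    # 1. Histogram: count how many keys fall into each bucket.
    counts = [0] * BUCKETS
    for k in keys:
        counts[(k >> shift) & MASK] += 1

    # 2. Exclusive prefix sum: the first output index of each bucket.
    offsets = [0] * BUCKETS
    total = 0
    for b in range(BUCKETS):
        offsets[b] = total
        total += counts[b]

    # 3. Scatter: copy each key to the next free slot of its bucket.
    #    Scanning the input in order keeps the pass stable.
    out = [0] * len(keys)
    for k in keys:
        d = (k >> shift) & MASK
        out[offsets[d]] = k
        offsets[d] += 1
    return out


def radix_sort_u32(keys: List[int]) -> List[int]:
    """LSD radix sort for unsigned 32-bit integers (hypothetical helper)."""
    for shift in range(0, 32, RADIX_BITS):
        keys = counting_sort_pass(keys, shift)
    return keys
```

Each pass reads the keys twice and writes them once, so on real hardware its speed is bounded by memory traffic rather than arithmetic, which is why the abstract reports per-pass throughput relative to peak memory bandwidth.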

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Jan Wassenberg (1)
  • Peter Sanders (2)
  1. Fraunhofer IOSB, Ettlingen, Germany
  2. Karlsruhe Institute of Technology, Karlsruhe, Germany
