Efficient Sorting Using Registers and Caches

  • Lars Arge
  • Jeff Chase
  • Jeffrey S. Vitter
  • Rajiv Wickremesinghe
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1982)


Modern computer systems have increasingly complex memory systems. Common machine models for algorithm analysis do not reflect many of the features of these systems, e.g., large register sets, lockup-free caches, cache hierarchies, associativity, cache line fetching, and streaming behavior. Inadequate models lead to poor algorithmic choices and an incomplete understanding of algorithm behavior on real machines.

A key step toward developing better models is to quantify the performance effects of features not reflected in the models. This paper explores the effect of memory system features on sorting performance. We introduce a new cache-conscious sorting algorithm, R-merge, which in practice outperforms algorithms that are theoretically superior under those models. R-merge is designed to minimize memory stall cycles rather than cache misses, taking into account features common to many system designs.


Keywords: Memory System, Main Memory, Cache Line, Memory Hierarchy, Cache Performance
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2001

Authors and Affiliations

  • Lars Arge (1)
  • Jeff Chase (1)
  • Jeffrey S. Vitter (1)
  • Rajiv Wickremesinghe (1)

  1. Department of Computer Science, Duke University, Durham, NC, USA
