Reevaluating Data Stall Time with the Consideration of Data Access Concurrency

  • Regular Paper
Journal of Computer Science and Technology

Abstract

Data access delay has become the prominent performance bottleneck of high-end computing systems. The key to reducing data access delay in system design is to diminish data stall time. Memory locality and concurrency are the two essential factors influencing the performance of modern memory systems. However, existing studies on reducing data stall time rarely exploit data access concurrency, because the impact of memory concurrency on overall memory system performance is not well understood. This study presents a pair of novel data stall time models: the L-C model, which captures the combined effect of locality and concurrency, and the P-M model, which captures the effect of pure misses on data stall time. The models provide a new understanding of data access delay and new directions for performance optimization. Based on these models, a summary table of advanced cache optimizations is presented. It has 38 entries contributed by data concurrency but only 21 contributed by data locality, which shows the value of data concurrency. The L-C and P-M models, together with the results and opportunities introduced in this study, are important and necessary for the future data-centric architecture and algorithm design of modern computing systems.

Author information

Corresponding author

Correspondence to Yu-Hang Liu.

Additional information

Special Section on Applications and Industry

The work was supported in part by the National Science Foundation of USA under Grant Nos. CNS-1162540, CCF-0937877, and CNS-0751200.

About this article

Cite this article

Liu, YH., Sun, XH. Reevaluating Data Stall Time with the Consideration of Data Access Concurrency. J. Comput. Sci. Technol. 30, 227–245 (2015). https://doi.org/10.1007/s11390-015-1517-2

  • DOI: https://doi.org/10.1007/s11390-015-1517-2
