Reevaluating Data Stall Time with the Consideration of Data Access Concurrency

Liu, Yu-Hang; Sun, Xian-He

doi:10.1007/s11390-015-1517-2

Regular Paper
Published: 13 March 2015

Journal of Computer Science and Technology, Volume 30, pages 227–245 (2015)

Yu-Hang Liu¹ &
Xian-He Sun¹

Abstract

Data access delay has become the prominent performance bottleneck of high-end computing systems. The key to reducing data access delay in system design is to diminish data stall time. Memory locality and concurrency are the two essential factors influencing the performance of modern memory systems. However, existing studies on reducing data stall time rarely focus on utilizing data access concurrency, because the impact of memory concurrency on overall memory system performance is not well understood. In this study, a pair of novel data stall time models is presented: the L-C model for the combined effect of locality and concurrency, and the P-M model for the effect of pure miss on data stall time. The models provide a new understanding of data access delay and new directions for performance optimization. Based on these new models, a summary table of advanced cache optimizations is presented. It has 38 entries contributed by data concurrency but only 21 entries contributed by data locality, which shows the value of data concurrency. The L-C and P-M models, together with the associated results and opportunities introduced in this study, are important and necessary for future data-centric architecture and algorithm design of modern computing systems.

Author information

Authors and Affiliations

Department of Computer Science, Illinois Institute of Technology, Chicago, IL, 60616-3793, U.S.A.
Yu-Hang Liu & Xian-He Sun

Corresponding author

Correspondence to Yu-Hang Liu.

Additional information

Special Section on Applications and Industry

The work was supported in part by the National Science Foundation of USA under Grant Nos. CNS-1162540, CCF-0937877, and CNS-0751200.

About this article

Cite this article

Liu, YH., Sun, XH. Reevaluating Data Stall Time with the Consideration of Data Access Concurrency. J. Comput. Sci. Technol. 30, 227–245 (2015). https://doi.org/10.1007/s11390-015-1517-2

Received: 21 November 2014
Revised: 08 January 2015
Published: 13 March 2015
Issue Date: March 2015
DOI: https://doi.org/10.1007/s11390-015-1517-2
