Abstract
This paper evaluates the performance benefits and problems associated with netcaches (network caches), also known as shared remote access caches (RACs), in SMP-based multiprocessors running scientific/engineering and commercial applications. We consider and compare the effects of these caches under two memory consistency models: sequential consistency and release consistency. We use SMP (symmetric multiprocessing) nodes as the building blocks of the multiprocessor because they are widely available and cost-effective, which makes them an attractive choice for current and future designers.
For the scientific/engineering workloads, we simulate six applications from the SPLASH-2 benchmark suite. We compare the performance of these six programs under two alternatives: (i) the baseline architecture (sequential consistency) and (ii) future architectures (release consistency). Each alternative is evaluated both with netcaches in the system and without them. We simulate a machine with 32 processors organized into SMP clusters. Similarly, we show the impact of netcaches on these systems for three database benchmarks: TPC-B, TPC-C and TPC-D.
Our simulation results show that netcaches are more efficient for commercial applications than for scientific and engineering applications. Furthermore, the impact of these caches is more significant on machines using release consistency than on those using sequential consistency.
Copyright information
© 2002 Springer Science+Business Media New York
About this chapter
Cite this chapter
Moreno, E.D. (2002). Netcaches on Engineering and Commercial Applications. In: Dimopoulos, N.J., Li, K.F. (eds) High Performance Computing Systems and Applications. The Kluwer International Series in Engineering and Computer Science, vol 657. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-0849-6_28
DOI: https://doi.org/10.1007/978-1-4615-0849-6_28
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4613-5269-3
Online ISBN: 978-1-4615-0849-6
eBook Packages: Springer Book Archive