Stanford DASH multiprocessor: The hardware and software approach

  • Anoop Gupta
Invited Lecture
Part of the Lecture Notes in Computer Science book series (LNCS, volume 605)

Keywords

Computer Architecture; Processing Node; Cache Coherence Protocol; Multiprocessor Architecture; Program Language Design
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Bibliography

General DASH papers

  [1] Daniel Lenoski, Jim Laudon, Truman Joe, Dave Nakahira, Luis Stevens, Anoop Gupta, and John Hennessy. The DASH Prototype: Implementation and Performance. In Proceedings of 19th International Symposium on Computer Architecture. May, 1992. To appear.
  [2] Daniel Lenoski, James Laudon, Kourosh Gharachorloo, Wolf-Dietrich Weber, Anoop Gupta, John Hennessy, Mark Horowitz, and Monica Lam. The Stanford DASH Multiprocessor. IEEE Computer 25(3), March, 1992.
  [3] Daniel Lenoski. The Design and Analysis of DASH: A Scalable Directory-Based Multiprocessor. Technical Report CSL-TR-92-507, Computer Systems Laboratory, Stanford University, 1992.

DASH Coherence Protocol and Directory Structure

  [1] Anoop Gupta and Wolf-Dietrich Weber. Cache Invalidation Patterns in Shared-Memory Multiprocessors. 1992. To appear in IEEE Transactions on Computers.
  [2] Anoop Gupta, Wolf-Dietrich Weber, and Todd Mowry. Reducing Memory and Traffic Requirements for Scalable Directory-Based Cache Coherence Schemes. In Proceedings of International Conference on Parallel Processing. August, 1990.
  [3] Daniel Lenoski, James Laudon, Kourosh Gharachorloo, Anoop Gupta and John Hennessy. The Directory-Based Cache Coherence Protocol for the DASH Multiprocessor. In Proceedings of 17th International Symposium on Computer Architecture. May, 1990.

Architecture tools

  [1] Helen Davis, Stephen Goldschmidt and John Hennessy. Multiprocessor Simulation and Tracing using Tango. In Proceedings of the 1991 International Conference on Parallel Processing. August, 1991.

Latency hiding and tolerating techniques

  [1] Anoop Gupta, John Hennessy, Kourosh Gharachorloo, Todd Mowry, and Wolf-Dietrich Weber. Comparative Evaluation of Latency Reducing and Tolerating Techniques. In Proceedings of 18th International Symposium on Computer Architecture. May, 1991.
  [2] Kourosh Gharachorloo, Anoop Gupta, John Hennessy. Hiding Memory Latency Using Dynamic Scheduling in Shared-Memory Multiprocessors. In Proceedings of 19th International Symposium on Computer Architecture. May, 1992. To appear.
  [3] Kourosh Gharachorloo, Anoop Gupta, and John Hennessy. Performance Evaluation of Memory Consistency Models for Shared-Memory Multiprocessors. In Proceedings of Fourth International Conference on Architectural Support for Programming Languages and Operating Systems. April, 1991.
  [4] Kourosh Gharachorloo, Daniel Lenoski, James Laudon, Phillip Gibbons, Anoop Gupta and John Hennessy. Memory Consistency and Event Ordering in Scalable Shared-Memory Multiprocessors. In Proceedings of 17th International Symposium on Computer Architecture. May, 1990.
  [5] Todd Mowry and Anoop Gupta. Tolerating Latency Through Software-Controlled Prefetching in Shared-Memory Multiprocessors. Journal of Parallel and Distributed Computing 12(6), June, 1991.
  [6] Wolf-Dietrich Weber and Anoop Gupta. Exploring the Benefits of Multiple Hardware Contexts in a Multiprocessor Architecture: Preliminary Results. In Proceedings of 16th International Symposium on Computer Architecture. June, 1989.

Other architectural studies

  [1] Per Stenstrom, Truman Joe, and Anoop Gupta. Comparative Performance Evaluation of Cache-Coherent NUMA and COMA Architectures. In Proceedings of 19th International Symposium on Computer Architecture. May, 1992. To appear.

Operating systems

  [1] Josep Torrellas, Anoop Gupta, and John Hennessy. Characterizing the Cache Performance and Synchronization Behavior of a Multiprocessor Operating System. Technical Report CSL-TR-92-502, Computer Systems Laboratory, Stanford University, January, 1992.
  [2] Anoop Gupta, Andrew Tucker, and Luis Stevens. Making Effective Use of Shared-Memory Multiprocessors: The Process Control Approach. Technical Report CSL-TR-91-475, Computer Systems Laboratory, Stanford University, May, 1991.
  [3] Anoop Gupta, Andrew Tucker, and Shigeru Urushibara. The Impact of Operating System Scheduling Policies and Synchronization Methods on the Performance of Parallel Applications. In Proceedings of ACM SIGMETRICS. May, 1991.
  [4] Andrew Tucker and Anoop Gupta. Process Control and Scheduling Issues for Multiprogrammed Shared-Memory Multiprocessors. In Proceedings of 12th ACM Symposium on Operating Systems Principles. December, 1989.

Programming languages

  [1] Rohit Chandra, Anoop Gupta and John Hennessy. Integrating Concurrency and Data Abstraction in a Parallel Programming Language. Technical Report CSL-TR-92-511, Computer Systems Laboratory, Stanford University, February, 1992.
  [2] Rohit Chandra, Anoop Gupta, and John Hennessy. COOL: A Language for Parallel Programming. Research Monographs in Parallel and Distributed Computing. Languages and Compilers for Parallel Computing. Edited by Chris Jesshope and David Klappholz, The MIT Press, 1990.
  [3] Monica Lam and Martin Rinard. Coarse-Grain Parallel Programming in Jade. In Proceedings of Third ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. April, 1991.
  [4] Martin Rinard and Monica Lam. Semantic Foundations of Jade. In Proc. 19th Annual ACM Symposium on Principles of Programming Languages. Jan, 1992.

Compilers

  [1] Michael Wolf and Monica Lam. A Loop Transformation Theory and Algorithm to Maximize Parallelism. IEEE Trans. on Parallel and Distributed Systems, Oct, 1991.
  [2] Dror Maydan, John Hennessy, and Monica Lam. Efficient and Exact Data Dependence Analysis. In Proc. ACM SIGPLAN 91 Conference on Programming Language Design and Implementation. Jun, 1991.
  [3] Michael Wolf and Monica Lam. A Data Locality Optimizing Algorithm. In Proc. ACM SIGPLAN 91 Conference on Programming Language Design and Implementation. Jun, 1991.
  [4] Monica Lam, Edward Rothberg, and Michael Wolf. The Cache Performance and Optimizations of Blocked Algorithms. In Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS IV). Apr., 1991.

Performance tools

  [1] Margaret Martonosi, Anoop Gupta, and Tom Anderson. MemSpy: Analyzing Memory System Bottlenecks in Programs. In Proceedings of ACM SIGMETRICS. May, 1992. To appear.
  [2] Aaron Goldberg and John Hennessy. Performance Debugging Shared Memory Multiprocessor Programs with MTOOL. In Proceedings of Supercomputing '91. November, 1991.

Applications

  [1] Jaswinder Pal Singh, Wolf-Dietrich Weber, and Anoop Gupta. SPLASH: Stanford Parallel Applications for Shared Memory. Technical Report CSL-TR-91-469, Computer Systems Laboratory, Stanford University, April, 1991.
  [2] Larry Soule and Anoop Gupta. An Evaluation of Chandy-Misra-Bryant Algorithm for Digital Logic Simulation. 1992. To appear in ACM Transactions on Modeling and Computer Simulation.
  [3] Jaswinder Pal Singh, Chris Holt, Takashi Totsuka, Anoop Gupta and John Hennessy. Load Balancing and Data Locality in Hierarchical N-Body Methods. Technical Report CSL-TR-92-505, Computer Systems Laboratory, Stanford University, February, 1992.
  [4] Jaswinder Pal Singh, John Hennessy, and Anoop Gupta. Implications of Hierarchical N-Body Methods for Multiprocessor Architecture. Technical Report CSL-TR-92-506, Computer Systems Laboratory, Stanford University, February, 1992.
  [5] Jaswinder Pal Singh and John Hennessy. Finding and Exploiting Parallelism in an Ocean Simulation Program: Experience, Results and Implications. 1992. To appear in Journal of Parallel and Distributed Computing.
  [6] Edward Rothberg and Anoop Gupta. The Performance Impact of Data Reuse in Parallel Dense Cholesky Factorization. Technical Report STAN-CS-92-1401, Computer Science Department, Stanford University, January, 1992.
  [7] Edward Rothberg and Anoop Gupta. An Evaluation of Left-Looking, Right-Looking, and Multifrontal Approaches to Sparse Cholesky Factorization on Hierarchical-Memory Machines. 1992. To appear in International Journal of High Speed Computing. Also available as Stanford University technical report STAN-CS-91-1377/CSL-TR-91-487, August 1991.
  [8] Edward Rothberg and Anoop Gupta. Parallel ICCG on a Hierarchical Memory Multiprocessor-Addressing the Triangular Solve Bottleneck. 1992. To appear in Parallel Computing.
  [9] Edward Rothberg and Anoop Gupta. Techniques for Improving the Performance of Sparse Matrix Factorization on Multiprocessor Workstations. In Proceedings of Supercomputing '90. November, 1990.
  [10] Edward Rothberg and Anoop Gupta. Efficient Sparse Matrix Factorization on High-Performance Workstations-Exploiting the Memory Hierarchy. ACM Transactions on Mathematical Software 17(3), September, 1991.
  [11] Margaret Martonosi and Anoop Gupta. Tradeoffs in Message-Passing and Shared-Memory Implementations of a Standard Cell Router. In Proceedings of International Conference on Parallel Processing. August, 1989.
  [12] Edward Rothberg and Anoop Gupta. Experiences Implementing a Parallel ATMS on a Shared-Memory Multiprocessor. In International Joint Conference on Artificial Intelligence. August, 1989.

Copyright information

© Springer-Verlag Berlin Heidelberg 1992

Authors and Affiliations

  • Anoop Gupta, Computer Systems Laboratory, Stanford University