A Parallel System Architecture Based on Dynamically Configurable Shared Memory Clusters

  • Marek Tudruj
  • Łukasz Masko
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2328)


The paper presents a new architectural solution for parallel systems built of shared memory processor clusters. The system is based on multi-processor clusters that are dynamically reconfigurable at run time, each organized around a local shared memory module placed in a common address space. Each memory module is accessed through a local cluster bus and a common inter-cluster bus. Programs are organized according to their macro dataflow graphs, in which tasks and communication are defined so as to eliminate reloading of data caches during task execution. The behaviour of the proposed system has been evaluated by simulation based on an extended macro dataflow graph representation that includes modelling of the data bus arbiters in the system. Program distribution into dynamic processor clusters assumes run-time switching of processors between busses and memory modules, which can reduce contention on data busses. Execution of the CG algorithm in the proposed architecture shows a speed-up greater than 4 when 5 busses are applied instead of one.
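The effect of bus contention described above can be illustrated with a toy model. The following is a minimal sketch, not the authors' simulator or their extended macro dataflow graph representation: the `Transfer` record, the `simulate` function, and the first-come-first-served arbiter are all assumptions introduced here for illustration, and the workload is an idealised one in which transfers are spread evenly over the busses.

```python
from dataclasses import dataclass


@dataclass
class Transfer:
    """A hypothetical data transfer produced by a task node."""
    ready: int   # time at which the producing task finishes
    length: int  # bus cycles needed to move the data
    bus: int     # index of the bus (cluster) the transfer is assigned to


def simulate(transfers, n_buses):
    """Serve transfers on each bus in order of readiness (a FIFO arbiter).

    Returns the time at which the last transfer completes.
    """
    bus_free = [0] * n_buses  # earliest time each bus is idle again
    finish = 0
    for t in sorted(transfers, key=lambda t: t.ready):
        b = t.bus % n_buses   # fold assignments onto the available busses
        start = max(t.ready, bus_free[b])
        bus_free[b] = start + t.length
        finish = max(finish, bus_free[b])
    return finish


# Ten transfers of 4 cycles each, all ready at time 0, spread over five clusters.
transfers = [Transfer(ready=0, length=4, bus=i % 5) for i in range(10)]
one_bus = simulate(transfers, 1)
five_buses = simulate(transfers, 5)
```

In this evenly balanced case a single bus serializes every transfer, while five busses let the transfers of different clusters proceed in parallel. The sketch says nothing about the speed-up reported in the paper, which depends on the actual program graph, the arbiters, and the run-time switching of processors between clusters.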


Shared Memory, Memory Module, Data Cache, Shared Memory System, Task Node
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Marek Tudruj
    • 1
  • Łukasz Masko
    • 1
  1. Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland
