κNUMA: A Model for Clusters of SMP-Machines

  • Martin Schmollinger
  • Michael Kaufmann
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2328)


The κNUMA model is a new model of parallel computation intended for developing and analysing algorithms for clusters of SMP blocks (symmetric multiprocessing). SMP blocks are shared-memory parallel computers whose few processors have uniform memory access (UMA). The model captures current architectural trends such as hierarchical interconnection, intranode communication (threads and shared memory) and internode communication (message passing and remote data access). κNUMA is built on top of the widely accepted BSP (bulk-synchronous parallel) model. In this paper, we present an exemplary analysis of the personalized one-to-all broadcast. We show that algorithms that are optimal in the BSP model, when transferred directly, lack information about the machine and therefore lose performance.
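To make the contrast concrete, the following minimal sketch compares a personalized one-to-all broadcast (a scatter) under the standard flat BSP superstep cost w + h·g + l with a two-phase variant on a two-level machine. The two-level cost function and all parameter names (`g_inter`, `l_inter`, `g_intra`, `l_intra`) are assumptions made for illustration only; they are not the κNUMA cost function from the paper.

```python
from dataclasses import dataclass


def superstep_cost(w: float, h: int, g: float, l: float) -> float:
    """Standard BSP superstep cost: local work + h-relation + barrier."""
    return w + h * g + l


@dataclass(frozen=True)
class BspMachine:
    """Flat BSP machine: p processors, gap g per word, barrier latency l."""
    p: int
    g: float
    l: float


def flat_scatter_cost(m: BspMachine, words: int = 1) -> float:
    """Root sends a distinct message of `words` words to every other processor."""
    h = (m.p - 1) * words  # the root sends the most data, so it defines h
    return superstep_cost(0.0, h, m.g, m.l)


def two_level_scatter_cost(nodes: int, procs_per_node: int,
                           g_inter: float, l_inter: float,
                           g_intra: float, l_intra: float,
                           words: int = 1) -> float:
    """Hypothetical two-phase scatter on a cluster of SMP nodes (assumption).

    Phase 1: the root sends each remote node its whole bundle over the network.
    Phase 2: one processor per node distributes the bundle inside the node.
    """
    h1 = (nodes - 1) * procs_per_node * words  # internode traffic at the root
    h2 = (procs_per_node - 1) * words          # intranode traffic per node
    return (superstep_cost(0.0, h1, g_inter, l_inter)
            + superstep_cost(0.0, h2, g_intra, l_intra))


if __name__ == "__main__":
    flat = flat_scatter_cost(BspMachine(p=16, g=4.0, l=100.0))
    hier = two_level_scatter_cost(nodes=4, procs_per_node=4,
                                  g_inter=4.0, l_inter=100.0,
                                  g_intra=1.0, l_intra=10.0)
    print(f"flat BSP scatter:  {flat}")
    print(f"two-level scatter: {hier}")
```

With these made-up parameters the two variants cost almost the same, so which one is better depends on the ratio of internode to intranode bandwidth and latency. A flat model with a single g and l cannot express that ratio.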


Shared Memory, Local Bandwidth, Symmetrical Multiprocessing, General Purpose Model, Remote Data Access
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Martin Schmollinger
  • Michael Kaufmann

Wilhelm-Schickard-Institute for Computer Science, Parallel Computing Group, University of Tübingen, Tübingen, Germany
