κNUMA: A Model for Clusters of SMP-Machines
The κNUMA model is a new model of parallel computation intended for developing and analysing algorithms for clusters of SMP (symmetrical multiprocessing) blocks. SMP blocks are shared-memory parallel computers whose few processors have uniform memory access (UMA). The model reflects current architectural trends such as hierarchical interconnection, intranode communication (threads and shared memory), and internode communication (message passing and remote data access). κNUMA is built on top of the widely accepted BSP (bulk-synchronous parallel) model. In this paper, we present an exemplary analysis of the personalized one-to-all broadcast. We show that directly transferring algorithms that are optimal in the BSP model discards information about the machine hierarchy and therefore costs performance.
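To illustrate why a flat BSP analysis can lose information on a cluster of SMP blocks, here is a minimal sketch. The first function uses the standard BSP superstep cost (communication `g * h` plus synchronisation `l`) for a single-superstep personalized one-to-all broadcast. The second function is a hypothetical two-level accounting and is not the κNUMA cost function, which this abstract does not define; the parameter names `k`, `g_local`, and `g_global` are assumptions made for illustration only.

```python
def bsp_scatter_cost(p, m, g, l):
    """Flat BSP cost of a personalized one-to-all broadcast in one superstep.

    The root sends a distinct m-word message to each of the other p - 1
    processors, so the h-relation is h = (p - 1) * m and the cost is g*h + l.
    """
    h = (p - 1) * m
    return g * h + l


def two_level_scatter_cost(p, k, m, g_local, g_global, l):
    """Hypothetical two-level accounting (an assumption, not the kNUMA model).

    Of the p - 1 destinations, k - 1 share the root's SMP block and are
    charged the local gap; the remaining p - k are charged the global gap.
    """
    local_words = (k - 1) * m
    remote_words = (p - k) * m
    return g_local * local_words + g_global * remote_words + l


# Example with assumed parameters: 16 processors in blocks of 4, 10-word messages.
flat = bsp_scatter_cost(p=16, m=10, g=4, l=100)
two_level = two_level_scatter_cost(p=16, k=4, m=10, g_local=1, g_global=4, l=100)
print(flat, two_level)
```

With these assumed parameters, the flat model charges every message at the global rate, while the two-level accounting separates the cheaper intranode traffic; a flat BSP analysis cannot express that distinction.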
Keywords: Shared Memory, Local Bandwidth, Symmetrical Multiprocessing, General Purpose Model, Remote Data Access