The Journal of Supercomputing, Volume 12, Issue 1–2, pp 119–136

PC-based Shared Memory Architecture and Language

  • Dominique Houzet
  • Abdelkrim Fatni


Image processing applications require both computing and communication power. The object of the GFLOPS project was to study all aspects of the design of computers for these applications. The project's aim was to develop a parallel architecture, together with its software environment, to implement these applications efficiently. A development environment, in particular a C data-parallel language, has been built for this purpose. The C parallel language presented here simplifies the use of such architectures by providing the programmer with a global name space and a control mechanism to exploit the fine- and medium-grain parallelism of applications. The main advantage of our paradigm is that it offers a single framework for expressing both data and control parallelism. We have implemented this programming environment on the GFLOPS machine, which supports up to 512 processor nodes. The nodes are PC motherboards connected via the PCI bus over a scalable, cost-effective network at a constant cost per node. The aim is to obtain a scalable, virtually shared memory machine at low cost. In this paper we discuss the design of the GFLOPS machine and its C parallel language, and evaluate the effectiveness of the mechanisms incorporated. The architecture's behaviour was analysed with microbenchmarks and image processing algorithms written in C.

Keywords: Image Processing, Language, Parallel Architecture, Evaluation





Copyright information

© Kluwer Academic Publishers 1998

Authors and Affiliations

  • Dominique Houzet (1)
  • Abdelkrim Fatni (1)

  1. IRIT-ENSEEIHT-INP, Toulouse cedex, France
