
Pc-based Shared Memory Architecture and Language

The Journal of Supercomputing

Abstract

Image processing applications require both computing and communication power. The GFLOPS project set out to study all aspects of the design of computers for such applications, and to develop a parallel architecture together with a software environment that implements these applications efficiently. A development environment, in particular a C data-parallel language, has been built for this purpose. The C parallel language presented here simplifies the use of such architectures by providing the programmer with a global name space and a control mechanism to exploit the fine- and medium-grain parallelism of the applications. The main advantage of our paradigm is that it offers a single framework for expressing both data and control parallelism. We have implemented this programming environment on the GFLOPS machine, which supports up to 512 processor nodes. The nodes are PC motherboards connected through the PCI bus to a scalable, cost-effective network at a constant cost per node. The aim is to obtain a scalable, virtually shared memory machine at low cost. In this paper we discuss the design of the GFLOPS machine and its C parallel language, and evaluate the effectiveness of the mechanisms incorporated. The behaviour of the architecture was analysed with microbenchmarks and image processing algorithms written in C.
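To make the idea of a global name space over distributed nodes concrete, the following is a minimal sketch in plain C. It is not the GFLOPS language, whose syntax the abstract does not show. The names `block_t`, `owned_block` and `threshold_all_nodes` are hypothetical, the block distribution is an assumed policy, and the nodes are simulated by a sequential loop rather than run in parallel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration only: none of these names come from GFLOPS. */
typedef struct {
    size_t begin; /* first global index owned by the node */
    size_t end;   /* one past the last global index owned by the node */
} block_t;

/* Range of global indices owned by `node` when `n` elements are
 * block-distributed over `nodes` nodes. The first n % nodes nodes
 * receive one extra element each. */
static block_t owned_block(size_t n, size_t nodes, size_t node)
{
    assert(nodes > 0 && node < nodes);
    size_t base = n / nodes;
    size_t extra = n % nodes;
    block_t b;
    b.begin = node * base + (node < extra ? node : extra);
    b.end = b.begin + base + (node < extra ? 1 : 0);
    return b;
}

/* Data-parallel step: every node thresholds the pixels it owns, but
 * addresses them through the same global index space. The outer loop
 * stands in for the nodes, which would run concurrently on a real machine. */
static void threshold_all_nodes(unsigned char *image, size_t n,
                                size_t nodes, unsigned char level)
{
    for (size_t node = 0; node < nodes; ++node) {
        block_t b = owned_block(n, nodes, node);
        for (size_t i = b.begin; i < b.end; ++i)
            image[i] = (image[i] >= level) ? 255 : 0;
    }
}
```

Each simulated node touches a disjoint slice of the image, so the result does not depend on the order in which the nodes run. That independence is what lets a data-parallel operation of this kind be spread over many nodes.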




About this article

Cite this article

Houzet, D., Fatni, A. Pc-based Shared Memory Architecture and Language. The Journal of Supercomputing 12, 119–136 (1998). https://doi.org/10.1023/A:1007941813399


  • DOI: https://doi.org/10.1023/A:1007941813399
