Optimizing Stream Organization to Improve the Performance of Scientific Computing Applications on the Stream Processor

  • Ying Zhang
  • Gen Li
  • Xuejun Yang
  • Kun Zeng
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4494)


Organizing streams well is essential if stream programs, scientific ones in particular, are to make effective use of the parallel computing resources and memory system of a stream processor. In this paper, after analyzing typical scientific programs, we present and characterize two methods for optimizing stream organization: stream reusing and stream transpose. Several representative scientific stream programs, with and without our optimizations, are run on a simulator of a typical stream processor. Simulation results show that these methods can greatly improve the performance of scientific stream programs.


Scientific computing, stream programming model, stream processor, cluster, SIMD, parallel computing, stream reusing, stream transpose, inter-cluster communication





Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Ying Zhang (1)
  • Gen Li (1)
  • Xuejun Yang (1)
  • Kun Zeng (1)
  1. Institute of Computer, National University of Defense Technology, Changsha 410073, China
