Implementing and Optimizing a Data-Intensive Hydrodynamics Application on the Stream Processor

  • Ying Zhang
  • Gen Li
  • Xuejun Yang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4707)


Several representative scientific computing applications have been mapped onto the stream processor, but most of them are computation-intensive kernels or synthetic benchmarks. In this paper, we implement and optimize a complete data-intensive hydrodynamics application, QNJ-5, on the stream processor, which is designed for computation-intensive applications. Unlike in other stream programs, relieving memory access pressure is especially important in this one. Simulation results show that StreamQNJ-5 achieves final speedups of 2.97 and 1.11 over the original FORTRAN QNJ-5 running on a Xeon and an Itanium processor, respectively.


data-intensive scientific computing; stream processor; kernel join; loop-carried stream reusing; stream transpose





Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Ying Zhang (1)
  • Gen Li (1)
  • Xuejun Yang (1)
  1. National Laboratory for Paralleling and Distributed Processing, School of Computer, National University of Defense Technology, Changsha, Hunan, 410073, P.R. of China
