Experience in Developing an Open Source Scalable Software Infrastructure in Japan

  • Akira Nishida
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6017)


The Scalable Software Infrastructure for Scientific Computing (SSI) Project was initiated in November 2002, as a five year national project in Japan, for the purpose of constructing a scalable software infrastructure to replace the existing implementations of parallel algorithms in individual scientific fields. The project covered the following four areas: iterative solvers for linear systems, fast integral transforms, their effective implementation for high performance computers of various types, and joint studies with institutes and computer vendors, in order to evaluate the developed libraries for advanced computing environments.

An object-oriented programming model was adopted to enable users to write their parallel codes by just combining elementary mathematical operations. Implemented algorithms are selected from the viewpoint of scalability on massively parallel computing environments. The libraries are freely available via the Internet, and intended to be improved by the feedback from users. Since the first announcement in September 2005, the codes have been downloaded and evaluated by thousands of users at more than 140 organizations around the world.


Computing Environment Conjugate Gradient Method Iterative Solver User Program Twiddle Factor 
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.


  1. 1.
    Nishida, A.: SSI: Overview of simulation software infrastructure for large scale scientific applications (in Japanese). Technical Report 2004-HPC-098, IPSJ (2004)Google Scholar
  2. 2.
    Nishida, A., Suda, R., Hasegawa, H., Nakajima, K., Takahashi, D., Kotakemori, H., Kajiyama, T., Nukada, A., Fujii, A., Hourai, Y., Zhang, S.L., Abe, K., Itoh, S., Sogabe, T.: The Scalable Software Infrastructure for Scientific Computing Project. Kyushu University (2009),
  3. 3.
    Tuminaro, R.S., Heroux, M., Hutchinson, S.A., Shadid, J.N.: Official Aztec User’s Guide, Version 2.1. Technical Report SAND99-8801J, Sandia National Laboratories (1999)Google Scholar
  4. 4.
    Wu, K., Milne, B.: A survey of packages for large linear systems. Technical Report LBNL-45446, Lawrence Berkeley National Laboratory (2000)Google Scholar
  5. 5.
    Balay, S., Buschelman, K., Eijkhout, V., Gropp, W.D., Kaushik, D., Knepley, M.G., McInnes, L.C., Smith, B.F., Zhang, H.: PETSc Users Manual. Technical Report ANL-95/11, Argonne National Laboratory (2004)Google Scholar
  6. 6.
    Dongarra, J., Ltaief, H.: Freely available software for linear algebra on the Web (2009),
  7. 7.
    Frigo, M., Johnson, S.G.: The design and implementation of FFTW3. Proceedings of the IEEE 93(2), 216–231 (2005)CrossRefGoogle Scholar
  8. 8.
    Moler, C.: Design of an interactive matrix calculator. In: AFIPS National Computer Conference. AFIPS Conference Proceedings, vol. 49, pp. 363–368. AFIPS Press (1980)Google Scholar
  9. 9.
    Kennedy, K., Broom, B., Chauhan, A., Fowler, R., Garvin, J., Koelbel, C., McCosh, C., Mellor-Crummey, J.: Telescoping Languages: A System for Automatic Generation of Domain Languages. Proceedings of the IEEE 93, 387–408 (2005)CrossRefGoogle Scholar
  10. 10.
    Kotakemori, H., Hasegawa, H., Nishida, A.: Performance Evaluation of a Parallel Iterative Method Library using OpenMP. In: Proceedings of the 8th International Conference on High Performance Computing in Asia Pacific Region, pp. 432–436 (2005)Google Scholar
  11. 11.
    Kotakemori, H., Hasegawa, H., Kajiyama, T., Nukada, A., Suda, R., Nishida, A.: Performance evaluation of parallel sparse matrix-vector products on SGI Altix3700. In: Mueller, M.S., Chapman, B.M., de Supinski, B.R., Malony, A.D., Voss, M. (eds.) IWOMP 2005 and IWOMP 2006. LNCS, vol. 4315, pp. 153–163. Springer, Heidelberg (2008)CrossRefGoogle Scholar
  12. 12.
    Kotakemori, H., Fujii, A., Hasegawa, H., Nishida, A.: Implementation of Fast Quad Precision Operation and Acceleration with SSE2 for Literative Solver Library (in Japanese). IPSJ Transactions on Advanced Computing Systems 1(1), 73–84 (2008)Google Scholar
  13. 13.
    Nukada, A.: FFTSS: a High Performance Fast Fourier Transform Library. In: Proceedings of the 2006 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. III, pp. 980–983. IEEE Computer Society Press, Washington (2006)Google Scholar
  14. 14.
    Nukada, A., Takahashi, D., Suda, R., Nishida, A.: High Performance FFT on SGI Altix 3700. In: Perrott, R., Chapman, B.M., Subhlok, J., de Mello, R.F., Yang, L.T. (eds.) HPCC 2007. LNCS, vol. 4782, pp. 396–407. Springer, Heidelberg (2007)CrossRefGoogle Scholar
  15. 15.
    Nukada, A., Hourai, Y., Nishida, A., Akiyama, Y.: High Performance 3D Convolution for Protein Docking on IBM Blue Gene. In: Stojmenovic, I., Thulasiram, R.K., Yang, L.T., Jia, W., Guo, M., de Mello, R.F. (eds.) ISPA 2007. LNCS, vol. 4742, pp. 958–969. Springer, Heidelberg (2007)CrossRefGoogle Scholar
  16. 16.
    Kajiyama, T., Nukada, A., Hasegawa, H., Suda, R., Nishida, A.: LAPACK in SILC: Use of a Flexible Application Framework for Matrix Computation Libraries. In: Proceedings of the 8th International Conference on High Performance Computing in Asia Pacific Region, pp. 205–212. IEEE, Washington (2005)CrossRefGoogle Scholar
  17. 17.
    Kajiyama, T., Nukada, A., Hasegawa, H., Suda, R., Nishida, A.: SILC: A Flexible and Environment Independent Interface for Matrix Computation Libraries. In: Wyrzykowski, R., Dongarra, J., Meyer, N., Waśniewski, J. (eds.) PPAM 2005. LNCS, vol. 3911, pp. 928–935. Springer, Heidelberg (2006)CrossRefGoogle Scholar
  18. 18.
    Kajiyama, T., Nukada, A., Suda, R., Hasegawa, H., Nishida, A.: A Performance Evaluation Model for the SILC Matrix Computation Framework. In: Proceedings of the IFIP International Conference on Network and Parallel Computing, pp. 93–103. The University of Tokyo, Tokyo (2006)Google Scholar
  19. 19.
    Kajiyama, T., Nukada, A., Suda, R., Hasegawa, H., Nishida, A.: Distributed SILC: An Easy-to-Use Interface for MPI-based Parallel Matrix Computation Libraries. In: Kågström, B., Elmroth, E., Dongarra, J., Waśniewski, J. (eds.) PARA 2006. LNCS, vol. 4699, pp. 860–870. Springer, Heidelberg (2007)CrossRefGoogle Scholar
  20. 20.
    Kajiyama, T., Nukada, A., Suda, R., Hasegawa, H., Nishida, A.: Cloth simulation in the SILC matrix computation framework: A case study. In: Wyrzykowski, R., Dongarra, J., Karczewski, K., Wasniewski, J. (eds.) PPAM 2007. LNCS, vol. 4967, pp. 1086–1095. Springer, Heidelberg (2008)CrossRefGoogle Scholar
  21. 21.
    Sogabe, T., Sugihara, M., Zhang, S.: An Extension of the Conjugate Residual Method for Solving Nonsymmetric Linear Systems(in Japanese). Transactions of the Japan Society for Industrial and Applied Mathematics 15(3), 445–460 (2005)Google Scholar
  22. 22.
    Abe, K., Sogabe, T., Fujino, S., Zhang, S.: A Product-type Krylov Subspace Method Based on Conjugate Residual Method for Nonsymmetric Coefficient Matrices (in Japanese). IPSJ Transactions on Advanced Computing Systems 48(SIG8(ACS18)), 11–21 (2007)Google Scholar
  23. 23.
    Fujino, S., Fujiwara, M., Yoshida, M.: BiCGSafe method based on minimization of associate residual (in Japanese). Transactions of JSCES 8(20050028), 145–152 (2005), Google Scholar
  24. 24.
    Fujino, S., Onoue, Y.: Estimation of BiCRSafe method based on residual of BiCR method (in Japanese). Technical Report 2007-HPC-111, IPSJ (2007)Google Scholar
  25. 25.
    Saad, Y.: A Flexible Inner-outer Preconditioned GMRES Algorithm. SIAM J. Sci. Stat. Comput. 14, 461–469 (1993)zbMATHCrossRefMathSciNetGoogle Scholar
  26. 26.
    Soonerveld, P., van Gijzen, M.B.: IDR(s): a family of simple and fast algorithms for solving large nonsymmetric systems of linear equations. SIAM J. Sci. Comput. 31, 1035–1062 (2008)CrossRefMathSciNetGoogle Scholar
  27. 27.
    Greenbaum, A.: Iterative Methods for Solving Linear Systems. SIAM, Philadelphia (1997)zbMATHGoogle Scholar
  28. 28.
    Knyazev, A.V.: Toward the Optimal Preconditioned Eigensolver: Locally Optimal Block Preconditioned Conjugate Gradient Method. SIAM J. Sci. Comput. 23(2), 517–541 (2001)zbMATHCrossRefMathSciNetGoogle Scholar
  29. 29.
    Nishida, A.: A Short Survey of Applications and Evaluations of Preconditioned Conjugate Gradient Method for Large Scale Eigenvalue Problems (in Japanese). In: Proceedings of the 2003 Annual Conference, JSIAM, Tokyo, pp. 326–327 (2003)Google Scholar
  30. 30.
    Suetomi, E., Sekimoto, H.: Conjugate gradient like methods and their application to eigenvalue problems for neutron diffusion equation. Ann. Nucl. Energy 18(4), 205–227 (1991)CrossRefGoogle Scholar
  31. 31.
    Saad, Y.: ILUT: a dual threshold incomplete LU factorization. Numerical linear algebra with applications 1(4), 387–402 (1994)zbMATHCrossRefMathSciNetGoogle Scholar
  32. 32.
    Li, N., Suchomel, B., Osei-Kuffuor, D., Saad, Y.: ITSOL: ITERATIVE SOLVERS package. In: University of Minnesota (2008),
  33. 33.
    Li, N., Saad, Y., Chow, E.: Crout version of ILU for general sparse matrices. SIAM J. Sci. Comput. 25, 716–728 (2003)zbMATHCrossRefMathSciNetGoogle Scholar
  34. 34.
    Kohno, T., Kotakemori, H., Niki, H.: Improving the Modified Gauss-Seidel Method for Z-matrices. Linear Algebra and its Applications 267, 113–123 (1997)zbMATHMathSciNetGoogle Scholar
  35. 35.
    Fujii, A., Nishida, A., Oyanagi, Y.: Evaluation of Parallel Aggregate Creation Orders: Smoothed Aggregation Algebraic Multigrid Method, pp. 99–122. Springer, Berlin (2005)Google Scholar
  36. 36.
    Abe, K., Zhang, S., Hasegawa, H., Himeno, R.: A SOR-base Variable Preconditioned CGR Method (in Japanese). Trans. JSIAM 11(4), 157–170 (2001)Google Scholar
  37. 37.
    Bridson, R., Tang, W.P.: Refining an approximate inverse. J. Comput. Appl. Math. 123, 293–306 (2000)zbMATHCrossRefMathSciNetGoogle Scholar
  38. 38.
    Saad, Y.: SPARSKIT: a basic tool kit for sparse matrix computations, version 2. University of Minnesota (1994),
  39. 39.
    Nishida, A., Oyanagi, Y.: Performance Evaluation of Low Level Multithreaded BLAS Kernels on Intel Processor based cc-NUMA Systems. In: Veidenbaum, A., Joe, K., Amano, H., Aiso, H. (eds.) ISHPC 2003. LNCS, vol. 2858, pp. 500–510. Springer, Heidelberg (2003)Google Scholar
  40. 40.
    Duhamel, P., Hollmann, H.: Split-Radix FFT Algorithm. Electron. Lett. 20, 14–16 (1984)CrossRefGoogle Scholar
  41. 41.
    Linzer, E.N., Feig, E.: Implementation of Efficient FFT Algorithms on Fused Multiply-Add Architectures. IEEE Trans. Signal Processing 41, 93–107 (1993)zbMATHCrossRefGoogle Scholar
  42. 42.
    Goedecker, S.: Fast Radix 2,3,4 and 5 Kernels for Fast Fourier Transformations on Computers with Overlapping Multiply-Add Instructions. SIAM J. Sci. Comput. 18, 1605–1611 (1997)zbMATHCrossRefMathSciNetGoogle Scholar
  43. 43.
    Karner, H., Auer, M., Ueberhuber, C.W.: Multiply-Add Optimized FFT Kernels. Math. Models and Methods in Appl. Sci. 11, 105–117 (2001)zbMATHCrossRefMathSciNetGoogle Scholar
  44. 44.
    Wait, C.D.: IBM PowerPC 440 FPU with complex arithmetic extensions. IBM Journal of Research and Development 49(2/3), 249–254 (2005)CrossRefGoogle Scholar
  45. 45.
    Bailey, D.H.: A fortran-90 double-double library. In: Lawrence Berkeley National Laboratory (2008),
  46. 46.
    Hida, Y., Li, X.S., Bailey, D.H.: Algorithms for quad-double precision floating point arithmetic. In: Proceedings of the 15th Symposium on Computer Arithmetic, pp. 155–162. IEEE, Washington (2001)Google Scholar
  47. 47.
    Dekker, T.: A floating-point technique for extending the available precision. Numerische Mathematik 18, 224–242 (1971)zbMATHCrossRefMathSciNetGoogle Scholar
  48. 48.
    Knuth, D.E.: The Art of Computer Programming: Seminumerical Algorithms, vol. 2. Addison-Wesley, New Jersey (1969)zbMATHGoogle Scholar
  49. 49.
    Bailey, D.H.: High-Precision Floating-Point Arithmetic in Scientific Computation. Computing in Science and Engineering 7, 54–61 (2005)CrossRefGoogle Scholar
  50. 50.
    Intel: Intel Fortran Compiler User’s Guide Vol I. Intel (2009)Google Scholar
  51. 51.
    Barrett, R., Berry, M., Chan, T.F., Demmel, J., Donato, J., Dongarra, J., Eijkhout, V., Pozo, R., Romine, C., der Vorst, H.V.: Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, 2nd edn. SIAM, Philadelphia (1994)Google Scholar
  52. 52.
    Bai, Z., Demmel, J., Dongarra, J., Ruhe, A., van der Vorst, H. (eds.): Templates for the Solution of Algebraic Eigenvalue Problems. SIAM, Philadelphia (2000)zbMATHGoogle Scholar
  53. 53.
    Lehoucq, R.B., Sorensen, D.C., Yang, C.: ARPACK Users’ Guide: Solution of Large-scale Eigenvalue Problems with implicitly-restarted Arnoldi Methods. SIAM, Philadelphia (1998)Google Scholar
  54. 54.
    Bramley, R., Wang, X.: SPLIB: A library of iterative methods for sparse linear system. Technical report, Indiana University–Bloomington (1995)Google Scholar
  55. 55.
    Boisvert, R.F., Pozo, R., Remington, K., Barrett, R., Dongarra, J.J.: The Matrix Market: A web resource for test matrix collections, pp. 125–137. Chapman & Hall, London (1997)Google Scholar
  56. 56.
    Casanova, H., Dongarra, J.: NetSolve: A Network Server for Solving Computational Science Problems. In: The International Journal of Supercomputer Applications and High Performance Computing, pp. 212–223. MIT Press, Boston (1995)Google Scholar
  57. 57.
    Sato, M., Nakada, H., Sekiguchi, S., Matsuoka, S., Nagashima, U., Takagi, H.: Ninf: A network based information library for global world-wide computing infrastructure (1997)Google Scholar
  58. 58.
    Rose, L.D., Padua, D.: Techniques for the translation of MATLAB programs into Fortran 90. ACM Transactions on Programming Languages and Systems 21, 286–323 (1999)CrossRefGoogle Scholar
  59. 59.
    Kawabata, H., Suzuki, M., Kitamura, T.: A MATLAB-based code generator for sparse matrix computations. In: Chin, W.-N. (ed.) APLAS 2004. LNCS, vol. 3302, pp. 280–295. Springer, Heidelberg (2004)CrossRefGoogle Scholar
  60. 60.
    MathWorks, I.: Matlab. MathWorks, Inc (2005),
  61. 61.
    Luszczek, P., Dongarra, J.: Design of interactive environment for numerically intensive parallel linear algebra calculations. In: Bubak, M., van Albada, G.D., Sloot, P.M.A., Dongarra, J. (eds.) ICCS 2004. LNCS, vol. 3039, pp. 270–277. Springer, Heidelberg (2004)Google Scholar
  62. 62.
    Fujii, A., Suda, R., Nishida, A., Oyanagi, Y.: Evaluation of Asynchronous Iterative Method for Sparse Matrix Solver. In: Proceedings of the Second international Workshop on Automatic Performance Tuning, pp. 43–51. The University of Tokyo, Tokyo (2007)Google Scholar
  63. 63.
    Kajiyama, T., Nukada, A., Suda, R., Hasegawa, H., Nishida, A.: Toward Automatic Performance Tuning for Numerical Simulations in the SILC Matrix Computation Framework. In: Proceedings of the Second international Workshop on Automatic Performance Tuning, pp. 81–90. The University of Tokyo, Tokyo (2007)Google Scholar
  64. 64.
    Nishida, A.: Building Cost Effective High Performance Computing Environment via PCI Express. In: Proceedings of the 2006 International Conference on Parallel Processing Workshops, pp. 519–526. IEEE, Washington (2006)Google Scholar
  65. 65.
    Fujii, A., Suda, R., Nishida, A.: Parallel Matrix Distribution Library for Sparse Matrix Solvers. In: Proceedings of the 8th International Conference on High Performance Computing in Asia Pacific Region, pp. 213–219. IEEE, Washington (2005)CrossRefGoogle Scholar
  66. 66.
    Hourai, Y., Nishida, A., Oyanagi, Y.: Network-aware Data Mapping on Parallel Molecular Dynamics. In: Proceedings of 11th International Conference on Parallel and Distributed Systems, pp. 126–132. IEEE, Washington (2005)Google Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Akira Nishida
    • 1
  1. 1.Research Institute for Information TechnologyKyushu UniversityFukuokaJapan

Personalised recommendations