Computational and Applied Mathematics, Volume 37, Issue 3, pp 2965–3004

An evaluation of reordering algorithms to reduce the computational cost of the incomplete Cholesky-conjugate gradient method

  • Sanderson L. Gonzaga de Oliveira
  • J. A. B. Bernardes
  • G. O. Chagas

Abstract

This paper is concerned with applying bandwidth and profile reduction reordering algorithms before computing an incomplete Cholesky factorization, which is then used as a preconditioner for the conjugate gradient method. Hundreds of reordering algorithms have been proposed for the bandwidth and profile reduction problems since the mid-1960s. In previous publications, a wide range of heuristics for bandwidth and/or profile reduction was reviewed. Based on this experience, 13 heuristics were selected as the most promising methods. These are evaluated in this paper along with a proposed variant of the breadth-first search procedure. Numerical results confirm the effectiveness of this modified reordering algorithm for linear systems derived from specific application areas. Moreover, the most promising heuristics for reducing the computational cost of the incomplete Cholesky-conjugate gradient method are identified for several application areas.
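
To make the two quantities named in the abstract concrete, the following is a minimal sketch of how the bandwidth and profile of a symmetric sparsity pattern change under a breadth-first-search-based reordering. It uses a plain reverse Cuthill–McKee ordering, not the modified breadth-first search variant proposed in the paper, and it assumes a connected graph. The function names, the adjacency-dictionary representation, and the five-vertex example are illustrative assumptions and do not come from the article.

```python
from collections import deque

def bandwidth(adj, order):
    """Largest distance |i - j| between the positions of two adjacent vertices."""
    pos = {v: i for i, v in enumerate(order)}
    return max((abs(pos[u] - pos[v]) for u in adj for v in adj[u]), default=0)

def profile(adj, order):
    """Sum over rows of the distance from the diagonal to the leftmost nonzero."""
    pos = {v: i for i, v in enumerate(order)}
    total = 0
    for u in adj:
        first = min([pos[v] for v in adj[u]] + [pos[u]])
        total += pos[u] - first
    return total

def reverse_cuthill_mckee(adj, start):
    """Breadth-first ordering from `start`, visiting unvisited neighbours in
    order of increasing degree, then reversed. Assumes a connected graph."""
    visited = {start}
    order = []
    queue = deque([start])
    while queue:
        u = queue.popleft()
        order.append(u)
        unvisited = [w for w in adj[u] if w not in visited]
        for v in sorted(unvisited, key=lambda w: (len(adj[w]), w)):
            visited.add(v)
            queue.append(v)
    return order[::-1]

# Hypothetical example: a path graph 0-3-1-4-2 with scrambled vertex labels.
adj = {0: [3], 1: [3, 4], 2: [4], 3: [0, 1], 4: [1, 2]}
natural = [0, 1, 2, 3, 4]
rcm = reverse_cuthill_mckee(adj, start=0)

print("natural order:", bandwidth(adj, natural), profile(adj, natural))
print("reordered:    ", bandwidth(adj, rcm), profile(adj, rcm))
```

In this small example the reordering places each vertex next to its neighbours, so nonzeros cluster around the diagonal. A tighter pattern of this kind is what the evaluated heuristics aim for before the incomplete Cholesky factorization is computed.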

Keywords

Bandwidth reduction; Profile reduction; Combinatorial optimization; Heuristics; Metaheuristics; Reordering algorithms; Sparse matrices; Renumbering; Ordering; Graph labeling; Conjugate gradient method; Graph algorithm; Sparse symmetric positive-definite linear systems; Incomplete Cholesky factorization

Notes

Acknowledgements

This work was undertaken with the support of the Fapemig - Fundação de Amparo à Pesquisa do Estado de Minas Gerais. We would like to thank Prof. Dr. Dragan Urosevic, from the Mathematical Institute SANU, for sending us the VNS-band executable program. We would also like to thank Prof. Dr. Fei Xiao, from Beepi, for sending us the source code of the FNCHC heuristic. We thank Dr. J. Scott from the STFC Numerical Analysis Group for her comments. We would like to thank Dr. Luiz H. A. Correia from the Universidade Federal de Lavras for letting us execute our simulations on his machines. In addition, we would like to thank the reviewers for their valuable comments and suggestions.

Copyright information

© SBMAC - Sociedade Brasileira de Matemática Aplicada e Computacional 2017

Authors and Affiliations

  • Sanderson L. Gonzaga de Oliveira (1)
  • J. A. B. Bernardes (1)
  • G. O. Chagas (1)
  1. Universidade Federal de Lavras, Lavras, Brazil
