An evaluation of reordering algorithms to reduce the computational cost of the incomplete Cholesky-conjugate gradient method
Abstract
This paper evaluates bandwidth and profile reduction reordering algorithms applied before computing an incomplete Cholesky factorization, which is then used as a preconditioner for the conjugate gradient method. Since the mid-1960s, hundreds of reordering algorithms have been proposed for bandwidth and profile reduction. Previous publications reviewed a large range of heuristics for bandwidth and/or profile reduction and, based on that experience, 13 heuristics were selected as the most promising. This paper evaluates them along with a proposed variant of the breadth-first search procedure. Numerical results confirm the effectiveness of this modified reordering algorithm for linear systems derived from specific application areas. Moreover, the most promising heuristics for reducing the computational cost of the incomplete Cholesky-conjugate gradient method are identified for several application areas.
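As a rough illustration of the two quantities being reduced, the sketch below computes the bandwidth and profile of a symmetric sparsity pattern under a given ordering, and builds a breadth-first ordering with neighbours visited by increasing degree (Cuthill-McKee style). This is an assumption-laden toy and not the variant proposed in the paper: the function names `bandwidth`, `profile`, `bfs_order`, and `path_graph` are hypothetical, the matrix is represented only by its adjacency lists, and the incomplete Cholesky factorization and conjugate gradient steps are omitted.

```python
from collections import deque


def bandwidth(adj, order):
    """Largest |i - j| over all off-diagonal nonzeros (edges) under `order`."""
    pos = {v: i for i, v in enumerate(order)}
    return max((abs(pos[u] - pos[v]) for u in adj for v in adj[u]), default=0)


def profile(adj, order):
    """Sum over rows of the distance from the diagonal to the leftmost nonzero."""
    pos = {v: i for i, v in enumerate(order)}
    total = 0
    for u in adj:
        leftmost = min([pos[v] for v in adj[u]] + [pos[u]])
        total += pos[u] - leftmost
    return total


def bfs_order(adj):
    """Breadth-first ordering; start vertices and neighbours are taken by increasing degree."""
    order, seen = [], set()
    for start in sorted(adj, key=lambda v: len(adj[v])):
        if start in seen:
            continue  # already reached from an earlier start vertex
        seen.add(start)
        queue = deque([start])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in sorted(adj[u], key=lambda w: len(adj[w])):
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
    return order


def path_graph(n):
    """Adjacency lists of a path 0 - 1 - ... - (n-1), i.e. a tridiagonal pattern."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


if __name__ == "__main__":
    adj = path_graph(6)
    scrambled = [0, 3, 1, 4, 2, 5]  # an arbitrary poor ordering of the same pattern
    reordered = bfs_order(adj)
    print("scrambled:", bandwidth(adj, scrambled), profile(adj, scrambled))
    print("reordered:", bandwidth(adj, reordered), profile(adj, reordered))
```

On this six-vertex path, the scrambled ordering has bandwidth 3 and profile 9, while the breadth-first ordering recovers the tridiagonal pattern with bandwidth 1 and profile 5. A smaller band or envelope generally means better memory locality, which is the motivation for reordering before factorization; whether it lowers the total solver cost is exactly what the paper assesses empirically.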
Keywords
- Bandwidth reduction
- Profile reduction
- Combinatorial optimization
- Heuristics
- Metaheuristics
- Reordering algorithms
- Sparse matrices
- Renumbering
- Ordering
- Graph labeling
- Conjugate gradient method
- Graph algorithm
- Sparse symmetric positive-definite linear systems
- Incomplete Cholesky factorization
Notes
Acknowledgements
This work was undertaken with the support of the Fapemig - Fundação de Amparo à Pesquisa do Estado de Minas Gerais. We would like to thank Prof. Dr. Dragan Urosevic, from the Mathematical Institute SANU, for sending us the VNS-band executable program. We would also like to thank Prof. Dr. Fei Xiao, from Beepi, for sending us the source code of the FNCHC heuristic. We thank Dr. J. Scott from the STFC Numerical Analysis Group for her comments. We would like to thank Dr. Luiz H. A. Correia from the Universidade Federal de Lavras for letting us execute our simulations on his machines. In addition, we would like to thank the reviewers for their valuable comments and suggestions.