Parallel Blocked Algorithm for Solving the Algebraic Path Problem on a Matrix Processor

Takahashi, Akihito; Sedukhin, Stanislav

doi:10.1007/11557654_89

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3726)

Included in the following conference series:

International Conference on High Performance Computing and Communications

Abstract

This paper presents a parallel blocked algorithm for the algebraic path problem (APP). The APP is known to have the same complexity as classical matrix-matrix multiplication; however, solving it takes much more running time because its unique data dependencies drastically limit data reuse. We examine a parallel implementation of a blocked algorithm for the APP on the single-chip Intrinsity FastMATH adaptive processor, which consists of a scalar MIPS processor extended with a SIMD matrix coprocessor. The matrix coprocessor supports native matrix instructions on an array of 4 × 4 processing elements. Implementing an algorithm with matrix instructions requires reformulating it in terms of matrix-matrix operations. Conventional vectorization for SIMD vector processing deals only with the innermost loop; on the FastMATH processor, however, two or three nested loops must be vectorized so that they can be converted into a single equivalent matrix operation. Our experimental results show a peak performance of 9.27 GOPS and high usage rates of matrix instructions when solving the APP. These findings indicate that a SIMD matrix extension to a (super)scalar processor would be very useful for the fast solution of many matrix-formulated problems.
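To make the idea of a blocked APP solver concrete, here is a minimal sketch. It is not the authors' FastMATH implementation. It shows a generic three-phase blocked Floyd-Warshall scheme instantiated for the (min, +) semiring, i.e. all-pairs shortest paths, which is one instance of the APP. All function names are hypothetical. The sketch assumes that n is a multiple of the block size b, that W[i][i] == 0, and that there are no negative cycles. Each call to `minplus_mac` is a triple loop nest that updates a whole block at once. This is the kind of nested-loop computation the abstract describes converting into one matrix operation. On a 4 × 4 matrix array one would presumably choose b = 4, but that mapping is an assumption.

```python
INF = float("inf")  # "no path" in the (min, +) semiring


def fw_inplace(D):
    """Classic Floyd-Warshall over (min, +), updating the square block D in place."""
    n = len(D)
    for k in range(n):
        for i in range(n):
            dik = D[i][k]
            for j in range(n):
                if dik + D[k][j] < D[i][j]:
                    D[i][j] = dik + D[k][j]


def minplus_mac(C, A, B):
    """Return a NEW block min(C, A (x) B), where (A (x) B)[i][j] = min_m A[i][m] + B[m][j].

    These three nested loops form one block-level matrix operation; on a matrix
    coprocessor this update is the candidate for a single matrix instruction.
    Reading only the old A, B, C values makes aliasing (e.g. C is B) safe.
    """
    rows, inner, cols = len(A), len(B), len(B[0])
    return [[min(C[i][j], min(A[i][m] + B[m][j] for m in range(inner)))
             for j in range(cols)] for i in range(rows)]


def get_block(D, bi, bj, b):
    """Copy out block (bi, bj) of size b x b."""
    return [row[bj * b:(bj + 1) * b] for row in D[bi * b:(bi + 1) * b]]


def set_block(D, bi, bj, b, blk):
    """Write a b x b block back into D at block position (bi, bj)."""
    for r, row in enumerate(blk):
        D[bi * b + r][bj * b:(bj + 1) * b] = row


def blocked_apsp(W, b):
    """Blocked all-pairs shortest paths (hypothetical helper).

    Assumes len(W) % b == 0, W[i][i] == 0 and no negative cycles.
    Returns a new matrix and leaves W unchanged.
    """
    n = len(W)
    assert n % b == 0, "sketch assumes n is a multiple of the block size"
    D = [row[:] for row in W]
    nb = n // b
    for k in range(nb):
        # Phase 1: close the diagonal block (paths through block k only).
        Dkk = get_block(D, k, k, b)
        fw_inplace(Dkk)
        set_block(D, k, k, b, Dkk)
        # Phase 2: update the blocks in block-row k and block-column k.
        for j in range(nb):
            if j == k:
                continue
            Dkj = get_block(D, k, j, b)
            set_block(D, k, j, b, minplus_mac(Dkj, Dkk, Dkj))
            Djk = get_block(D, j, k, b)
            set_block(D, j, k, b, minplus_mac(Djk, Djk, Dkk))
        # Phase 3: update every remaining block with one matrix-style operation.
        for i in range(nb):
            if i == k:
                continue
            Dik = get_block(D, i, k, b)
            for j in range(nb):
                if j == k:
                    continue
                set_block(D, i, j, b,
                          minplus_mac(get_block(D, i, j, b), Dik, get_block(D, k, j, b)))
    return D
```

For example, `blocked_apsp(W, 2)` on the 4-vertex graph `W = [[0, 3, INF, 7], [8, 0, 2, INF], [5, INF, 0, 1], [2, INF, INF, 0]]` gives the same distance matrix as running `fw_inplace` on the whole matrix. Phases 2 and 3 hold almost all of the work, and every step there is a block multiply-accumulate. This is why a blocked formulation suits hardware with native matrix instructions.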

Author information

Authors and Affiliations

Graduate School of Computer Science and Engineering, University of Aizu, Tsuruga, Ikki-machi, Aizuwakamatsu City, Fukushima, 965-8580, Japan
Akihito Takahashi & Stanislav Sedukhin

Editor information

Editors and Affiliations

Department of Computer Science, St. Francis Xavier University, Antigonish, Canada
Laurence T. Yang
School of Computer Science/Welsh eScience Centre, Cardiff University, UK
Omer F. Rana
Dipartimento di Ingegneria dell’Informazione, Second University of Naples, Real Casa dell’Annunziata, via Roma 29, 81031 Aversa (CE), Italy
Beniamino Di Martino
Computer Science Department, University of Tennessee, Knoxville, TN 37996-3450, USA
Jack Dongarra

About this paper

Cite this paper

Takahashi, A., Sedukhin, S. (2005). Parallel Blocked Algorithm for Solving the Algebraic Path Problem on a Matrix Processor. In: Yang, L.T., Rana, O.F., Di Martino, B., Dongarra, J. (eds) High Performance Computing and Communications. HPCC 2005. Lecture Notes in Computer Science, vol 3726. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11557654_89

DOI: https://doi.org/10.1007/11557654_89
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29031-5
Online ISBN: 978-3-540-32079-1
eBook Packages: Computer Science, Computer Science (R0)