Abstract
A scalable mapping is proposed for three important kernels from the numerical linear algebra domain. It exploits architectural features to reach asymptotically optimal efficiency at low energy consumption. Performance and power were evaluated for input matrix sizes ranging from 64\(\times\)64 to 16384\(\times\)16384. Twelve architectural variants with up to 10\(\times\)10 processing elements (PEs) were used to explore the scalability of both the mapping and the architecture. For architectures of up to 8\(\times\)8 PEs, the energy increase stays below \(10\,\%\) while performance improves by more than an order of magnitude. This enables a clean area-performance trade-off on the Layers architecture while keeping energy nearly constant across the variants.
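The pairing of a small energy increase with a large speed-up follows from the identity \(E = \bar{P}\,t\), where energy equals average power times runtime. The sketch below uses our own notation: \(V\) is any larger variant, and "ref" is an assumed reference variant, since the abstract does not name the baseline. It shows that keeping energy within \(10\,\%\) at a speed-up \(S_V\) bounds how much average power may grow.

\[
\frac{E_V}{E_{\mathrm{ref}}}
= \frac{\bar{P}_V}{\bar{P}_{\mathrm{ref}}}\cdot\frac{t_V}{t_{\mathrm{ref}}}
= \frac{\bar{P}_V/\bar{P}_{\mathrm{ref}}}{S_V},
\qquad S_V = \frac{t_{\mathrm{ref}}}{t_V}
\]

\[
\frac{E_V}{E_{\mathrm{ref}}} < 1.1
\;\Longrightarrow\;
\frac{\bar{P}_V}{\bar{P}_{\mathrm{ref}}} < 1.1\,S_V
\]

For a hypothetical speed-up of \(S_V = 12\), average power may rise to at most \(13.2\times\) that of the reference variant. In other words, the extra power drawn by a larger PE array is almost entirely offset by the shorter runtime.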
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Rákossy, Z.E., Stengele, D., Acosta-Aponte, A., Chafekar, S., Bientinesi, P., Chattopadhyay, A. (2015). Scalable and Efficient Linear Algebra Kernel Mapping for Low Energy Consumption on the Layers CGRA. In: Sano, K., Soudris, D., Hübner, M., Diniz, P. (eds) Applied Reconfigurable Computing. ARC 2015. Lecture Notes in Computer Science, vol 9040. Springer, Cham. https://doi.org/10.1007/978-3-319-16214-0_25
DOI: https://doi.org/10.1007/978-3-319-16214-0_25
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16213-3
Online ISBN: 978-3-319-16214-0
eBook Packages: Computer Science, Computer Science (R0)