Abstract
Due to the complexity of modern data-parallel applications such as image processing, automatic approaches for inferring suitable and efficient hardware realizations are increasingly required. The optimization of the data transfer and storage micro-architecture typically plays a key role in exploiting data parallelism. In this paper, we propose a comprehensive method to explore the mapping of a high-level representation of an application onto a customizable hardware accelerator. The high-level representation is expressed in a language called Array-OL. The customizable architecture uses FIFO queues and a double-buffering mechanism to mask the latency of data transfers and external memory accesses. The mapping of the high-level representation onto the given architecture is performed by applying a set of loop transformations in Array-OL. A method based on integer partitions is used to reduce the space of explored solutions.
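The abstract does not detail how integer partitions are used to prune the search, so the following is only an illustrative sketch and not the authors' algorithm. It assumes, hypothetically, that a candidate solution can be described by how a total of n units is split into groups and that the order of the groups does not matter. Under that assumption, enumerating integer partitions (unordered splits) instead of compositions (ordered splits) visits far fewer candidates. The function name `integer_partitions` and its parameters are made up for this example.

```python
from typing import Iterator, List, Optional


def integer_partitions(n: int, max_part: Optional[int] = None) -> Iterator[List[int]]:
    """Yield each partition of n once, as a non-increasing list of positive integers.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if max_part is None or max_part > n:
        max_part = n
    if n == 0:
        # The empty partition is the only way to split zero units.
        yield []
        return
    # Choose the largest part first, then partition the remainder with
    # parts no larger than it, so each unordered split appears exactly once.
    for first in range(max_part, 0, -1):
        for rest in integer_partitions(n - first, first):
            yield [first] + rest


if __name__ == "__main__":
    n = 10
    partitions = list(integer_partitions(n))
    compositions = 2 ** (n - 1)  # number of ordered splits of n
    print(f"{len(partitions)} partitions versus {compositions} compositions of {n}")
```

For n = 10 the unordered enumeration yields 42 candidates against 512 ordered ones, which gives a sense of why a partition-based enumeration can shrink an exploration space when ordering is irrelevant.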
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Corvino, R., Gamatié, A., Boulet, P. (2010). Architecture Exploration for Efficient Data Transfer and Storage in Data-Parallel Applications. In: D’Ambra, P., Guarracino, M., Talia, D. (eds) Euro-Par 2010 - Parallel Processing. Euro-Par 2010. Lecture Notes in Computer Science, vol 6271. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15277-1_11
DOI: https://doi.org/10.1007/978-3-642-15277-1_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15276-4
Online ISBN: 978-3-642-15277-1
eBook Packages: Computer Science, Computer Science (R0)