Design Space Exploration for Efficient Data Intensive Computing on SoCs

Corvino, Rosilde; Gamatié, Abdoulaye; Boulet, Pierre

doi:10.1007/978-1-4614-1415-5_23

Rosilde Corvino³,
Abdoulaye Gamatié⁴ &
Pierre Boulet⁴

1494 Accesses
3 Citations

Abstract

Finding efficient implementations of data intensive applications, such as radar/sonar signal and image processing, on a system-on-chip is a very challenging problem due to increasing complexity and performance requirements of such applications. One major issue is the optimization of data transfer and storage micro-architecture, which is crucial in this context. In this chapter, we propose a comprehensive method to explore the mapping of high-level representations of applications into a customizable hardware accelerator. The high-level representation is given in a language named Array-OL. The customizable architecture uses FIFO queues and a double buffering mechanism to mask the latency of data transfers and external memory access. The mapping of a high-level representation onto a given architecture is achieved by applying loop transformations in Array-OL. A method based on integer partition is used to reduce the space of explored solutions. Our proposition aims at facilitating the inference of adequate hardware realizations for data intensive applications. It is illustrated on a case study consisting in implementing a hydrophone monitoring application.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 89.00; Price excludes VAT (USA)

Softcover Book: USD 119.99; Price excludes VAT (USA)

Hardcover Book: USD 169.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

1.
Available on demand athttp://www.es.ele.tue.nl/~rcorvino/tools.html

References

Tony Hey, Stewart Tansley, and Kristin Tolle, editors.The Fourth Paradigm: Data-Intensive Scientific Discovery. 2009.
Google Scholar
Jianwen Zhu and Nikil Dutt. Electronic system-level design and high-level synthesis. In Laung-Terng Wang, Yao-Wen Chang, and Kwang-Ting (Tim) Cheng, editors,Electronic Design Automation, pages 235–297. Morgan Kaufmann, Boston, 2009.
Google Scholar
Felice Balarin, Massimiliano Chiodo, Paolo Giusto, Harry Hsieh, Attila Jurecska, Luciano Lavagno, Claudio Passerone, Alberto Sangiovanni-Vincentelli, Ellen Sentovich, Kei Suzuki, and Bassam Tabbara.Hardware-software co-design of embedded systems: the POLIS approach. Kluwer Academic Publishers, Norwell, MA, USA, 1997.
Book MATH Google Scholar
R. Ernst, J. Henkel, Th. Benner, W. Ye, U. Holtmann, D. Herrmann, and M. Trawny. The cosyma environment for hardware/software cosynthesis of small embedded systems.Microprocessors and Microsystems, 20(3):159–166, 1996.
Article Google Scholar
B. Kienhuis, E. Deprettere, K. Vissers, and P. Van Der Wolf. An approach for quantitative analysis of application-specific dataflow architectures. InApplication-Specific Systems, Architectures and Processors, 1997. Proceedings., IEEE International Conference on, pages 338–349, Jul 1997.
Google Scholar
Sander Stuijk.Predictable Mapping of Streaming Applications on Multiprocessors. PhD thesis, Technische Universiteit Eindhoven, The Nederlands, 2007.
Google Scholar
Andreas Gerstlauer and Daniel D. Gajski. System-level abstraction semantics. InProceedings of the 15th international symposium on System Synthesis, ISSS ’02, pages 231–236, New York, NY, USA, 2002. ACM.
Google Scholar
P. R. Panda, F. Catthoor, N. D. Dutt, K. Danckaert, E. Brockmeyer, C. Kulkarni, A. Vandercappelle, and P. G. Kjeldsberg. Data and memory optimization techniques for embedded systems.ACM Trans. Des. Autom. Electron. Syst., 6:149–206, April 2001.
Article Google Scholar
F. Catthoor, K. Danckaert, C. Kulkarni, E. Brockmeyer, P. G. Kjeldsberg, T. Van Achteren, and T. Omnes.Data access and storage management for embedded programmable processors. Springer, 2002.
Google Scholar
Rosilde Corvino, Abdoulaye Gamatié, and Pierre Boulet. Architecture exploration for efficient data transfer and storage in data-parallel applications. In Pasqua D’Ambra, Mario Guarracino, and Domenico Talia, editors,Euro-Par 2010 - Parallel Processing, volume 6271 ofLecture Notes in Computer Science, pages 101–116. Springer Berlin/Heidelberg, 2010.
Google Scholar
Lech Józwiak, Nadia Nedjah, and Miguel Figueroa. Modern development methods and tools for embedded reconfigurable systems: A survey.Integration, the VLSI Journal, 43(1):1–33, 2010.
Google Scholar
Edward A. Lee and David G. Messerschmitt. Synchronous Data Flow.Proceedings of the IEEE, 75(9):1235–1245, September 1987.
Article Google Scholar
A. Sangiovanni-Vincentelli and G. Martin. Platform-based design and software design methodology for embedded systems.Design Test of Computers, IEEE, 18(6):23–33, Nov/Dec 2001.
Google Scholar
Giuseppe Ascia, Vincenzo Catania, Alessandro G. Di Nuovo, Maurizio Palesi, and Davide Patti. Efficient design space exploration for application specific systems-on-a-chip.Journal of Systems Architecture, 53(10):733–750, 2007.
Article Google Scholar
F Balasa, P Kjeldsberg, A Vandecappelle, M Palkovic, Q Hu, H Zhu, and F Catthoor. Storage Estimation and Design Space Exploration Methodologies for the Memory Management of Signal Processing Applications.Journal of Signal Processing Systems, 53(1):51–71, Nov 2008.
Article Google Scholar
Yong Chen, Surendra Byna, Xian-He Sun, Rajeev Thakur, and William Gropp. Hiding i/o latency with pre-execution prefetching for parallel applications. InACM/IEEE Supercomputing Conference (SC’08), page 40, 2008.
Google Scholar
P. R. Panda, F. Catthoor, N. D. Dutt, K. Danckaert, E. Brockmeyer, C. Kulkarni, A. Vandercappelle, and P. G. Kjeldsberg. Data and memory optimization techniques for embedded systems.ACM Transactions on Design Automation of Electronic Systems, 6(2):149–206, 2001.
Article Google Scholar
H T Kung. Why systolic architectures.Computer, 15(1):37–46, 1982.
Google Scholar
Abdelkader Amar, Pierre Boulet, and Philippe Dumont. Projection of the Array-OL Specification Language onto the Kahn Process Network Computation Model. InISPAN ’05: Proceedings of the 8th International Symposium on Parallel Architectures, Algorithms and Networks, pages 496–503, 2005.
Google Scholar
D. Kim, R. Managuli, and Y. Kim. Data cache and direct memory access in programming mediaprocessors.Micro, IEEE, 21(4):33–42, Jul 2001.
Article Google Scholar
Jason D. Hiser, Jack W. Davidson, and David B. Whalley. Fast, Accurate Design Space Exploration of Embedded Systems Memory Configurations. InSAC ’07: Proceedings of the 2007 ACM symposium on Applied computing, pages 699–706, New York, NY, USA, 2007. ACM.
Google Scholar
Q. Hu, P. G. Kjeldsberg, A. Vandecappelle, M. Palkovic, and F. Catthoor. Incremental hierarchical memory size estimation for steering of loop transformations.ACM Transactions on Design Automation of Electronic Systems, 12(4):50, 2007.
Google Scholar
Yong Chen, Surendra Byna, Xian-He Sun, Rajeev Thakur, and William Gropp. Hiding I/O latency with pre-execution prefetching for parallel applications. InSC ’08: Proceedings of the 2008 ACM/IEEE conference on Supercomputing, pages 1–10, 2008.
Google Scholar
P.K. Murthy and E.A. Lee. Multidimensional synchronous dataflow.IEEE Transactions on Signal Processing, 50(8):2064–2079, Aug. 2002.
Article Google Scholar
F. Deprettere and T. Stefanov. Affine nested loop programs and their binary cyclo-static dataflow counterparts. InProc. of Conf. on Application Specific Systems, Architectures, and Processors, pages 186–190, 2006.
Google Scholar
Albert Cohen, Marc Duranton, Christine Eisenbeis, Claire Pagetti, Florence Plateau, and Marc Pouzet. N-synchronous kahn networks: a relaxed model of synchrony for real-time systems. InPOPL ’06: Conference record of the 33rd ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pages 180–193, 2006.
Google Scholar
Sylvain Girbal, Nicolas Vasilache, Cédric Bastoul, Albert Cohen, David Parello, Marc Sigler, and Olivier Temam. Semi-automatic composition of loop transformations for deep parallelism and memory hierarchies.Journal of Parallel Programming, 34:261–317, 2006.
Article MATH Google Scholar
Mark Thompson, Hristo Nikolov, Todor Stefanov, Andy D. Pimentel, Cagkan Erbas, Simon Polstra, and Ed F. Deprettere. A framework for rapid system-level exploration, synthesis, and programming of multimedia mp-socs. InProceedings of the 5th IEEE/ACM international conference on Hardware/software codesign and system synthesis, CODES+ISSS’07, pages 9–14, New York, NY, USA, 2007. ACM.
Google Scholar
Scott Fischaber, Roger Woods, and John McAllister. Soc memory hierarchy derivation from dataflow graphs.Journal of Signal Processing Systems, 60:345–361, 2010.
Article Google Scholar
Calin Glitia and Pierre Boulet. High Level Loop Transformations for Systematic Signal Processing Embedded Applications. Research Report RR-6469, INRIA, 2008.
Google Scholar
S.H. Fuller and L.I. Millett. Computing performance: Game over or next level?Computer, 44(1):31–38, Jan. 2011.
Article Google Scholar
Rosilde Corvino.Exploration de l’espace des architectures pour des systèmes de traitement d’image, analyse faite sur des blocs fondamentaux de la rétine numérique. PhD thesis, Université Joseph-Fourier - Grenoble I, France, 2009.
Google Scholar
Calin Glitia, Philippe Dumont, and Pierre Boulet. Array-OL with delays, a domain specific specification language for multidimensional intensive signal processing.Multidimensional Systems and Signal Processing (Springer Netherlands), 2010.
Google Scholar
B.C. de Lavarene, D. Alleysson, B. Durette, and J. Herault. Efficient demosaicing through recursive filtering. InIEEE International Conference on Image Processing (ICIP 07), volume 2, Oct. 2007.
Google Scholar
Jeanny Hérault and Barthélémy Durette. Modeling visual perception for image processing.Computational and Ambient Intelligence (LNCS Springer Berlin/Heidelberg), pages 662–675, 2007.
Google Scholar
Calin Glitia and Pierre Boulet. High level loop transformations for systematic signal processing embedded applications.Embedded Computer Systems: Architectures, Modeling, and Simulation (Springer), pages 187–196, 2008.
Google Scholar
Ken Kennedy and Kathryn S. McKinley. Maximizing loop parallelism and improving data locality via loop fusion and distribution. InProceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing, pages 301–320, London, UK, 1994. Springer-Verlag.
Google Scholar
Frank Hannig, Hritam Dutta, and Jürgen Teich. Parallelization approaches for hardware accelerators – loop unrolling versus loop partitioning.Architecture of Computing Systems – ARCS 2009, pages 16–27, 2009.
Google Scholar
Jingling Xue.Loop tiling for parallelism. Kluwer Academic Publishers, 2000.
Google Scholar
Preeti Ranjan Panda, Hiroshi Nakamura, Nikil D. Dutt, and Alexandru Nicolau. Augmenting loop tiling with data alignment for improved cache performance.IEEE Transactions on Computers, 48:142–149, 1999.
Google Scholar
Lushan Liu, Pradeep Nagaraj, Shambhu Upadhyaya, and Ramalingam Sridhar. Defect analysis and defect tolerant design of multi-port srams.J. Electron. Test., 24(1–3):165–179, 2008.
Google Scholar
Robert Schreiber, Shail Aditya, Scott Mahlke, Vinod Kathail, B Rau, Darren Cronquist, and Mukund Sivaraman. Pico-npa: High-level synthesis of nonprogrammable hardware accelerators.The Journal of VLSI Signal Processing, 31(2):127–142, Jun 2002.
Google Scholar
Imondi GC, Zenzo M, and Fazio MA. Pipelined Burst Memory Access, US patent, August 2008. patent.
Google Scholar
Nawaaz Ahmed, Nikolay Mateev, and Keshav Pingali. Synthesizing transformations for locality enhancement of imperfectly-nested loop nests.International Journal of Parallel Programming, 29(5):493–544, Oct 2001.
Article MATH Google Scholar
Talal Rahwan, Sarvapali Ramchurn, Nicholas Jennings, and Andrea Giovannucci. An anytime algorithm for optimal coalition structure generation.Journal of Artificial Intelligence Research (JAIR), 34:521–567, April 2009.
Google Scholar
Abdoulaye Gamatié, Sébastien Le Beux, Éric Piel, Rabie Ben Atitallah, Anne Etien, Philippe Marquet, and Jean-Luc Dekeyser. A model driven design framework for massively parallel embedded systems.ACM Transactions on Embedded Computing Systems (TECS) ACM (To appear), preliminary version athttp://hal.inria.fr/inria-00311115/2010.

Download references

Author information

Authors and Affiliations

University of Technology Eindhoven, Den Dolech 2, 5612 AZ, Eindhoven, The Netherlands
Rosilde Corvino
LIFL/CNRS and Inria, Parc Scientifique de la Haute Borne, Park Plaza – Bâtiment A, 40 avenue Halley, Villeneuve d’Ascq, France
Abdoulaye Gamatié & Pierre Boulet

Authors

Rosilde Corvino
View author publications
You can also search for this author in PubMed Google Scholar
Abdoulaye Gamatié
View author publications
You can also search for this author in PubMed Google Scholar
Pierre Boulet
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Rosilde Corvino .

Editor information

Editors and Affiliations

Dept. of Computer Science & Engineering, Florida Atlantic University, Boca Raton, 33431, Florida, USA
Borko Furht
LexisNexis, Boca Raton, 33487, Florida, USA
Armando Escalante

Rights and permissions

Reprints and permissions

Copyright information

About this chapter

Cite this chapter

Corvino, R., Gamatié, A., Boulet, P. (2011). Design Space Exploration for Efficient Data Intensive Computing on SoCs. In: Furht, B., Escalante, A. (eds) Handbook of Data Intensive Computing. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-1415-5_23

Download citation

DOI: https://doi.org/10.1007/978-1-4614-1415-5_23
Published: 11 November 2011
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-1414-8
Online ISBN: 978-1-4614-1415-5
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics