Throughput-Driven Parallel Embedded Software Synthesis from Synchronous Dataflow Models: Caveats and Remedies

Hashemi, Matin; Barijough, Kamyar Mirzazad; Ghiasi, Soheil

doi:10.1007/978-3-319-47307-9_4

Abstract

Synchronous dataflow (SDF) graphs are often the computational model of choice for specification, analysis, and automated synthesis of parallel streaming kernels targeting embedded multiprocessor system-on-a-chip (MPSoC) platforms. We discuss several limitations of the SDF graphs in the context of conventional parallel software synthesis methodologies, and highlight the associated degradation in analysis accuracy and performance of the synthesized software. Subsequently, we propose several extensions to the strict notion of SDF graph model that address the identified issues. We present extensive empirical evaluations, which underscore the model limitations and the effectiveness of our approach.
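As background for readers new to the model, the following is a minimal sketch of the classic SDF consistency check: solving the balance equations to find how many times each actor fires per graph iteration (the repetition vector). It is not the chapter's synthesis flow. The function name `repetition_vector`, the edge tuple layout, and the three-actor example graph are assumptions made for illustration only.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def repetition_vector(actors, edges):
    """Return the smallest positive integer firing counts for an SDF graph.

    actors: list of actor names (hypothetical representation).
    edges:  list of (src, dst, produce_rate, consume_rate) tuples.
    Raises ValueError if the graph is disconnected or rate-inconsistent.
    """
    # Balance equation per edge: q[src] * produce == q[dst] * consume.
    adj = {a: [] for a in actors}
    for src, dst, p, c in edges:
        adj[src].append((dst, Fraction(p, c)))  # q[dst] = q[src] * p / c
        adj[dst].append((src, Fraction(c, p)))  # q[src] = q[dst] * c / p

    # Propagate relative firing rates from an arbitrary start actor.
    rate = {actors[0]: Fraction(1)}
    stack = [actors[0]]
    while stack:
        a = stack.pop()
        for b, ratio in adj[a]:
            expected = rate[a] * ratio
            if b not in rate:
                rate[b] = expected
                stack.append(b)
            elif rate[b] != expected:
                raise ValueError("inconsistent rates: no repetition vector")
    if len(rate) != len(actors):
        raise ValueError("graph is not connected")

    # Scale by the least common multiple of the denominators to get integers.
    scale = reduce(lambda x, y: x * y // gcd(x, y),
                   (r.denominator for r in rate.values()), 1)
    return {a: int(r * scale) for a, r in rate.items()}


# Assumed example: A produces 2 tokens per firing, B consumes 3;
# B produces 1 token per firing, C consumes 2.
example = repetition_vector(
    ["A", "B", "C"],
    [("A", "B", 2, 3), ("B", "C", 1, 2)],
)
print(example)
```

In the assumed example, one iteration fires A three times, B twice, and C once, so each channel returns to its initial token count. Steady-state throughput is usually defined over such iterations.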

Notes

1. Auto-concurrency, i.e., multiple concurrent firings of an actor, is not allowed in our discussion.
2. Here, we use the terms “steady-state throughput” and “throughput” interchangeably.
3. We also experimented with SDF3 benchmarks in Sect. 4.2.5.2. However, SDF3 benchmarks include only graph parameters and not task implementations. Thus, we could perform only the experiments shown in Fig. 4.6a, b and not c. Detailed results are omitted due to space limits. For SDF3 benchmarks, on average, the buffer size reduction using implementation-aware analysis is 6×, and the runtime ratio of implementation-aware over implementation-oblivious analysis is 5×.
4. The discussion does not pertain to sorting of large databases that do not fit entirely in memory.

Author information

Authors and Affiliations

Sharif University of Technology, Tehran, Iran
Matin Hashemi & Kamyar Mirzazad Barijough
University of California, Davis, CA, USA
Soheil Ghiasi

Corresponding author

Correspondence to Matin Hashemi.

Editor information

Editors and Affiliations

Campus MINATEC, CEA, Grenoble, France
Anca Molnos
Campus MINATEC, CEA, Grenoble, France
Christian Fabre

Cite this chapter

Hashemi, M., Barijough, K.M., Ghiasi, S. (2017). Throughput-Driven Parallel Embedded Software Synthesis from Synchronous Dataflow Models: Caveats and Remedies. In: Molnos, A., Fabre, C. (eds) Model-Implementation Fidelity in Cyber Physical System Design. Springer, Cham. https://doi.org/10.1007/978-3-319-47307-9_4

DOI: https://doi.org/10.1007/978-3-319-47307-9_4
Published: 10 December 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-47306-2
Online ISBN: 978-3-319-47307-9
eBook Packages: Engineering, Engineering (R0)
