Abstract
Synchronous dataflow (SDF) graphs are often the computational model of choice for specification, analysis, and automated synthesis of parallel streaming kernels targeting embedded multiprocessor system-on-a-chip (MPSoC) platforms. We discuss several limitations of SDF graphs in the context of conventional parallel software synthesis methodologies, and highlight the associated degradation in analysis accuracy and in the performance of the synthesized software. Subsequently, we propose several extensions to the strict notion of the SDF graph model that address the identified issues. We present extensive empirical evaluations, which underscore the model limitations and the effectiveness of our approach.
Notes
1. Auto-concurrency, i.e., multiple concurrent firings of an actor, is not allowed in our discussion.
2. Here, we use the terms “steady-state throughput” and “throughput” interchangeably.
3. We also experimented with SDF3 benchmarks in Sect. 4.2.5.2. However, SDF3 benchmarks include only graph parameters and not task implementations. Thus, we could only perform the experiments shown in Fig. 4.6a, b and not c. Detailed results are omitted due to space limits. For SDF3 benchmarks, on average, the buffer size reduction using implementation-aware analysis is 6×, and the runtime ratio of implementation-aware over implementation-oblivious analysis is 5×.
4. The discussion does not pertain to sorting of large databases that do not entirely fit in memory.
Copyright information
© 2017 Springer International Publishing AG
About this chapter
Cite this chapter
Hashemi, M., Barijough, K.M., Ghiasi, S. (2017). Throughput-Driven Parallel Embedded Software Synthesis from Synchronous Dataflow Models: Caveats and Remedies. In: Molnos, A., Fabre, C. (eds) Model-Implementation Fidelity in Cyber Physical System Design. Springer, Cham. https://doi.org/10.1007/978-3-319-47307-9_4
DOI: https://doi.org/10.1007/978-3-319-47307-9_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-47306-2
Online ISBN: 978-3-319-47307-9
eBook Packages: Engineering, Engineering (R0)