Throughput-Driven Parallel Embedded Software Synthesis from Synchronous Dataflow Models: Caveats and Remedies

Model-Implementation Fidelity in Cyber Physical System Design

Abstract

Synchronous dataflow (SDF) graphs are often the computational model of choice for specification, analysis, and automated synthesis of parallel streaming kernels targeting embedded multiprocessor system-on-a-chip (MPSoC) platforms. We discuss several limitations of SDF graphs in the context of conventional parallel software synthesis methodologies, and highlight the associated degradation in analysis accuracy and in the performance of the synthesized software. We then propose several extensions to the strict SDF graph model that address the identified issues. We present extensive empirical evaluations, which underscore the model limitations and the effectiveness of our approach.

Notes

  1. Auto-concurrency, i.e., multiple concurrent firings of an actor, is not allowed in our discussion.

  2. Here, we use the terms “steady-state throughput” and “throughput” interchangeably.

  3. We also experimented with SDF3 benchmarks in Sect. 4.2.5.2. However, SDF3 benchmarks include only graph parameters, not task implementations. Thus, we could perform only the experiments shown in Fig. 4.6a, b, and not those in Fig. 4.6c. Detailed results are omitted due to space limits. For SDF3 benchmarks, on average, implementation-aware analysis reduces buffer size by 6×, and the runtime ratio of implementation-aware to implementation-oblivious analysis is 5×.

  4. The discussion does not pertain to sorting large databases that do not fit entirely in memory.

Author information

Corresponding author

Correspondence to Matin Hashemi.

Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

Hashemi, M., Barijough, K.M., Ghiasi, S. (2017). Throughput-Driven Parallel Embedded Software Synthesis from Synchronous Dataflow Models: Caveats and Remedies. In: Molnos, A., Fabre, C. (eds) Model-Implementation Fidelity in Cyber Physical System Design. Springer, Cham. https://doi.org/10.1007/978-3-319-47307-9_4

  • DOI: https://doi.org/10.1007/978-3-319-47307-9_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-47306-2

  • Online ISBN: 978-3-319-47307-9

  • eBook Packages: Engineering, Engineering (R0)
