Toward Efficient Execution of RVC-CAL Dataflow Programs on Multicore Platforms

  • Ilkka Hautala
  • Jani Boutellier
  • Teemu Nyländen
  • Olli Silvén

Abstract

The increasing number of cores in Systems on Chip (SoCs) has introduced challenges in software parallelization. In response, the dataflow programming model offers a concurrent, reusability-promoting approach for describing applications. In this work, a runtime for executing Dataflow Process Networks (DPN) on multicore platforms is proposed. The main difference between this work and existing methods is that the operating system is allowed to perform central processing unit (CPU) load balancing freely, instead of thread migration between processing cores being restricted through CPU affinity. The proposed runtime is benchmarked on desktop and server multicore platforms using five different applications from the video coding and telecommunication domains. The results show that the proposed method offers significant improvements over the state of the art in terms of performance and reliability.
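
The difference between the two thread-placement policies named in the abstract can be illustrated with a minimal sketch. This is not the authors' runtime: the function `run_workers`, its `pin_to_cores` flag, and the toy jobs are hypothetical names introduced here for illustration only. It assumes a Linux system, where Python exposes `os.sched_setaffinity`; on other platforms the pinning branch is skipped. Python threads do not execute bytecode in parallel, so the sketch shows only how the two policies are expressed, not their performance.

```python
import os
import threading

def run_workers(jobs, pin_to_cores=False):
    """Run each job in its own thread and return the results in job order.

    pin_to_cores=False: threads are left unpinned, so the OS scheduler may
    migrate them between cores (the policy the abstract argues for).
    pin_to_cores=True: each thread tries to restrict itself to one core
    (illustrative affinity-based mapping, Linux only).
    """
    can_pin = pin_to_cores and hasattr(os, "sched_setaffinity")
    # Only cores this process is already allowed to use are valid targets.
    allowed = sorted(os.sched_getaffinity(0)) if can_pin else []
    results = [None] * len(jobs)

    def worker(index, job):
        if can_pin:
            core = allowed[index % len(allowed)]
            try:
                # On Linux, pid 0 targets the calling thread.
                os.sched_setaffinity(0, {core})
            except OSError:
                pass  # pinning is best-effort in this sketch
        results[index] = job()

    threads = [threading.Thread(target=worker, args=(i, job))
               for i, job in enumerate(jobs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

# Hypothetical stand-ins for actors: each job is an independent computation.
jobs = [lambda n=n: sum(range(n)) for n in (10, 100, 1000)]
free = run_workers(jobs)                       # OS balances load freely
pinned = run_workers(jobs, pin_to_cores=True)  # static thread-to-core mapping
```

Both calls compute the same results; only the placement policy differs. In the unpinned case no affinity call is made at all, which is the whole point of the approach described above: thread placement is delegated to the operating system scheduler.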

Keywords

Dataflow Process Networks, RVC-CAL, Orcc, Multicore

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Computer Science and Engineering, University of Oulu, Oulu, Finland
  2. Department of Pervasive Computing, Tampere University of Technology, Tampere, Finland
