An Adaptive Heterogeneous Runtime Framework for Irregular Applications

Journal of Signal Processing Systems

Abstract

Future computing devices are likely to be based on heterogeneous architectures, which consist of multi-core CPUs accompanied by GPUs or special-purpose accelerators. A challenging issue for such devices is how to manage resources effectively to achieve high efficiency and low energy consumption. With multiple new programming models and advanced framework support for heterogeneous computing, many regular applications have benefited greatly from heterogeneous systems. However, extending this success to irregular applications remains a challenge. An irregular program's attributes may vary during execution and are often unpredictable, making it difficult to allocate heterogeneous resources for the highest efficiency. Moreover, irregularity in applications may cause control flow divergence, load imbalance, and low efficiency in parallel execution. To resolve these issues, we studied and propose phase-guided dynamic work partitioning, a lightweight and fast analysis technique that collects information during program phases at runtime in order to guide work partitioning in subsequent phases, leading to more efficient work dispatching on heterogeneous systems. We implemented an adaptive runtime system based on this technique and use ray tracing as a case study to explore the performance potential of dynamic work distribution techniques in our framework. Experiments show that this approach can run up to 5 times faster than the original system. The proposed techniques can be applied to other irregular applications with similar properties.
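The general idea of using measurements from one phase to guide the work split in the next can be illustrated with a minimal sketch. It is not the authors' implementation: the `PhaseStats` record, the throughput-proportional rule, and the `smoothing` parameter are all assumptions introduced here for illustration, and the abstract does not specify what information the actual runtime collects or how it uses it.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class PhaseStats:
    """Hypothetical record: work items finished and time spent by one device in a phase."""
    items: int
    seconds: float

    @property
    def throughput(self) -> float:
        # Items per second; zero if the device did no measurable work.
        return self.items / self.seconds if self.seconds > 0 else 0.0


def next_gpu_share(cpu: PhaseStats, gpu: PhaseStats,
                   prev_share: float, smoothing: float = 0.5) -> float:
    """Return the fraction of the next phase's work to dispatch to the GPU.

    Assumed rule: split in proportion to the throughput observed in the
    phase that just ended, blended with the previous share.
    """
    total = cpu.throughput + gpu.throughput
    if total == 0:
        return prev_share  # no information: keep the previous split
    observed = gpu.throughput / total
    # Blending damps the effect of a single unrepresentative phase.
    return smoothing * prev_share + (1 - smoothing) * observed


def partition(n_items: int, gpu_share: float) -> Tuple[int, int]:
    """Split n_items into (cpu_items, gpu_items) according to gpu_share."""
    gpu_items = round(n_items * gpu_share)
    return n_items - gpu_items, gpu_items
```

In a ray tracer, a phase in this sketch might correspond to one batch of rays: after each batch, the runtime would call `next_gpu_share` with the measured statistics and then `partition` the following batch. The blending step is one simple way to cope with attributes that change unpredictably between phases; other policies are equally plausible.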

Author information

Corresponding author

Correspondence to Chih-Chen Kao.

About this article

Cite this article

Kao, CC., Hsu, WC. An Adaptive Heterogeneous Runtime Framework for Irregular Applications. J Sign Process Syst 80, 245–259 (2015). https://doi.org/10.1007/s11265-014-0916-x

  • DOI: https://doi.org/10.1007/s11265-014-0916-x
