Study of parallel processing area extraction and data transfer number reduction for automatic GPU offloading of IoT applications
To overcome the high cost of developing IoT (Internet of Things) services by vertically integrating devices and services, Open IoT has been developed so that various IoT services can be built by integrating horizontally separated devices and services. For Open IoT, we have proposed Tacit Computing, a technology that discovers, on demand, the devices that can provide the data users need and uses them dynamically. We have also proposed an automatic GPU (graphics processing unit) offloading method as an elementary technology of Tacit Computing. However, that method improves the performance of only a limited number of applications, because it optimizes only the extraction of parallelizable loop statements. Therefore, to automatically improve the performance of more applications, this paper proposes an improved GPU offloading method that reduces the number of data transfers between the CPU and GPU. We evaluate the proposed method by applying it to Darknet and Fourier Transform, two large, general applications written for CPUs, and find that, within 10 hours of tuning, it processes them 3 times and 5 times as fast, respectively, as CPU-only execution.
Keywords: Open IoT, GPGPU, Tacit computing, Data transfer optimization, Genetic algorithm, Automatic offloading
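To give a rough sense of why the number of CPU-GPU transfers matters, the following is a minimal counting sketch, not the method proposed in the paper. It assumes a program in which an outer loop on the CPU repeatedly launches offloaded loops (kernels), and compares copying every array around every kernel with copying each array once around the whole outer loop. The function names, the array names `A` and `Anew`, and the `cpu_touched` parameter are all hypothetical and chosen only for illustration.

```python
def naive_transfers(steps, kernels):
    """Count copies when every array is sent to the GPU before each kernel
    and fetched back after it, on every iteration of the outer loop."""
    return steps * sum(2 * len(arrays) for arrays in kernels)


def hoisted_transfers(steps, kernels, cpu_touched=frozenset()):
    """Count copies when transfers are moved outside the outer loop.

    Arrays that the CPU does not use inside the outer loop stay resident on
    the GPU: one copy in before the loop and one copy out after it.
    Arrays listed in cpu_touched (an assumption of this sketch) must still
    be synchronised around every kernel that uses them.
    """
    used = set().union(*kernels)
    resident = used - set(cpu_touched)
    per_step = used & set(cpu_touched)
    per_step_copies = steps * sum(2 * len(set(arrays) & per_step) for arrays in kernels)
    return 2 * len(resident) + per_step_copies


if __name__ == "__main__":
    # Hypothetical stencil-style workload: two offloaded loops per time step,
    # both working on the same two arrays, repeated for 1000 steps.
    kernels = [{"A", "Anew"}, {"A", "Anew"}]
    print(naive_transfers(1000, kernels))
    print(hoisted_transfers(1000, kernels))
    print(hoisted_transfers(1000, kernels, cpu_touched={"A"}))
```

In this toy model the transfer count drops from a value proportional to the number of outer iterations to a small constant whenever the arrays can stay on the GPU, which is the kind of saving the abstract refers to. Real gains depend on array sizes, bus bandwidth, and whether the CPU needs intermediate results.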