# Parallel Computing Considerations

## Abstract

This chapter addresses the computational cost of machine learning models. Reducing training time is a prerequisite for industrial applications, since production processes usually require real-time responses. The most common way to accelerate training is to adopt a parallel computing framework. In the literature, two popular approaches to speeding up training are running the computation on a computer equipped with a graphics processing unit (GPU) and distributing it over a computer cluster comprising a number of machines. This chapter first introduces the basic ideas behind GPU acceleration (e.g., the compute unified device architecture (CUDA) created by NVIDIA™) and cluster-based computing (e.g., the MapReduce framework), and then gives concrete examples of each. When training an EKF-based Elman network, the inversion of a Jacobian matrix is the most time-consuming step, so a parallel computing strategy for this operation is proposed using CUDA-based GPU acceleration. In addition, for LSSVM modeling, a CUDA-based parallel PSO is introduced to optimize its hyper-parameters. As for the computer cluster version, a parallelized EKF-based ESN is designed using the MapReduce framework for acceleration. Finally, a series of experimental analyses using practical energy data from the steel industry validates the performance of the accelerating approaches.
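
To make the GPU-accelerated matrix inversion concrete, the following is a minimal sketch (not the chapter's implementation) of inverting an n-by-n matrix, such as the Jacobian arising in EKF training, with cuBLAS batched LU routines. The matrix size, test values, and omitted error handling are illustrative assumptions.

```cuda
// Sketch: GPU matrix inversion via cuBLAS batched LU factorization + inversion.
// Assumes a small, well-conditioned test matrix; real Jacobians would be
// copied in from the EKF training loop.
#include <cstdio>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main() {
    const int n = 4;                 // assumed matrix dimension (illustrative)
    const int batch = 1;             // one matrix here; the API also handles many
    float hA[n * n] = {4, 0, 0, 0,   // diagonal test matrix, column-major
                       0, 3, 0, 0,   // as cuBLAS expects
                       0, 0, 2, 0,
                       0, 0, 0, 1};

    float *dA, *dInv;
    cudaMalloc(&dA, sizeof(hA));
    cudaMalloc(&dInv, sizeof(hA));
    cudaMemcpy(dA, hA, sizeof(hA), cudaMemcpyHostToDevice);

    // The batched API takes device arrays of device pointers (length 1 here).
    float **dAarr, **dInvArr;
    cudaMalloc(&dAarr, sizeof(float*));
    cudaMalloc(&dInvArr, sizeof(float*));
    cudaMemcpy(dAarr, &dA, sizeof(float*), cudaMemcpyHostToDevice);
    cudaMemcpy(dInvArr, &dInv, sizeof(float*), cudaMemcpyHostToDevice);

    int *dPiv, *dInfo;
    cudaMalloc(&dPiv, n * sizeof(int));
    cudaMalloc(&dInfo, sizeof(int));

    cublasHandle_t handle;
    cublasCreate(&handle);

    // LU factorization, then inversion from the LU factors.
    cublasSgetrfBatched(handle, n, dAarr, n, dPiv, dInfo, batch);
    cublasSgetriBatched(handle, n, (const float**)dAarr, n, dPiv,
                        dInvArr, n, dInfo, batch);

    float hInv[n * n];
    cudaMemcpy(hInv, dInv, sizeof(hInv), cudaMemcpyDeviceToHost);
    for (int i = 0; i < n * n; ++i)
        printf("%6.3f%c", hInv[i], (i % n == n - 1) ? '\n' : ' ');

    cublasDestroy(handle);
    cudaFree(dA); cudaFree(dInv); cudaFree(dAarr); cudaFree(dInvArr);
    cudaFree(dPiv); cudaFree(dInfo);
    return 0;
}
```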
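
Similarly, the core idea of the CUDA-based parallel PSO is that fitness evaluation, the expensive step when each particle encodes a candidate set of LSSVM hyper-parameters, is done by one GPU thread per particle. The sketch below uses a sphere function as a stand-in for the LSSVM validation error; the swarm size, dimensionality, and all names are illustrative assumptions, not the chapter's code.

```cuda
// Sketch: parallel fitness evaluation for PSO, one thread per particle.
#include <cstdio>
#include <cuda_runtime.h>

constexpr int NUM_PARTICLES = 256;  // assumed swarm size
constexpr int DIM = 2;              // e.g., two LSSVM hyper-parameters

// Placeholder objective: in the real application this would be the LSSVM
// validation error for the hyper-parameters held by the particle.
__device__ float objective(const float *x) {
    float s = 0.0f;
    for (int d = 0; d < DIM; ++d) s += x[d] * x[d];
    return s;
}

__global__ void evaluateSwarm(const float *positions, float *fitness, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) fitness[i] = objective(&positions[i * DIM]);
}

int main() {
    float hPos[NUM_PARTICLES * DIM];
    for (int i = 0; i < NUM_PARTICLES * DIM; ++i)
        hPos[i] = 0.01f * (i % 100);  // toy initial positions

    float *dPos, *dFit;
    cudaMalloc(&dPos, sizeof(hPos));
    cudaMalloc(&dFit, NUM_PARTICLES * sizeof(float));
    cudaMemcpy(dPos, hPos, sizeof(hPos), cudaMemcpyHostToDevice);

    evaluateSwarm<<<(NUM_PARTICLES + 127) / 128, 128>>>(dPos, dFit, NUM_PARTICLES);

    float hFit[NUM_PARTICLES];
    cudaMemcpy(hFit, dFit, sizeof(hFit), cudaMemcpyDeviceToHost);

    // The velocity/position updates and best-tracking of PSO would run next,
    // either on the host or in further kernels.
    printf("fitness[0] = %f\n", hFit[0]);

    cudaFree(dPos); cudaFree(dFit);
    return 0;
}
```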

## Keywords

Parallel computing; Large datasets; GPU acceleration; CUDA; cuBLAS; Computer cluster; Hadoop; MapReduce; EKF; Elman networks; Matrix inversion; Jacobian matrix; Online optimization; LSSVM; PSO
