Fine-Tuning an OpenMP-Based TVD–Hopmoc Method Using Intel® Parallel Studio XE Tools on Intel® Xeon® Architectures

  • Frederico L. Cabral
  • Carla Osthoff
  • Roberto P. Souto
  • Gabriel P. Costa
  • Sanderson L. Gonzaga de Oliveira
  • Diego Brandão
  • Mauricio Kischinhevsky
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 979)


This paper is concerned with parallelizing the TVD–Hopmoc method for the numerical time integration of evolutionary differential equations. Using Intel® Parallel Studio XE tools, we studied three OpenMP implementations of the TVD–Hopmoc method (naive, CoP, and EWS-Sync), with executions performed on Intel® Xeon Phi™ (Many Integrated Core) and Intel® Xeon® Scalable processors. Our EWS-Sync implementation defines an array that represents the threads, and the scheme synchronizes only adjacent threads. Moreover, this approach reduces OpenMP scheduling time by employing an explicit work-sharing strategy: instead of letting the OpenMP runtime schedule the work implicitly, this implementation of the 1-D TVD–Hopmoc method partitions the array that represents the computational mesh among the threads. The scheme thereby diminishes OpenMP spin time by replacing barriers with an explicit synchronization mechanism in which a thread waits only for its two adjacent threads. Numerical simulations show that this approach achieves promising performance gains in shared memory on multi-core and many-core environments.


OpenMP · Xeon Phi · High performance computing · Parallel processing · Advection–diffusion equation · Thread synchronization



CNPq, CAPES, and FAPERJ supported this work. We would like to thank the Núcleo de Computação Científica at Universidade Estadual Paulista (NCC/UNESP) for letting us execute our simulations on its heterogeneous multi-core cluster. These resources were partially funded by Intel® through the projects entitled Intel Parallel Computing Center, Modern Code Partner, and Intel/Unesp Center of Excellence in Machine Learning.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Frederico L. Cabral (1)
  • Carla Osthoff (1)
  • Roberto P. Souto (1)
  • Gabriel P. Costa (1)
  • Sanderson L. Gonzaga de Oliveira (2)
  • Diego Brandão (3)
  • Mauricio Kischinhevsky (4)
  1. Laboratório Nacional de Computação Científica (LNCC), Petrópolis, RJ, Brazil
  2. Universidade Federal de Lavras (UFLA), Lavras, MG, Brazil
  3. Centro Federal de Educação Tecnológica Celso Suckow da Fonseca (CEFET-RJ), Rio de Janeiro, Brazil
  4. Universidade Federal Fluminense (UFF), Niterói, RJ, Brazil
