OpenMP Parallelization and Optimization of Graph-Based Machine Learning Algorithms

  • Zhaoyi Meng
  • Alice Koniges
  • Yun (Helen) He
  • Samuel Williams
  • Thorsten Kurth
  • Brandon Cook
  • Jack Deslippe
  • Andrea L. Bertozzi
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9903)


We investigate the OpenMP parallelization and optimization of two novel data classification algorithms. The new algorithms are based on graph and PDE solution techniques and, in serial mode, provide significant accuracy and performance advantages over traditional data classification algorithms. The methods leverage the Nyström extension to calculate eigenvalues and eigenvectors of the graph Laplacian; this is a self-contained module that can be used in conjunction with other graph-Laplacian-based methods such as spectral clustering. We use performance tools to identify the hotspots and memory-access behavior of the serial codes, and we use OpenMP to parallelize the most time-consuming parts. Where possible, we also use library routines. We then optimize the OpenMP implementations, detail their performance on traditional supercomputer nodes (in our case a Cray XC30), and test the optimization steps on emerging testbed systems based on Intel's Knights Corner and Knights Landing processors. We show both performance improvement and strong scaling behavior. A large number of optimization techniques and analyses are necessary before the algorithm reaches near-ideal scaling.


Keywords: Semi-supervised, Unsupervised, Data algorithms, OpenMP, Optimization



This work was supported by NSF grants DMS-1417674 and DMS-1045536 and AFOSR MURI grant FA9550-10-1-0569. We would like to thank Dr. Da Kuang for his suggestions on optimizing the serial codes. This work was also supported by U.S. Department of Energy under Contract No. DE-AC02-05CH11231. This research used resources of the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.


  1. 1.
    Meng, Z., Merkurjev, E., Koniges, A., Bertozzi, A.L.: Hyperspectral Video Analysis Using Graph Clustering Methods. Image Processing On Line, submittedGoogle Scholar
  2. 2.
    Stoer, M., Wagner, F.: A simple min-cut algorithm. J. ACM (JACM) 44(4), 585–591 (1997)MathSciNetCrossRefzbMATHGoogle Scholar
  3. 3.
    Szlam, A., Bresson, X.: A total variation-based graph clustering algorithm for cheeger ratio cuts. UCLA CAM Report, pp. 09–68 (2009)Google Scholar
  4. 4.
    Bertozzi, A.L., Flenner, A.: Diffuse interface models on graphs for classification of high dimensional data. SIAM Rev. 58(2), 293–328 (2016)MathSciNetCrossRefzbMATHGoogle Scholar
  5. 5.
    Chung, F.: Spectral Graph Theory, vol. 92. American Mathematical Society, Providence (1997)zbMATHGoogle Scholar
  6. 6.
    Von Luxburg, U.: A tutorial on spectral clustering. Stat. Comput. 17(4), 395–416 (2007)MathSciNetCrossRefGoogle Scholar
  7. 7.
    Van Gennip, Y., Bertozzi, A.L.: \( Gamma \)-convergence of graph Ginzburg-Landau functionals. Adv. Differ. Equ. 17(11/12), 1115–1180 (2012)MathSciNetzbMATHGoogle Scholar
  8. 8.
    Bertozzi, A.L., Flenner, A.: Diffuse interface models on graphs for classification of high dimensional data. Multiscale Model. Simul. 10(3), 1090–1118 (2012)MathSciNetCrossRefzbMATHGoogle Scholar
  9. 9.
    Luo, X., Bertozzi, A.L.: Convergence analysis of the graph Allen-Cahn scheme. PreprintGoogle Scholar
  10. 10.
    Fowlkes, C., Belongie, S., Chung, F., Malik, J.: Spectral grouping using the Nyström method. IEEE Trans. Pattern Anal. Mach. Intell. 26(2), 214–225 (2004)CrossRefGoogle Scholar
  11. 11.
    Merkurjev, E., Kostic, T., Bertozzi, A.L.: An MBO scheme on graphs for classification and image processing. SIAM J. Imaging Sci. 6(4), 1903–1930 (2013)MathSciNetCrossRefzbMATHGoogle Scholar
  12. 12.
    Merkurjev, E., Bae, E., Bertozzi, A.L., Tai, X.C.: Global binary optimization on graphs for classification of high-dimensional data. J. Math. Imaging Vis. 52(3), 414–435Google Scholar
  13. 13.
    Hu, H., Sunu, J., Bertozzi, A.L.: Multi-class graph Mumford-Shah model for plume detection using the MBO scheme. In: Tai, X.-C., Bae, E., Chan, T.F., Lysaker, M. (eds.) EMMCVPR 2015. LNCS, vol. 8932, pp. 209–222. Springer, Heidelberg (2015)Google Scholar
  14. 14.
    Kuang, D., Gittens, A., Hamid, R.: Hardware compliant approximate image codes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2015)Google Scholar
  15. 15.
    Demmel, J.W.: Applied Numerical Linear Algebra. Siam, Philadelphia (1997)CrossRefzbMATHGoogle Scholar
  16. 16.
    Broadwater, J.B., Limsui, D., Carr, A.K.: A primer for chemical plume detection using LWIR sensors. Technical Paper, National Security Technology Department, Las Vegas, NV (2011)Google Scholar
  17. 17.
    Williams, S., Waterman, A., Patterson, D.: Roofline: an insightful visual performance model for multicore architectures. Commun. ACM 52(4), 65–76 (2009)CrossRefGoogle Scholar
  18. 18.
  19. 19.
  20. 20.
    Doerfler, D.: Understanding Application Data Movement Characteristics using Intel VTune Amplifier and Software Development Emulator tools, Intel Xeon Phi User Group (IXPUG) (2015)Google Scholar
  21. 21.
  22. 22.

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Zhaoyi Meng (1, 2)
  • Alice Koniges (2)
  • Yun (Helen) He (2)
  • Samuel Williams (2)
  • Thorsten Kurth (2)
  • Brandon Cook (2)
  • Jack Deslippe (2)
  • Andrea L. Bertozzi (1)
  1. University of California, Los Angeles, USA
  2. Lawrence Berkeley National Laboratory, Berkeley, USA
