Parallelization of the DIANA Algorithm in OpenMP

  • Hethini Ribeiro
  • Roberta Spolon
  • Aleardo Manacero Jr.
  • Renata S. Lobato
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 931)


Global data production has grown by approximately 40% per year since the beginning of the last decade. These large datasets, often called Big Data, pose great challenges in many areas, particularly in the field of Machine Learning (ML). Although ML algorithms can extract useful information from such large data repositories, many of them are computationally expensive; the hierarchical clustering algorithms AGNES and DIANA, for example, have O(n²) and O(2ⁿ) complexity, respectively. The main challenge is therefore to process large amounts of data in a realistic time frame. In this context, this paper proposes a parallelization of the DIANA algorithm using OpenMP. Initial tests on a database with 5000 elements achieved a speedup of 5.2521. According to Gustafson’s law, the speedup is expected to be even larger for a larger database.
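The abstract does not say which part of DIANA the authors parallelize, so the following is only a minimal sketch of one plausible candidate, not the paper's implementation. In DIANA as described by Kaufman and Rousseeuw, each split starts from the object with the highest average dissimilarity to the other members of its cluster, and these per-object averages are independent of one another, which makes them a natural fit for an OpenMP work-sharing loop. The function name `diana_splinter_seed`, its parameters, and the dense row-major dissimilarity matrix are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the paper): returns the id of the cluster
 * member with the largest average dissimilarity to the other members.
 * dist      : dense n_total x n_total dissimilarity matrix, row-major
 * members   : ids of the objects in the cluster being split
 * n_members : number of ids in members (must be at least 2)
 * Ties are broken in favour of the earliest position in members. */
int diana_splinter_seed(const double *dist, int n_total,
                        const int *members, int n_members)
{
    assert(dist != NULL && members != NULL && n_members > 1);

    int best_pos = -1;      /* position in members of the best candidate */
    double best_avg = 0.0;  /* only meaningful once best_pos >= 0 */

    #pragma omp parallel
    {
        /* Each thread keeps its own running maximum. */
        int local_pos = -1;
        double local_avg = 0.0;

        /* Iterations are independent: each one only reads dist. */
        #pragma omp for nowait
        for (int a = 0; a < n_members; a++) {
            double sum = 0.0;
            for (int b = 0; b < n_members; b++) {
                if (b != a) {
                    sum += dist[(size_t)members[a] * (size_t)n_total
                                + (size_t)members[b]];
                }
            }
            double avg = sum / (double)(n_members - 1);
            if (local_pos < 0 || avg > local_avg ||
                (avg == local_avg && a < local_pos)) {
                local_avg = avg;
                local_pos = a;
            }
        }

        /* Merge the per-thread maxima one thread at a time. */
        #pragma omp critical
        {
            if (local_pos >= 0 &&
                (best_pos < 0 || local_avg > best_avg ||
                 (local_avg == best_avg && local_pos < best_pos))) {
                best_avg = local_avg;
                best_pos = local_pos;
            }
        }
    }
    return members[best_pos];
}
```

With GCC or Clang the pragmas take effect when compiling with `-fopenmp`; without that flag they are ignored and the same code runs sequentially. Per-thread maxima merged in a short critical section are used here so that the shared result is written only once per thread.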


Keywords: Machine learning · Parallelization · DIANA · OpenMP



Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  • Hethini Ribeiro (1)
  • Roberta Spolon (1, corresponding author)
  • Aleardo Manacero Jr. (2)
  • Renata S. Lobato (2)
  1. Computer Department, Universidade Estadual Paulista “Júlio de Mesquita Filho” (UNESP), Bauru, Brazil
  2. Department of Computer Science and Statistics, Universidade Estadual Paulista “Júlio de Mesquita Filho” (UNESP), São José do Rio Preto, Brazil
