Parallelization of the DIANA Algorithm in OpenMP

  • Conference paper
  • First Online:
Parallel and Distributed Computing, Applications and Technologies (PDCAT 2018)

Abstract

Global data production has been increasing by approximately 40% per year since the beginning of the last decade. These large datasets, also called Big Data, pose great challenges in many areas, and in particular in the Machine Learning (ML) field. Although ML algorithms are able to extract useful information from these large data repositories, many of them are computationally expensive, such as AGNES and DIANA, with O(n³) and O(2ⁿ) complexity, respectively. The big challenge, therefore, is to process these large amounts of data in a realistic time frame. In this context, this paper proposes a parallelization of the DIANA algorithm in OpenMP. Initial tests on a database of 5000 elements showed a speedup of 5.2521. According to Gustafson's law, it is expected that the gains will be even larger for bigger databases.
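Since the full text is not reproduced here, the following is only a minimal sketch of the kind of OpenMP parallelization the abstract describes: spreading DIANA's dominant cost, the repeated computation of average dissimilarities, across threads. The data layout (a precomputed dissimilarity matrix), the function names, and the splinter-seed selection step below are illustrative assumptions, not the authors' implementation.

/* Sketch: parallelizing DIANA's splinter-seed selection with OpenMP.
 * Assumed (not from the paper): an n x n dissimilarity matrix stored
 * row-major in `dist`, and group membership flags in `in_splinter`
 * (0 = main group, 1 = splinter group).
 * Build with, e.g.: gcc -fopenmp -O2 diana_sketch.c -o diana_sketch
 */
#include <stdio.h>
#include <omp.h>

/* Average dissimilarity of object i to the other objects of group `group`. */
static double avg_diss(const double *dist, const int *in_splinter,
                       int n, int i, int group)
{
    double sum = 0.0;
    int count = 0;
    for (int j = 0; j < n; j++) {
        if (j != i && in_splinter[j] == group) {
            sum += dist[(size_t)i * n + j];
            count++;
        }
    }
    return count ? sum / count : 0.0;
}

/* One DIANA splitting step: the splinter group is seeded with the object
 * whose average dissimilarity to the rest of the main group is largest.
 * This O(n^2) scan is the hot loop, so it is shared among the threads. */
static int pick_splinter_seed(const double *dist, const int *in_splinter, int n)
{
    int best = -1;
    double best_avg = -1.0;

    #pragma omp parallel
    {
        int local_best = -1;          /* per-thread running maximum */
        double local_avg = -1.0;

        #pragma omp for schedule(static)
        for (int i = 0; i < n; i++) {
            if (in_splinter[i]) continue;   /* only objects of the main group */
            double a = avg_diss(dist, in_splinter, n, i, 0);
            if (a > local_avg) { local_avg = a; local_best = i; }
        }

        #pragma omp critical              /* merge the thread-local maxima */
        {
            if (local_avg > best_avg) { best_avg = local_avg; best = local_best; }
        }
    }
    return best;
}

int main(void)
{
    /* Toy example: object 3 is far from the others and should be chosen. */
    int n = 4;
    double dist[16] = {
        0, 1, 2, 9,
        1, 0, 2, 9,
        2, 2, 0, 9,
        9, 9, 9, 0
    };
    int in_splinter[4] = {0, 0, 0, 0};

    printf("using up to %d threads\n", omp_get_max_threads());
    printf("splinter seed = %d (expected 3)\n",
           pick_splinter_seed(dist, in_splinter, n));
    return 0;
}

In this pattern each thread keeps a private running maximum over its share of the objects, and the per-thread results are merged in a short critical section, so the quadratic scan parallelizes cleanly. This is also where Gustafson's law enters the abstract's argument: with N threads and a serial fraction s of the execution time, the scaled speedup is S = N - s(N - 1), so as the database grows and the dissimilarity scans dominate, the attainable speedup approaches the number of threads.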



Author information

Correspondence to Roberta Spolon.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Ribeiro, H., Spolon, R., Manacero, A., Lobato, R.S. (2019). Parallelization of the DIANA Algorithm in OpenMP. In: Park, J., Shen, H., Sung, Y., Tian, H. (eds) Parallel and Distributed Computing, Applications and Technologies. PDCAT 2018. Communications in Computer and Information Science, vol 931. Springer, Singapore. https://doi.org/10.1007/978-981-13-5907-1_18

  • DOI: https://doi.org/10.1007/978-981-13-5907-1_18

  • Published:

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-5906-4

  • Online ISBN: 978-981-13-5907-1

  • eBook Packages: Computer Science, Computer Science (R0)
