Parallelization of the DIANA Algorithm in OpenMP

Ribeiro, Hethini; Spolon, Roberta; Manacero, Aleardo; Lobato, Renata S.

doi:10.1007/978-981-13-5907-1_18

Part of the book series: Communications in Computer and Information Science (CCIS, volume 931)

Included in the following conference series:

International Conference on Parallel and Distributed Computing: Applications and Technologies

Abstract

Global data production has been increasing by approximately 40% per year since the beginning of the last decade. These large datasets, also called Big Data, pose great challenges in many areas, particularly in the Machine Learning (ML) field. Although ML algorithms can extract useful information from these large data repositories, some of them are computationally expensive, such as AGNES and DIANA, which have O(n²) and O(2ⁿ) complexity, respectively. The central challenge is therefore to process large amounts of data in a realistic time frame. In this context, this paper proposes a parallelization of the DIANA algorithm using OpenMP. Initial tests on a database of 5000 elements showed a speedup of 5.2521. According to Gustafson's law, the speedup is expected to be larger for larger databases.
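The abstract does not say which parts of DIANA were parallelized, so the following is only a minimal sketch of one plausible candidate, not the authors' implementation. In DIANA, each split starts by finding the "splinter" object, the one with the largest average dissimilarity to the rest of its cluster, and those averages are independent of one another. The names find_splinter and gustafson_speedup, the row-major dissimilarity matrix, and the assumption of non-negative dissimilarities are all illustrative choices made here.

```c
#include <assert.h>

/* Hypothetical helper, not from the paper: returns the id of the object in
 * `members` with the largest average dissimilarity to the other members.
 * `diss` is an n x n row-major matrix of non-negative dissimilarities.
 * Ties go to the earliest position in `members`, so the result does not
 * depend on how iterations are distributed among threads. */
int find_splinter(const double *diss, int n, const int *members, int count)
{
    assert(n > 0 && count > 1);
    int best = -1;
    double best_avg = -1.0;

    #pragma omp parallel
    {
        /* Each thread tracks the best candidate among its own iterations. */
        int local_best = -1;
        double local_avg = -1.0;

        #pragma omp for nowait
        for (int i = 0; i < count; i++) {
            double sum = 0.0;
            for (int j = 0; j < count; j++) {
                if (j != i)
                    sum += diss[members[i] * n + members[j]];
            }
            double avg = sum / (count - 1);
            if (avg > local_avg || (avg == local_avg && i < local_best)) {
                local_avg = avg;
                local_best = i;
            }
        }

        /* Merge the per-thread candidates one thread at a time. */
        #pragma omp critical
        {
            if (local_avg > best_avg ||
                (local_avg == best_avg && local_best < best)) {
                best_avg = local_avg;
                best = local_best;
            }
        }
    }
    return members[best];
}

/* Gustafson's scaled speedup for p processors, where serial_fraction is the
 * serial share of the run time measured on the parallel system. */
double gustafson_speedup(int p, double serial_fraction)
{
    return p - serial_fraction * (p - 1);
}
```

With GCC, the pragmas take effect when the file is compiled with -fopenmp; without that flag they are ignored and the same code runs serially. The second function states the formula behind the abstract's expectation: if the serial share shrinks as the database grows, the scaled speedup moves closer to the processor count.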

References

Bell, J.: Machine Learning: Hands-On for Developers and Technical Professionals. Wiley, Hoboken (2015)
Lopes, N., Ribeiro, B.: Machine Learning for Adaptive Many-Core Machines: A Practical Approach (2015)
Pacheco, P.S.: An Introduction to Parallel Programming. Morgan Kaufmann Publishers, Burlington (2011)
Danalis, A., McCurdy, C., Vetter, J.S.: Efficient Quality Threshold Clustering for Parallel Architectures (2012)
Bhimani, J., Leeser, M., Mi, N.: Accelerating K-means clustering with parallel implementations and GPU computing. In: High Performance Extreme Computing Conference (HPEC) (2015)
Naik, D.S.B., Kumar, S.D., Ramakrishna, S.V.: Parallel Processing of Enhanced K-means Using OpenMP (2014)
Kaufman, L., Rousseeuw, P.J.: Finding Groups in Data: An Introduction to Cluster Analysis. Wiley, Hoboken (1990)
Johnson, S.: Hierarchical clustering schemes. Psychometrika (1967)
Gustafson, J.L.: Reevaluating Amdahl's Law. Communications of the ACM, Technical Note (1988)
Fränti, P., Rezaei, M., Zhao, Q.: Centroid index: cluster level similarity measure. Patt. Recogn. 47, 3034–3045 (2014)

Author information

Authors and Affiliations

Computer Department, Universidade Estadual Paulista “Júlio de Mesquita Filho” (UNESP), Bauru, SP, Brazil
Hethini Ribeiro & Roberta Spolon
Department of Computer Science and Statistics, Universidade Estadual Paulista “Júlio de Mesquita Filho” (UNESP), São José do Rio Preto, SP, Brazil
Aleardo Manacero Jr. & Renata S. Lobato

Corresponding author

Correspondence to Roberta Spolon.

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, Seoul National University of Science and Technology, Seoul, Korea (Republic of)
Jong Hyuk Park
School of Computer Science, University of Adelaide, Adelaide, SA, Australia
Hong Shen
Department of Multimedia Engineering, Dongguk University, Seoul, Korea (Republic of)
Yunsick Sung
School of ICT, Griffith University, Gold Coast, Australia
Hui Tian

About this paper

Cite this paper

Ribeiro, H., Spolon, R., Manacero, A., Lobato, R.S. (2019). Parallelization of the DIANA Algorithm in OpenMP. In: Park, J., Shen, H., Sung, Y., Tian, H. (eds) Parallel and Distributed Computing, Applications and Technologies. PDCAT 2018. Communications in Computer and Information Science, vol 931. Springer, Singapore. https://doi.org/10.1007/978-981-13-5907-1_18

DOI: https://doi.org/10.1007/978-981-13-5907-1_18
Published: 08 February 2019
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-5906-4
Online ISBN: 978-981-13-5907-1
eBook Packages: Computer Science, Computer Science (R0)
