Abstract
Global data production has been increasing by approximately 40% per year since the beginning of the last decade. These large datasets, also called Big Data, pose great challenges in many areas, particularly in the field of Machine Learning (ML). Although ML algorithms can extract useful information from these large data repositories, many of them are computationally expensive, such as AGNES and DIANA, which have O(n^2) and O(2^n) complexity, respectively. The main challenge is therefore to process large amounts of data in a realistic time frame. In this context, this paper proposes a parallelization of the DIANA algorithm using OpenMP. Initial tests on a database of 5000 elements yielded a speedup of 5.2521. According to Gustafson's law, larger databases are expected to yield larger speedups.
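The abstract does not say which part of DIANA was parallelized, so the following is only an illustrative sketch, not the authors' implementation. It assumes the classical formulation of DIANA (Kaufman and Rousseeuw), in which each split starts from the element with the largest average dissimilarity to the rest of its cluster. Each element's average can be computed independently, which makes this step a natural candidate for an OpenMP `parallel for`. The names `avg_dissimilarity` and `find_splinter_seed`, and the row-major dissimilarity matrix layout, are assumptions made for this example.

```c
#include <stddef.h>

/* Average dissimilarity between the cluster member at position i and the
 * other members of the same cluster.
 * dist    : row-major n x n dissimilarity matrix (assumed layout),
 *           with zeros on the diagonal
 * members : indices (into dist) of the m elements in the cluster */
static double avg_dissimilarity(const double *dist, size_t n,
                                const size_t *members, size_t m, size_t i)
{
    double sum = 0.0;
    for (size_t k = 0; k < m; ++k)
        sum += dist[members[i] * n + members[k]]; /* self term adds 0 */
    return (m > 1) ? sum / (double)(m - 1) : 0.0;
}

/* Returns the position (within members) of the element with the largest
 * average dissimilarity, i.e. the seed of the splinter group.
 * avg is a caller-provided scratch buffer of length m. */
size_t find_splinter_seed(const double *dist, size_t n,
                          const size_t *members, size_t m, double *avg)
{
    /* Iterations are independent and write to distinct avg[i], so the
     * loop can be shared among threads. Without OpenMP enabled, the
     * pragma is ignored and the loop runs serially. */
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < (long)m; ++i)
        avg[i] = avg_dissimilarity(dist, n, members, m, (size_t)i);

    /* Serial reduction: O(m), negligible next to the O(m^2) loop above. */
    size_t best = 0;
    for (size_t i = 1; i < m; ++i)
        if (avg[i] > avg[best])
            best = i;
    return best;
}
```

Compiled with an OpenMP-enabled compiler (for example `-fopenmp` with GCC), the first loop is distributed across threads. The work in that loop grows with the square of the cluster size while the serial reduction grows only linearly, which is consistent with the expectation that larger inputs benefit more from parallel execution.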
References
Bell, J.: Machine Learning: Hands-On for Developers and Technical Professionals. Wiley, Hoboken (2015)
Lopes, N., Ribeiro, B.: Machine Learning for Adaptive Many-Core Machines: A Practical Approach (2015)
Pacheco, P.S.: An Introduction to Parallel Programming. Morgan Kaufmann Publishers, Burlington (2011)
Danalis, A., McCurdy, C., Vetter, J.S.: Efficient Quality Threshold Clustering for Parallel Architectures (2012)
Bhimani, J., Leeser, M., Mi, N.: Accelerating K-means clustering with parallel implementations and GPU computing. In: High Performance Extreme Computing Conference (HPEC) (2015)
Naik, D.S.B., Kumar, S.D., Ramakrishna, S.V.: Parallel Processing of enhanced K-means using OpenMP (2014)
Kaufman, L., Rousseeuw, P.J.: Finding Groups in Data: An Introduction to Cluster Analysis. Wiley, Hoboken (1990)
Johnson, S.: Hierarchical clustering schemes. Psychometrika (1967)
Gustafson, J.L.: Reevaluating Amdahl’s Law. Communications of the ACM, Technical Note (1988)
Fränti, P., Rezaei, M., Zhao, Q.: Centroid index: cluster level similarity measure. Patt. Recogn. 47, 3034–3045 (2014)
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Ribeiro, H., Spolon, R., Manacero, A., Lobato, R.S. (2019). Parallelization of the DIANA Algorithm in OpenMP. In: Park, J., Shen, H., Sung, Y., Tian, H. (eds) Parallel and Distributed Computing, Applications and Technologies. PDCAT 2018. Communications in Computer and Information Science, vol 931. Springer, Singapore. https://doi.org/10.1007/978-981-13-5907-1_18
DOI: https://doi.org/10.1007/978-981-13-5907-1_18
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-5906-4
Online ISBN: 978-981-13-5907-1
eBook Packages: Computer Science, Computer Science (R0)