Abstract
Big data refers to data sets that are larger or more complex than traditional ones and that are, in many cases, unstructured. Accessing a specific value in a huge data set that is neither sorted nor organized can be time-consuming and can require substantial processing. As data grows, clustering becomes one of the most important unsupervised approaches for finding structure in it. In this paper, we demonstrate two approaches that cluster data with high accuracy. We then sort the data with the merge sort algorithm and, finally, use binary search to find a data value within a specific range of the data. This research presents a highly efficient combined method for big data that uses a genetic algorithm and k-means. After clustering with k-means, the total sum of the Euclidean distances is 3.37233e+09 for 4 clusters; after applying the genetic algorithm, this number reduces to 0.0300344 in the best fit. In the second and third stages, we show that after this implementation a particular data value can be accessed much faster and more accurately than with older methods.
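The three stages named in the abstract (cluster, sort, search) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `kmeans`, `merge_sort`, `bound`, and `values_in_range` are hypothetical, the data is a toy one-dimensional set, the clustering is plain Lloyd's k-means, and the genetic-algorithm refinement of the centroids is omitted.

```python
import math
import random


def assign(points, centroids):
    """Label each point with the index of its nearest centroid (Euclidean)."""
    return [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points]


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means (assumed variant); returns centroids, labels and
    the total sum of Euclidean distances from each point to its centroid."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        labels = assign(points, centroids)
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                # Move the centroid to the coordinate-wise mean of its members.
                new_centroids.append(
                    tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    labels = assign(points, centroids)
    total = sum(math.dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, total


def merge_sort(items):
    """Top-down merge sort; returns a new sorted list."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left, right = merge_sort(items[:mid]), merge_sort(items[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def bound(sorted_vals, target, strict):
    """Binary search: first index whose value is >= target,
    or > target when strict is True."""
    lo, hi = 0, len(sorted_vals)
    while lo < hi:
        mid = (lo + hi) // 2
        if sorted_vals[mid] < target or (strict and sorted_vals[mid] == target):
            lo = mid + 1
        else:
            hi = mid
    return lo


def values_in_range(sorted_vals, low, high):
    """All values v with low <= v <= high, located by two binary searches."""
    return sorted_vals[bound(sorted_vals, low, False):bound(sorted_vals, high, True)]


if __name__ == "__main__":
    values = [9.8, 1.2, 10.2, 0.8, 10.0, 1.0]  # toy data, not the paper's data set
    centroids, labels, total = kmeans([(v,) for v in values], k=2)
    for c in range(len(centroids)):
        cluster = merge_sort([v for v, lab in zip(values, labels) if lab == c])
        print(centroids[c], cluster, values_in_range(cluster, 0.9, 10.0))
```

Sorting each cluster once lets every later range query run in logarithmic time within that cluster, which is the access-time benefit the abstract describes for the second and third stages.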
Copyright information
© 2016 Springer India
About this paper
Cite this paper
Ebadati, E.O.M., Tabrizi, M.M. (2016). A Hybrid Clustering Technique to Improve Big Data Accessibility Based on Machine Learning Approaches. In: Satapathy, S., Mandal, J., Udgata, S., Bhateja, V. (eds) Information Systems Design and Intelligent Applications. Advances in Intelligent Systems and Computing, vol 433. Springer, New Delhi. https://doi.org/10.1007/978-81-322-2755-7_43
DOI: https://doi.org/10.1007/978-81-322-2755-7_43
Publisher Name: Springer, New Delhi
Print ISBN: 978-81-322-2753-3
Online ISBN: 978-81-322-2755-7
eBook Packages: Engineering, Engineering (R0)