Abstract
Studies in many fields now aim to extract relevant information about trends, challenges, and opportunities, and they share one feature: they work with large volumes of data. This work analyzes studies on the use of Machine Learning (ML) for processing large volumes of data (Big Data). Most of these datasets are complex and come from various sources, with structured or unstructured data. It is therefore necessary to find mechanisms that classify and, to some extent, organize the data so that users can more easily extract the information they need. Processing these data requires classification techniques, which are also reviewed.
Notes
- 1. Simulated Annealing.
- 2. Tabu search.
Acknowledgements
This work was funded by the Spanish Government through grant TIN2016-76515-R for the COMBAHO project, with support from FEDER funds.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Ruiz, Z., Salvador, J., Garcia-Rodriguez, J. (2017). A Survey of Machine Learning Methods for Big Data. In: Ferrández Vicente, J., Álvarez-Sánchez, J., de la Paz López, F., Toledo Moreo, J., Adeli, H. (eds) Biomedical Applications Based on Natural and Artificial Computing. IWINAC 2017. Lecture Notes in Computer Science, vol 10338. Springer, Cham. https://doi.org/10.1007/978-3-319-59773-7_27
DOI: https://doi.org/10.1007/978-3-319-59773-7_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-59772-0
Online ISBN: 978-3-319-59773-7
eBook Packages: Computer Science, Computer Science (R0)