A Survey of Machine Learning Methods for Big Data

Ruiz, Zoila; Salvador, Jaime; Garcia-Rodriguez, Jose

doi:10.1007/978-3-319-59773-7_27

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10338)

Included in the following conference series:

International Work-Conference on the Interplay Between Natural and Artificial Computation

Abstract

Studies in many fields now aim to extract relevant information on trends, challenges and opportunities, and they share one feature: they work with large volumes of data. This work analyzes different studies on the use of Machine Learning (ML) for processing large volumes of data (Big Data). Most of these datasets are complex and come from various sources, with structured or unstructured data. It is therefore necessary to find mechanisms that classify the data and, to some extent, organize them so that users can more easily extract the information they need. Processing these data requires classification techniques, which are also reviewed.

Notes

1. Simulated Annealing.
2. Tabu search.

Acknowledgements

This work has been funded by the Spanish Government TIN2016-76515-R grant for the COMBAHO project, supported with FEDER funds.

Author information

Authors and Affiliations

Universidad Central Del Ecuador, Ciudadela Universitaria, Quito, Ecuador
Zoila Ruiz & Jaime Salvador
Universidad de Alicante, Ap. 99, 03080, Alicante, Spain
Jose Garcia-Rodriguez

Corresponding author

Correspondence to Jose Garcia-Rodriguez.

Editor information

Editors and Affiliations

Departamento de Electrónica, Tecnología de Computadoras y Proyectos, Universidad Politécnica de Cartagena, Cartagena, Spain
José Manuel Ferrández Vicente
Departamento de Inteligencia Artificial, Universidad Nacional de Educación a Distancia, Madrid, Spain
José Ramón Álvarez-Sánchez
Departamento de Inteligencia Artificial, Universidad Nacional de Educación a Distancia, Madrid, Spain
Félix de la Paz López
Departamento de Electrónica, Tecnología de Computadoras y Proyectos, Universidad Politécnica de Cartagena, Cartagena, Spain
Javier Toledo Moreo
The Ohio State University, Columbus, Ohio, USA
Hojjat Adeli

About this paper

Cite this paper

Ruiz, Z., Salvador, J., Garcia-Rodriguez, J. (2017). A Survey of Machine Learning Methods for Big Data. In: Ferrández Vicente, J., Álvarez-Sánchez, J., de la Paz López, F., Toledo Moreo, J., Adeli, H. (eds) Biomedical Applications Based on Natural and Artificial Computing. IWINAC 2017. Lecture Notes in Computer Science, vol 10338. Springer, Cham. https://doi.org/10.1007/978-3-319-59773-7_27

DOI: https://doi.org/10.1007/978-3-319-59773-7_27
Published: 27 May 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-59772-0
Online ISBN: 978-3-319-59773-7
eBook Packages: Computer Science, Computer Science (R0)
