A Survey of Machine Learning Methods for Big Data

  • Conference paper
Biomedical Applications Based on Natural and Artificial Computing (IWINAC 2017)

Abstract

Studies in many fields now aim to extract relevant information on trends, challenges and opportunities, and they share one feature: they work with large volumes of data. This work analyzes studies on the use of Machine Learning (ML) for processing large volumes of data (Big Data). Most of these datasets are complex and come from various sources, with structured or unstructured data. For this reason, mechanisms are needed to classify and, to some extent, organize the data so that users can more easily extract the information they require. Processing these data requires classification techniques, which are also reviewed.

Notes

  1. Simulated Annealing.

  2. Tabu search.

Acknowledgements

This work has been funded by the Spanish Government grant TIN2016-76515-R for the COMBAHO project, with support from FEDER funds.

Author information

Corresponding author

Correspondence to Jose Garcia-Rodriguez.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Ruiz, Z., Salvador, J., Garcia-Rodriguez, J. (2017). A Survey of Machine Learning Methods for Big Data. In: Ferrández Vicente, J., Álvarez-Sánchez, J., de la Paz López, F., Toledo Moreo, J., Adeli, H. (eds) Biomedical Applications Based on Natural and Artificial Computing. IWINAC 2017. Lecture Notes in Computer Science, vol 10338. Springer, Cham. https://doi.org/10.1007/978-3-319-59773-7_27

  • DOI: https://doi.org/10.1007/978-3-319-59773-7_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-59772-0

  • Online ISBN: 978-3-319-59773-7

  • eBook Packages: Computer Science, Computer Science (R0)
