Cost-Balance Setting of MapReduce and Spark-Based Architectures for SVM

  • Mario Alberto Giraldo Londoño
  • John Freddy Duitama
  • Julián David Arias-Londoño
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 833)

Abstract

Support Vector Machine (SVM) is a classifier widely used in machine learning because of its high generalization capacity. Sequential minimal optimization (SMO), its most popular implementation, scales somewhere between linear and quadratic in the training set size for various test problems. As a result, training an SVM on large data sets has a high computational cost. SVM implementations on distributed systems such as MapReduce and Spark have been shown to reduce this cost; this paper analyzes how data subset size and the number of mapping tasks affect SVM performance on MapReduce and Spark. It also proposes a cost model as a tool for setting the data subset size according to the available hardware and the data to be processed.
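
To make the scaling argument concrete, the following is a minimal sketch of why subset size and the number of map tasks interact. It is not the cost model proposed in the paper: the power-law form `c * n**alpha`, the exponent value 1.5, and the names `smo_cost`, `parallel_cost`, `subsets`, and `slots` are all assumptions chosen for illustration, and the sketch ignores merge, communication, and scheduling overhead as well as any effect of partitioning on accuracy.

```python
import math


def smo_cost(n, alpha=1.5, c=1.0):
    """Hypothetical training cost for n samples, assuming cost = c * n**alpha.

    alpha is assumed to lie between 1 (linear) and 2 (quadratic).
    """
    return c * n ** alpha


def parallel_cost(n, subsets, slots, alpha=1.5, c=1.0):
    """Estimated wall-clock cost when n samples are split into equal subsets.

    Each subset is trained by one map task, and at most `slots` tasks run
    at the same time, so the tasks execute in ceil(subsets / slots) waves.
    """
    per_task = smo_cost(n / subsets, alpha, c)
    waves = math.ceil(subsets / slots)
    return waves * per_task


if __name__ == "__main__":
    n = 1_000_000
    single = smo_cost(n)
    for subsets in (4, 16, 64):
        est = parallel_cost(n, subsets, slots=8)
        print(f"{subsets:>3} subsets: estimated speedup {single / est:.1f}x")
```

Under these assumptions, smaller subsets lower the per-task cost faster than the number of waves grows, which is the kind of trade-off a cost model would need to weigh against the overheads left out here.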

Keywords

Support vector machine · Classification · MapReduce · Spark

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Universidad de Antioquia, Medellin, Colombia
