Abstract
This paper presents a new method for selecting valuable training data for support vector machines (SVMs) from large, noisy sets using a genetic algorithm (GA). SVM training data selection is a known but not extensively investigated problem. Existing methods rely mainly on analyzing the geometric properties of the data or adopt randomized selection, and, to the best of our knowledge, GA-based approaches have not yet been applied for this purpose. Our work was inspired by problems encountered when using SVMs for skin segmentation: because the set is very large, the existing methods are too time-consuming, and random selection is ineffective because the set is noisy. In the work reported here, we demonstrate how a GA can be used to optimize the training set, and we present extensive experimental results which confirm that the new method is highly effective for real-world data.
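The general idea of evolving a training subset can be illustrated with a minimal sketch. The abstract does not specify the chromosome encoding, genetic operators, fitness function, or parameters, so everything below is an assumption made for illustration: an individual is a fixed-size list of training indices, fitness is accuracy on a held-out validation set, and a small pure-Python Pegasos-style linear SVM stands in for a real SVM library. The names `train_linear_svm`, `ga_select`, and `make_noisy_data` are hypothetical, and the synthetic data is not the skin segmentation data used in the paper.

```python
import random

def train_linear_svm(points, labels, epochs=20, lam=0.01, seed=0):
    """Pegasos-style SGD for a linear SVM (assumed stand-in for an SVM library)."""
    rng = random.Random(seed)
    w, b, t = [0.0] * len(points[0]), 0.0, 0
    order = list(range(len(points)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = labels[i] * (sum(wj * xj for wj, xj in zip(w, points[i])) + b)
            w = [wj * (1.0 - eta * lam) for wj in w]  # regularization shrink
            if margin < 1.0:  # hinge loss is active for this sample
                w = [wj + eta * labels[i] * xj for wj, xj in zip(w, points[i])]
                b += eta * labels[i]
    return w, b

def accuracy(model, points, labels):
    w, b = model
    hits = 0
    for x, y in zip(points, labels):
        score = sum(wj * xj for wj, xj in zip(w, x)) + b
        hits += (1 if score >= 0 else -1) == y
    return hits / len(points)

def ga_select(train_x, train_y, val_x, val_y, subset_size=20,
              pop_size=12, generations=15, mutation_rate=0.1, seed=0):
    """Evolve index subsets of the training set; all operators are assumptions."""
    rng = random.Random(seed)
    n = len(train_x)

    def fitness(ind):
        model = train_linear_svm([train_x[i] for i in ind], [train_y[i] for i in ind])
        return accuracy(model, val_x, val_y)

    def crossover(a, b):
        # Child draws its indices from the union of both parents.
        return sorted(rng.sample(sorted(set(a) | set(b)), subset_size))

    def mutate(ind):
        chosen = set(ind)
        for i in ind:
            if rng.random() < mutation_rate:
                # Swap index i for one not currently in the subset.
                new = rng.choice([j for j in range(n) if j not in chosen])
                chosen.discard(i)
                chosen.add(new)
        return sorted(chosen)

    population = [sorted(rng.sample(range(n), subset_size)) for _ in range(pop_size)]
    history = []  # best validation accuracy per generation
    for _ in range(generations):
        scored = sorted(((fitness(ind), ind) for ind in population), reverse=True)
        history.append(scored[0][0])
        parents = [ind for _, ind in scored[:pop_size // 2]]  # truncation selection
        children = [scored[0][1]]  # elitism: keep the best individual unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b)))
        population = children
    best_fit, best = max((fitness(ind), ind) for ind in population)
    history.append(best_fit)
    return best, best_fit, history

def make_noisy_data(n, noise=0.2, seed=1):
    """Two Gaussian blobs with a fraction of flipped labels (synthetic, illustrative)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        y = rng.choice([-1, 1])
        xs.append([rng.gauss(1.5 * y, 1.0), rng.gauss(1.5 * y, 1.0)])
        ys.append(-y if rng.random() < noise else y)
    return xs, ys

train_x, train_y = make_noisy_data(300, seed=1)
val_x, val_y = make_noisy_data(100, seed=2)
best, best_fit, history = ga_select(train_x, train_y, val_x, val_y)
print(len(best), round(best_fit, 3))
```

Because the best individual is carried over unchanged and the fitness evaluation is deterministic, the best validation accuracy in `history` never decreases from one generation to the next. In practice, the stand-in classifier would be replaced by a full SVM implementation, and the subset size and GA parameters would need tuning for the data at hand.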
Keywords
- Support Vector Machine
- Support Vector Machine Training
- Genetic Algorithm Process
- Skin Segmentation
- Genetic Algorithm Strategy
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kawulok, M., Nalepa, J. (2012). Support Vector Machines Training Data Selection Using a Genetic Algorithm. In: Gimel’farb, G., et al. Structural, Syntactic, and Statistical Pattern Recognition. SSPR /SPR 2012. Lecture Notes in Computer Science, vol 7626. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34166-3_61
DOI: https://doi.org/10.1007/978-3-642-34166-3_61
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34165-6
Online ISBN: 978-3-642-34166-3
eBook Packages: Computer Science, Computer Science (R0)