Fast Support Vector Machine Classification of Very Large Datasets

  • Conference paper
In: Data Analysis, Machine Learning and Applications

Abstract

In many classification applications, Support Vector Machines (SVMs) have proven to be high-performing and easy-to-handle classifiers with very good generalization abilities. However, one drawback of the SVM is its rather high classification complexity, which scales linearly with the number of Support Vectors (SVs): to classify a single sample, the kernel function has to be evaluated for all SVs. To speed up classification, different approaches have been published, most of which try to reduce the number of SVs. In our work, which is especially suitable for very large datasets, we follow a different approach: as we showed in (Zapien et al. 2006), it is effectively possible to approximate large SVM problems by decomposing the original problem into linear subproblems, where each subproblem can be evaluated in O(1) with respect to the number of SVs. This approach is especially successful when the assumption holds that a large classification problem can be split into mainly easy and only a few hard subproblems. On standard benchmark datasets, this approach achieved great speedups while suffering only slightly in terms of classification accuracy and generalization ability. In this contribution, we extend the methods introduced in (Zapien et al. 2006), using not only linear but also non-linear subproblems for the decomposition of the original problem, which further increases the classification performance with only a small loss in terms of speed. An implementation of our method is available in (Ronneberger et al.). Due to page limitations, we had to move some of the theoretical details (e.g. proofs) and extensive experimental results to a technical report (Zapien et al. 2007).
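
The core idea can be pictured as follows: a standard kernel SVM must evaluate the kernel against every SV for each test sample, so its cost grows with the SV count, whereas a linear subproblem needs only one dot product, independent of the number of SVs. The Python sketch below illustrates the classification pass through such a decomposition, simplified here to a chain of linear nodes with a kernel SVM as the final fallback for the few hard samples. This is a minimal illustration under stated assumptions, not the authors' implementation (that is available in the libsvmtl library cited below); the names (LinearNode, classify, rbf_kernel_svm) and the margin-threshold decision rule are hypothetical.

    import numpy as np

    class LinearNode:
        """One linear subproblem: decide confidently or defer (hypothetical)."""
        def __init__(self, w, b, margin=1.0):
            self.w = np.asarray(w, dtype=float)  # weight vector of the linear SVM
            self.b = float(b)                    # bias term
            self.margin = margin                 # confidence threshold (assumed rule)

        def decide(self, x):
            """Return +1/-1 if confident, else None.
            Cost: one dot product, i.e. O(1) in the number of SVs."""
            score = self.w @ x + self.b
            if abs(score) >= self.margin:
                return 1 if score > 0 else -1
            return None

    def rbf_kernel_svm(support_vectors, coeffs, b, gamma=0.5):
        """Standard SVM decision function: evaluates the kernel against
        every support vector, so its cost grows linearly with their number."""
        sv = np.asarray(support_vectors, dtype=float)
        a = np.asarray(coeffs, dtype=float)      # signed coefficients alpha_i * y_i
        def decide(x):
            k = np.exp(-gamma * np.sum((sv - x) ** 2, axis=1))
            return 1 if a @ k + b > 0 else -1
        return decide

    def classify(x, nodes, fallback):
        """Walk the chain of linear subproblems; only hard samples that
        no linear node can decide reach the expensive kernel SVM."""
        for node in nodes:
            label = node.decide(x)
            if label is not None:
                return label
        return fallback(x)

    # Toy usage with made-up parameters: two linear nodes, then the fallback.
    nodes = [LinearNode(w=[1.0, -0.5], b=0.2, margin=1.0),
             LinearNode(w=[-0.3, 0.9], b=-0.1, margin=0.5)]
    fallback = rbf_kernel_svm([[0.1, 0.2], [-0.4, 0.7]], coeffs=[0.8, -0.8], b=0.0)
    print(classify(np.array([2.0, 1.0]), nodes, fallback))  # decided by node 1

Under the paper's assumption that most samples are easy, the loop terminates at an early linear node for the bulk of the data, which is where the reported speedups come from; only the remaining hard samples pay the full per-SV kernel cost.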


References

  • V. VAPNIK (1995): The Nature of Statistical Learning Theory, New York: Springer Verlag.

  • Y. LEE and O. MANGASARIAN (2001): RSVM: Reduced Support Vector Machines, Proceedings of the First SIAM International Conference on Data Mining, Chicago. SIAM, Philadelphia.

  • H. LEI and V. GOVINDARAJU (2005): Speeding Up Multi-class SVM Evaluation by PCA and Feature Selection, Proceedings of the Workshop on Feature Selection for Data Mining: Interfacing Machine Learning and Statistics, 2005 SIAM Workshop.

  • C. BURGES and B. SCHOELKOPF (1997): Improving Speed and Accuracy of Support Vector Learning Machines, Advances in Neural Information Processing Systems 9, MIT Press, Cambridge, MA, pp 375-381.

  • K. P. BENNETT and E. J. BREDENSTEINER (2000): Duality and Geometry in SVM Classifiers, Proceedings of the 17th International Conference on Machine Learning, pp 57-64.

  • C. HSU and C. LIN (2001): A Comparison of Methods for Multi-Class Support Vector Machines, Technical Report, Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan.

  • T. K. HO and E. M. KLEINBERG (1996): Building Projectable Classifiers of Arbitrary Complexity, Proceedings of the 13th International Conference on Pattern Recognition, Vienna, Austria, pp 880-885.

  • B. SCHOELKOPF and A. SMOLA (2002): Learning with Kernels, The MIT Press, Cambridge, MA, USA.

  • S. KEERTHI, S. SHEVADE, C. BHATTACHARYYA and K. MURTHY (1999): Improvements to Platt’s SMO Algorithm for SVM Classifier Design, Technical Report, Dept. of CSA, Bangalore, India.

  • P. CARBONETTO: Face database, University of British Columbia, Computer Science Department, http://www.cs.ubc.ca/pcarbo/

  • J. J. HULL (1994): A Database for Handwritten Text Recognition Research, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol 16, No 5, pp 550-554.

  • K. ZAPIEN, J. FEHR and H. BURKHARDT (2006): Fast Support Vector Machine Classification Using Linear SVMs, Proceedings of the ICPR, Hong Kong, pp 366-369.

  • O. RONNEBERGER et al.: SVM template library, University of Freiburg, Department of Computer Science, Chair of Pattern Recognition and Image Processing, http://lmb.informatik.uni-freiburg.de/lmbsoft/libsvmtl/index.en.html

  • K. ZAPIEN, J. FEHR and H. BURKHARDT (2007): Fast Support Vector Machine Classification of Very Large Datasets, Technical Report 2/2007, University of Freiburg, Department of Computer Science, Chair of Pattern Recognition and Image Processing, http://lmb.informatik.uni-freiburg.de/people/fehr/svm_tree.pdf

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

Cite this paper

Fehr, J., Arreola, K.Z., Burkhardt, H. (2008). Fast Support Vector Machine Classification of Very Large Datasets. In: Preisach, C., Burkhardt, H., Schmidt-Thieme, L., Decker, R. (eds) Data Analysis, Machine Learning and Applications. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78246-9_2
