A Fast Data Preprocessing Procedure for Support Vector Regression

Hao, Zhifeng; Wen, Wen; Yang, Xiaowei; Lu, Jie; Zhang, Guangquan

doi:10.1007/11875581_6

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4224)

Included in the following conference series:

International Conference on Intelligent Data Engineering and Automated Learning

Abstract

This paper proposes a fast data preprocessing procedure (FDPP) for support vector regression (SVR). The dataset is first divided into several subsets, and K-means clustering is then run within each subset. The resulting clusters are classified by group size: centroids of small groups are eliminated, and the remaining centroids are used for SVR training. The relationship between group size and noisy clusters is discussed, and simulations are given. The results show that FDPP removes most of the noise, preserves the useful statistical information, and reduces the number of training samples. Most importantly, FDPP runs very fast and maintains the good regression performance of SVR.
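
The steps named in the abstract (split, cluster each subset, drop small groups, keep the remaining centroids) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how the dataset is divided, how many clusters are used, or what counts as a small group, so the contiguous split on sorted inputs, the clustering of joint (x, y) points, and the names `kmeans`, `fdpp_sketch`, `n_subsets`, `k` and `min_size` are all assumptions made for illustration.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's K-means on tuples; returns (centroids, group_sizes)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k starting points drawn without replacement
    groups = [[] for _ in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centroid (squared Euclidean distance).
            nearest = min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])),
            )
            groups[nearest].append(p)
        # Move each centroid to its group mean; an empty group keeps its old centroid.
        centroids = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centroids[j]
            for j, g in enumerate(groups)
        ]
    return centroids, [len(g) for g in groups]


def fdpp_sketch(samples, n_subsets=4, k=5, min_size=3, seed=0):
    """Return the centroids that survive the group-size filter.

    Assumptions (not taken from the paper): samples are (x, y) tuples,
    subsets are contiguous chunks of the samples sorted by x, and a
    group is "small" when it holds fewer than min_size points.
    """
    ordered = sorted(samples)
    chunk = -(-len(ordered) // n_subsets)  # ceiling division
    kept = []
    for start in range(0, len(ordered), chunk):
        subset = ordered[start:start + chunk]
        centroids, sizes = kmeans(subset, min(k, len(subset)), seed=seed)
        # Small groups are treated as likely noise and their centroids dropped.
        kept.extend(c for c, n in zip(centroids, sizes) if n >= min_size)
    return kept


# Usage: a clean line y = 2x plus three gross outliers.
data = [(i / 200, 2 * (i / 200)) for i in range(200)]
data += [(0.1, 1000.0), (0.5, 1000.0), (0.9, 1000.0)]
reduced_training_set = fdpp_sketch(data)
```

In this sketch the reduced set of centroids would stand in for the original samples when fitting an SVR model with whatever SVR implementation is at hand. Because each subset is clustered independently, the per-subset K-means runs stay small, which is one plausible reading of why such a procedure can be fast.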

Author information

Authors and Affiliations

School of Mathematical Science, South China University of Technology, Guangzhou, 510641, China
Zhifeng Hao & Xiaowei Yang
College of Computer Science and Engineering, South China University of Technology, Guangzhou, 510641, China
Wen Wen
Faculty of Information Technology, University of Technology Sydney, PO Box 123, Broadway, NSW 2007, Australia
Xiaowei Yang, Jie Lu & Guangquan Zhang

Editor information

Editors and Affiliations

Escuela Politécnica Superior, GICAP Research Group, Universidad de Burgos, Calle Francisco de Vitoria S/N, Edificio C, Campus Vena, 09006, Burgos, Spain
Emilio Corchado
School of Electrical and Electronic Engineering, University of Manchester, UK
Hujun Yin
Department of Information Systems and Computation, Technical University of Valencia, Camino de Vera, Valencia, Spain
Vicente Botti
University of West Scotland, PA1 2BE, Paisley, Scotland
Colin Fyfe

About this paper

Cite this paper

Hao, Z., Wen, W., Yang, X., Lu, J., Zhang, G. (2006). A Fast Data Preprocessing Procedure for Support Vector Regression. In: Corchado, E., Yin, H., Botti, V., Fyfe, C. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2006. IDEAL 2006. Lecture Notes in Computer Science, vol 4224. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11875581_6

DOI: https://doi.org/10.1007/11875581_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45485-4
Online ISBN: 978-3-540-45487-8
eBook Packages: Computer Science, Computer Science (R0)
