Abstract
A fast data preprocessing procedure (FDPP) for support vector regression (SVR) is proposed in this paper. In the proposed method, the dataset is first divided into several subsets, and K-means clustering is then applied to each subset. The resulting clusters are classified by group size: centroids of small groups are eliminated, and the remaining centroids are used for SVR training. The relationship between group size and noisy clusters is discussed, and simulations are given. Results show that FDPP removes most of the noise, preserves the useful statistical information, and reduces the number of training samples. Most importantly, FDPP runs very fast and maintains the good regression performance of SVR.
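The steps named in the abstract (split, cluster each subset, drop the centroids of small groups) can be sketched as follows. This is only an illustration, not the authors' implementation: the abstract does not say how the subsets are formed, how K-means is initialised, or what size threshold is used. The interleaved split, the farthest-point initialisation, the joint clustering of (x, y) pairs, and the names `fdpp`, `kmeans`, `farthest_point_init`, `n_subsets`, `k`, and `min_size` are all assumptions made for this example.

```python
from math import dist


def farthest_point_init(points, k):
    """Deterministic seeding (an assumption): start at the first point, then
    repeatedly add the point farthest from the centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, iters=20):
    """Plain Lloyd iterations; returns (centroids, group_sizes)."""
    centroids = farthest_point_init(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            groups[nearest].append(p)
        # Each centroid becomes the mean of its group; an empty group
        # keeps its previous centroid.
        centroids = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids, [len(g) for g in groups]


def fdpp(samples, n_subsets=2, k=3, min_size=2):
    """Hypothetical FDPP sketch: cluster each subset and keep only the
    centroids whose group holds at least `min_size` samples."""
    kept = []
    for s in range(n_subsets):
        subset = samples[s::n_subsets]  # interleaved split (an assumption)
        if not subset:
            continue
        centroids, sizes = kmeans(subset, min(k, len(subset)))
        kept.extend(c for c, n in zip(centroids, sizes) if n >= min_size)
    return kept


# Toy (x, y) samples: two dense regions plus two isolated points.
samples = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10), (11, 11),
    (100, -100), (-100, 100),
]
reduced = fdpp(samples, n_subsets=2, k=3, min_size=2)
print(reduced)  # [(0.5, 0.0), (10.5, 10.0), (0.5, 1.0), (10.5, 11.0)]
```

On this toy input the two isolated points each form a group of size one and are discarded, so ten samples reduce to four centroids. The surviving centroids would then be passed to an SVR trainer in place of the raw samples. In practice `k` and `min_size` would need tuning: if `k` is too large for a subset, a genuine cluster can be split into groups that fall below the threshold.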
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hao, Z., Wen, W., Yang, X., Lu, J., Zhang, G. (2006). A Fast Data Preprocessing Procedure for Support Vector Regression. In: Corchado, E., Yin, H., Botti, V., Fyfe, C. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2006. IDEAL 2006. Lecture Notes in Computer Science, vol 4224. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11875581_6
DOI: https://doi.org/10.1007/11875581_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45485-4
Online ISBN: 978-3-540-45487-8
eBook Packages: Computer Science (R0)