Abstract
The basics of kernel methods and their position in the generalized data-driven fault diagnostic framework are reviewed. The review starts out with statistical learning theory, covering concepts such as loss functions, overfitting and structural and empirical risk minimization. This is followed by linear margin classifiers, kernels and support vector machines. Transductive support vector machines are discussed and illustrated by way of an example related to multivariate image analysis of coal particles on conveyor belts. Finally, unsupervised kernel methods, such as kernel principal component analysis, are considered in detail, analogous to the application of linear principal component analysis in multivariate statistical process control. Fault diagnosis in a simulated nonlinear system by the use of kernel principal component analysis is included as an example to illustrate the concepts.
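The kernel PCA monitoring idea summarized above can be sketched in a few lines: build a Gaussian Gram matrix, centre it in feature space, and take its leading eigenvectors as nonlinear score directions. The helper names, the kernel width and the toy data below are illustrative assumptions, not taken from the chapter.

```python
# Minimal kernel PCA score computation with a Gaussian kernel (a sketch,
# not the chapter's implementation; names and data are illustrative).
import numpy as np

def gaussian_kernel_matrix(X, sigma):
    """Gram matrix K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq / (2.0 * sigma ** 2))

def kpca_scores(X, sigma, n_components):
    """Principal component scores of X in the kernel-induced feature space."""
    N = X.shape[0]
    K = gaussian_kernel_matrix(X, sigma)
    # Centre the Gram matrix, i.e. mean-centre the data in feature space.
    one_n = np.ones((N, N)) / N
    Kc = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    eigvals, eigvecs = np.linalg.eigh(Kc)           # ascending eigenvalues
    idx = np.argsort(eigvals)[::-1][:n_components]  # keep the largest
    lam, alpha = eigvals[idx], eigvecs[:, idx]
    # Scale the coefficient vectors so the feature-space eigenvectors
    # have unit norm.
    alpha = alpha / np.sqrt(np.maximum(lam, 1e-12))
    return Kc @ alpha                               # shape (N, n_components)

X = np.random.RandomState(0).randn(50, 3)  # stand-in for process data
scores = kpca_scores(X, sigma=1.0, n_components=2)
```

In a monitoring setting, statistics such as Hotelling's T² would then be computed on these scores, analogous to linear PCA-based multivariate statistical process control.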
Notes
- 1.
As long as the data points are not situated exactly on a linear hyperplane.
- 2.
The general formulation of constrained optimization problems states the inequality constraints as less than or equal to zero. For ease of visualization and generalization to SVM, inequality constraints are stated here as larger than or equal to zero, without loss of generality.
- 3.
For the ridge regression explanation, the general statistical nomenclature of x for independent variables and y for dependent variables will be used. Note that KPCA reconstruction by learning reverses the roles: the KPCA feature space serves as input and the original input space as output. The KPCA nomenclature resumes once the ridge regression explanation is complete.
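The sign convention described in note 2 can be written out for the canonical hard-margin classifier (a standard formulation, stated here with the chapter's w and b):

```latex
\min_{\mathbf{w},\,b}\ \tfrac{1}{2}\|\mathbf{w}\|^{2}
\qquad \text{subject to} \qquad
y_i\left(\mathbf{w}^{\top}\mathbf{x}_i + b\right) - 1 \ge 0,
\qquad i = 1,\dots,N,
```

so that each inequality constraint is stated as larger than or equal to zero, as in the note.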
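The ridge-regression route to KPCA reconstruction mentioned in note 3 amounts to fitting a regularized linear map from feature-space scores back to the original variables. The sketch below uses a stand-in score matrix Z rather than actual KPCA scores; all names, data and the regularization value are illustrative assumptions.

```python
# Hypothetical sketch of reconstruction by ridge regression: fit a linear
# map from score vectors (inputs) back to the original variables (outputs).
import numpy as np

def ridge_fit(Z, X, lam):
    """Weights W minimising ||Z W - X||^2 + lam ||W||^2 (closed form)."""
    d = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(d), Z.T @ X)

rng = np.random.RandomState(1)
X = rng.randn(40, 3)                       # original input-space data
scores = X @ rng.randn(3, 2)               # stand-in for KPCA score vectors
Z = np.hstack([scores, np.ones((40, 1))])  # add an intercept column
W = ridge_fit(Z, X, lam=1e-3)
X_hat = Z @ W                              # approximate reconstructions
```

With genuine KPCA scores as Z, the rows of X_hat would serve as approximate pre-images of the feature-space projections.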
Nomenclature
Symbol | Description
---|---
\( \tilde{\mathrm{x}} \) | Point satisfying all inequality constraints of an optimization problem |
\( {K_{ij }} \) | Element of Gram matrix in ith row and jth column |
\( {N_{SV }} \) | Number of support vectors in a training data set |
\( {R_e}(f) \) | Empirical risk of function f
\( {f_0}(\mathbf{x}) \) | Objective function in an optimization problem |
\( {f_0}(\tilde{\mathbf{x}} ) \) | Objective function value at point where all inequality constraints of optimization problem are satisfied |
\( f_0^{*} \) | Optimal value of objective function |
\( {m_G}(\mathbf{x},y) \) | Geometrical margin, i.e. the distance of a point x with associated label y from a separating hyperplane defined by parameters w and b |
\( {\xi_i} \) | ith slack variable of an optimization problem |
p* | Optimal parameter in a kernel function
q | Parameter in a kernel function
D | Diameter of sphere enclosing a set of (training) data |
D s | Diameter of smallest sphere enclosing a set of (training) data |
\( \mathcal{F} \) | Function space, class of functions |
f* | Function associated with lowest risk bound |
h | Capacity parameter of a model, such as VC dimension |
\( \mathcal{H} \) | Feature space |
K | Gram matrix |
L(y, f(x)) | Loss function |
M | Dimensionality of input space |
m | Margin or shortest distance between two separating hyperplanes |
P(x, y) | Joint probability distribution between x and y |
Q | Arbitrary number of parameters |
δ | Confidence |
ξ | Vector of slack variables |
ρ | Bias defining location of a hyperplane |
σ | Width of Gaussian kernel |
\( \varPhi (\mathbf{x}) \) | Mapping function from input space to feature space
C | Covariance matrix of mean-centred data matrix X |
v | Arbitrary vector |
\( L(\tilde{\mathbf{x}}, \boldsymbol{\alpha}) \) | Lagrangian function value at point where all inequality constraints of optimization problem are satisfied
\( L(\mathbf{x},\boldsymbol{\alpha}) \) | Lagrangian function
N | Number of samples in a training data set |
R(f) | Expected risk of function f
C(h, N, δ) | Capacity of a model |
\( g(\boldsymbol{\alpha}) \) | Lagrangian dual function
\( k(\mathbf{x},{\mathbf{x}}^{\prime}) \) | Kernel function |
\( m(\mathbf{w},b) \) | Margin of a separating hyperplane |
| Principal component score space |
| Vector space of x |
α | Vector of Lagrangian multipliers
θ | Angle |
κ | Sigmoidal kernel parameter |
ν | Parameter in the optimization of a soft margin classifier
ρ | Parameter in the optimization of a soft margin classifier |
\( \vartheta \) | Sigmoidal kernel parameter |
Copyright information
© 2013 Springer-Verlag London
About this chapter
Cite this chapter
Aldrich, C., Auret, L. (2013). Statistical Learning Theory and Kernel-Based Methods. In: Unsupervised Process Monitoring and Fault Diagnosis with Machine Learning Methods. Advances in Computer Vision and Pattern Recognition. Springer, London. https://doi.org/10.1007/978-1-4471-5185-2_4
DOI: https://doi.org/10.1007/978-1-4471-5185-2_4
Publisher Name: Springer, London
Print ISBN: 978-1-4471-5184-5
Online ISBN: 978-1-4471-5185-2
eBook Packages: Computer Science, Computer Science (R0)