
Part of the book series: Advances in Computer Vision and Pattern Recognition (ACVPR)

Abstract

The basics of kernel methods and their position in the generalized data-driven fault diagnostic framework are reviewed. The review starts with statistical learning theory, covering concepts such as loss functions, overfitting, and empirical and structural risk minimization. This is followed by linear margin classifiers, kernels and support vector machines. Transductive support vector machines are discussed and illustrated with an example related to multivariate image analysis of coal particles on conveyor belts. Finally, unsupervised kernel methods, such as kernel principal component analysis, are considered in detail, analogous to the application of linear principal component analysis in multivariate statistical process control. Fault diagnosis in a simulated nonlinear system by means of kernel principal component analysis is included as an example to illustrate the concepts.
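As a pointer to the kind of monitoring scheme the abstract describes, the sketch below applies kernel PCA to a small simulated nonlinear system and uses the input-space reconstruction error (a squared prediction error, SPE) as a fault detection statistic. This is a minimal illustration, not the chapter's own code or data: it assumes scikit-learn's KernelPCA with an RBF kernel, arbitrary settings (two retained components, gamma = 2.0, an empirical 99th-percentile control limit) and a hypothetical three-variable toy process.

```python
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

def simulate(n, fault=0.0):
    """Toy nonlinear system driven by one latent variable; 'fault' adds a
    sensor bias to the second measurement."""
    t = rng.uniform(-1.0, 1.0, n)
    x1 = t + 0.05 * rng.standard_normal(n)
    x2 = t ** 2 - 0.5 + fault + 0.05 * rng.standard_normal(n)
    x3 = -t ** 3 + 0.05 * rng.standard_normal(n)
    return np.column_stack([x1, x2, x3])

X_train = simulate(500)               # normal operating conditions
X_test = simulate(200, fault=0.8)     # faulty operation

# Fit KPCA on normal data only; fit_inverse_transform enables approximate
# pre-images, so a reconstruction error can be computed in the input space.
scaler = StandardScaler().fit(X_train)
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=2.0,
                 fit_inverse_transform=True).fit(scaler.transform(X_train))

def spe(X):
    """Squared prediction error (input-space reconstruction error) per sample."""
    Z = scaler.transform(X)
    Z_hat = kpca.inverse_transform(kpca.transform(Z))
    return np.sum((Z - Z_hat) ** 2, axis=1)

limit = np.percentile(spe(X_train), 99)        # empirical 99% control limit
alarm_rate = np.mean(spe(X_test) > limit)
print(f"SPE control limit: {limit:.3f}; "
      f"fraction of faulty samples flagged: {alarm_rate:.2f}")
```

In practice the kernel width and the number of retained components would be chosen by cross-validation or from the spectrum of the kernel matrix, and a T²-type statistic on the kernel scores can be monitored alongside the reconstruction error.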


Notes

  1. As long as the data points are not situated exactly on a linear hyperplane.

  2. The general formulation of constrained optimization problems states the inequality constraints as less than or equal to zero. For ease of visualization and of generalization to SVMs, the inequality constraints are stated here as greater than or equal to zero, without loss of generality (the two forms are written out after these notes).

  3. For the ridge regression explanation, the general statistical nomenclature of x for independent variables and y for dependent variables is used. In KPCA reconstruction by learning, the input space plays the role of the output and the KPCA feature space that of the input. The KPCA nomenclature is resumed once the ridge regression explanation is completed (a brief sketch of this idea follows these notes).
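A worked statement of the convention in Note 2, in standard notation (not reproduced from the chapter): a general constrained optimization problem is usually written as

\[ \min_{\mathbf{x}} f_0(\mathbf{x}) \quad \text{subject to} \quad f_i(\mathbf{x}) \le 0, \quad i = 1, \dots, m, \]

whereas here each inequality is multiplied by \( -1 \) and stated as \( -f_i(\mathbf{x}) \ge 0 \). For the maximum-margin classifier this yields the familiar constraints \( y_i(\mathbf{w}^{\top}\mathbf{x}_i + b) - 1 \ge 0 \), \( i = 1, \dots, N \).

The reconstruction-by-learning idea in Note 3 can be sketched as follows; this is an illustration, not the chapter's code. A ridge-type regression is trained with the KPCA scores as the independent variables ("x" in the note's regression nomenclature) and the original input-space samples as the dependent variables ("y"), so that score vectors can be mapped back to approximate pre-images. The sketch assumes scikit-learn's KernelPCA and KernelRidge with arbitrary RBF-kernel settings on a hypothetical two-variable data set; it uses the kernelised form of ridge regression, as is common for learned pre-images, but plain ridge regression would follow the same pattern.

```python
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(1)
t = rng.uniform(-1.0, 1.0, 300)
# Toy two-variable nonlinear data set
X = np.column_stack([t, t ** 2 + 0.05 * rng.standard_normal(300)])

# Nonlinear feature extraction: the KPCA scores serve as regression inputs.
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=1.0).fit(X)
scores = kpca.transform(X)              # "x" in the note's nomenclature

# Reconstruction by learning: regression from feature-space scores back to
# the input space ("y" in the note's nomenclature).
reconstructor = KernelRidge(alpha=1e-3, kernel="rbf", gamma=1.0).fit(scores, X)

X_hat = reconstructor.predict(scores)   # approximate pre-images of the scores
print("Mean squared reconstruction error:",
      np.mean(np.sum((X - X_hat) ** 2, axis=1)))
```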



Nomenclature

Symbol

Description

\( \tilde{\mathrm{x}} \)

Point satisfying all inequality constraints of an optimization problem

\( {K_{ij }} \)

Element of Gram matrix in ith row and jth column

\( {N_{SV }} \)

Number of support vectors in a training data set

\( {R_e}(f) \)

Empirical risk of function f

\( {f_0}(\mathbf{x}) \)

Objective function in an optimization problem

\( {f_0}(\tilde{\mathbf{x}} ) \)

Objective function value at point where all inequality constraints of optimization problem are satisfied

\( f_0^{*} \)

Optimal value of objective function

\( {m_G}(\mathbf{x},y) \)

Geometrical margin, i.e. the distance of a point x with associated label y from a separating hyperplane defined by parameters w and b

\( {\xi_i} \)

ith slack variable of an optimization problem

\( C_p^{*} \)

Optimal parameter in a kernel function

\( C_q \)

Parameter in a kernel function

D

Diameter of sphere enclosing a set of (training) data

\( D_s \)

Diameter of smallest sphere enclosing a set of (training) data

\( \mathcal{F} \)

Function space, class of functions

f*

Function associated with lowest risk bound

h

Capacity parameter of a model, such as VC dimension

\( \mathcal{H} \)

Feature space

K

Gram matrix

L(y, f(x))

Loss function

M

Dimensionality of input space

m

Margin or shortest distance between two separating hyperplanes

P(x, y)

Joint probability distribution between x and y

Q

Arbitrary number of parameters

δ

Confidence

ξ

Vector of slack variables

ρ

Bias defining location of a hyperplane

σ

Width of Gaussian kernel

\( \varPhi (\mathbf{x}) \)

Mapping function from input space to feature space

C

Covariance matrix of mean-centred data matrix X

v

Arbitrary vector

\( L(\tilde{\mathbf{x}}, \boldsymbol{\alpha}) \)

Lagrangian function value at point where all inequality constraints of optimization problem are satisfied

\( L(\mathbf{x}, \boldsymbol{\alpha}) \)

Lagrangian function

N

Number of samples in a training data set

R(f)

Risk of function f

C(h, N, δ)

Capacity of a model

\( g(\boldsymbol{\alpha}) \)

Lagrangian dual function

\( k(\mathbf{x},{\mathbf{x}}^{\prime}) \)

Kernel function

\( m(\mathbf{w},b) \)

Margin of a separating hyperplane

Principal component score space

\( \mathcal{X} \)

Vector space of x

\( \boldsymbol{\alpha} \)

Vector of Lagrangian multipliers

θ

Angle

κ

Sigmoidal kernel parameter

ν

Parameter in the optimization of a soft margin classifier

ρ

Parameter in the optimization of a soft margin classifier

\( \vartheta \)

Sigmoidal kernel parameter


Copyright information

© 2013 Springer-Verlag London

About this chapter

Cite this chapter

Aldrich, C., Auret, L. (2013). Statistical Learning Theory and Kernel-Based Methods. In: Unsupervised Process Monitoring and Fault Diagnosis with Machine Learning Methods. Advances in Computer Vision and Pattern Recognition. Springer, London. https://doi.org/10.1007/978-1-4471-5185-2_4


  • DOI: https://doi.org/10.1007/978-1-4471-5185-2_4

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-4471-5184-5

  • Online ISBN: 978-1-4471-5185-2

  • eBook Packages: Computer Science, Computer Science (R0)
