
Abstract

What the reader should know to understand this chapter: \(\bullet \) Notions of calculus. \(\bullet \) Chapters 5, 6, and 7. \(\bullet \) Although reading Appendix D is not mandatory, it is an advantage for understanding this chapter.


Notes

  1. If the input dimensionality is higher than 2, the line has to be replaced with a plane or a hyperplane.

  2. The number of solutions is (at least) \(\infty ^1\).

  3. The signum function \(\mathrm{sgn}(u)\) is defined as follows: \(\mathrm{sgn}(u)=1\) if \(u>0\); \(\mathrm{sgn}(u)=-1\) if \(u<0\); \(\mathrm{sgn}(u)=0\) if \(u=0\).

  4. This convention is adopted in the rest of the chapter.

  5. The term regularization constant is motivated in Sect. 9.3.6.

  6. \(\theta (\beta )\) is 1 if \(\beta >0\), 0 otherwise.

  7. In [102] the continuity requirement is replaced with stability.

  8. \(\delta _{ij}\) is 1 if \(i=j\), 0 otherwise.

  9. \(\mathrm{MATLAB}^{\circledR }\) is a registered trademark of The MathWorks, Inc.

References

  1. M. Aizerman, E. Braverman, and L. Rozonoer. Theoretical foundations of the potential function method in pattern recognition learning. Automation and Remote Control, 25:821–837, 1964.

  2. F.R. Bach and M.I. Jordan. Learning spectral clustering. Technical report, EECS Department, University of California, 2003.

  3. A. Barla, E. Franceschi, F. Odone, and F. Verri. Image kernels. In Proceedings of SVM2002, pages 83–96, 2002.

  4. A. Ben-Hur, D. Horn, H.T. Siegelmann, and V. Vapnik. A support vector method for clustering. In Advances in Neural Information Processing Systems, volume 12, pages 125–137, 2000.

  5. A. Ben-Hur, D. Horn, H.T. Siegelmann, and V. Vapnik. Support vector clustering. Journal of Machine Learning Research, 2(2):125–137, 2001.

  6. Y. Bengio, O. Delalleau, N. Le Roux, J.F. Paiement, P. Vincent, and M. Ouimet. Learning eigenfunctions links spectral embedding and kernel PCA. Neural Computation, 16(10):2197–2219, 2004.

  7. Y. Bengio, P. Vincent, and J.F. Paiement. Spectral clustering and kernel PCA are learning eigenfunctions. Technical report, CIRANO, 2003.

  8. C. Berg, J.P.R. Christensen, and P. Ressel. Harmonic Analysis on Semigroups. Springer-Verlag, 1984.

  9. C.M. Bishop. Neural Networks for Pattern Recognition. Oxford University Press, 1995.

  10. M. Brand and K. Huang. A unifying theorem for spectral embedding and clustering. In Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics, 2003.

  11. L.M. Bregman. The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Computational Mathematics and Mathematical Physics, 7:200–217, 1967.

  12. F. Camastra and A. Verri. A novel kernel method for clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(5):801–805, 2005.

  13. N. Cancedda, E. Gaussier, C. Goutte, and J.-M. Renders. Word-sequence kernels. Journal of Machine Learning Research, 3(1):1059–1082, 2003.

  14. S. Canu, Y. Grandvalet, V. Guigue, and A. Rakotomamonjy. SVM and kernel methods Matlab toolbox. Technical report, Perception Systèmes et Information, INSA de Rouen, 2005.

  15. Y. Censor. Row-action methods for huge and sparse systems and their applications. SIAM Review, 23(4):444–467, 1981.

  16. Y. Censor and A. Lent. An iterative row-action method for interval convex programming. Journal of Optimization Theory and Applications, 34(3):321–353, 1981.

  17. P.K. Chan, M. Schlag, and J.Y. Zien. Spectral k-way ratio-cut partitioning and clustering. In Proceedings of the 1993 International Symposium on Research on Integrated Systems, pages 123–142. MIT Press, 1993.

  18. J.H. Chiang. A new kernel-based fuzzy clustering approach: support vector clustering with cell growing. IEEE Transactions on Fuzzy Systems, 11(4):518–527, 2003.

  19. F.R.K. Chung. Spectral Graph Theory. American Mathematical Society, 1997.

  20. R. Collobert and S. Bengio. SVMTorch: Support vector machines for large-scale regression problems. Journal of Machine Learning Research, 1(2):143–160, 2001.

  21. R. Collobert, S. Bengio, and J. Mariethoz. Torch: a modular machine learning software library. Technical report, IDIAP, 2002.

  22. C. Cortes and V. Vapnik. Support-vector networks. Machine Learning, 20(3):273–297, 1995.

  23. N. Cressie. Statistics for Spatial Data. John Wiley, 1993.

  24. N. Cristianini, J. Shawe-Taylor, and J.S. Kandola. Spectral kernel methods for clustering. In Advances in Neural Information Processing Systems 14, pages 649–655. MIT Press, 2001.

  25. A.P. Dempster, N.M. Laird, and D.B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, 39(1):1–38, 1977.

  26. I.S. Dhillon, Y. Guan, and B. Kulis. Kernel k-means: spectral clustering and normalized cuts. In Proceedings of the \(10^{th}\) ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 551–556. ACM Press, 2004.

  27. I.S. Dhillon, Y. Guan, and B. Kulis. A unified view of kernel k-means, spectral clustering and graph partitioning. Technical report, UTCS, 2005.

  28. I.S. Dhillon, Y. Guan, and B. Kulis. Weighted graph cuts without eigenvectors: A multilevel approach. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(11):1944–1957, 2007.

  29. R.O. Duda, P.E. Hart, and D.G. Stork. Pattern Classification. John Wiley, 2001.

  30. T. Evgeniou, M. Pontil, and T. Poggio. Regularization networks and support vector machines. Advances in Computational Mathematics, 13(1):1–50, 2001.

  31. R.-E. Fan, P.-H. Chen, and C.-J. Lin. Working set selection using second order information for training support vector machines. Journal of Machine Learning Research, 6:1889–1918, 2005.

  32. P. Fermat. Methodus ad disquirendam maximam et minimam. In Oeuvres de Fermat. Gauthier-Villars, 1891 (first edition 1679).

  33. M. Ferris and T. Munson. Interior point method for massive support vector machines. Technical report, Computer Sciences Department, University of Wisconsin, Madison, Wisconsin, 2000.

  34. M. Ferris and T. Munson. Semi-smooth support vector machines. Technical report, Computer Sciences Department, University of Wisconsin, Madison, Wisconsin, 2000.

  35. M. Fiedler. Algebraic connectivity of graphs. Czechoslovak Mathematical Journal, 23(98):298–305, 1973.

  36. M. Filippone, F. Camastra, F. Masulli, and S. Rovetta. A survey of kernel and spectral methods for clustering. Pattern Recognition, 41(1):176–190, 2008.

  37. I. Fischer and J. Poland. New methods for spectral clustering. Technical report, IDSIA, 2004.

  38. R.A. Fisher. The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7(2):179–188, 1936.

  39. J. Friedman. Regularized discriminant analysis. Journal of the American Statistical Association, 84(405):165–175, 1989.

  40. T.T. Friess, N. Cristianini, and C. Campbell. The kernel adatron algorithm: a fast and simple learning procedure for support vector machines. In Proceedings of the \(15^{th}\) International Conference on Machine Learning, pages 188–196. Morgan Kaufmann Publishers, 1998.

  41. K. Fukunaga. An Introduction to Statistical Pattern Recognition. Academic Press, 1990.

  42. T. Gärtner, J.W. Lloyd, and P.A. Flach. Kernels and distances for structured data. Machine Learning, 57(3):205–232, 2004.

  43. M. Girolami. Mercer kernel based clustering in feature space. IEEE Transactions on Neural Networks, 13(3):780–784, 2002.

  44. F. Girosi, M. Jones, and T. Poggio. Regularization theory and neural network architectures. Neural Computation, 7(2):219–269, 1995.

  45. G.H. Golub and C.F. Van Loan. Matrix Computations. The Johns Hopkins University Press, 1996.

  46. T. Graepel and K. Obermayer. Fuzzy topographic kernel clustering. In Proceedings of the Fifth GI Workshop Fuzzy Neuro Systems ’98, pages 90–97, 1998.

  47. J. Hadamard. Sur les problèmes aux dérivées partielles et leur signification physique. Bull. Univ. Princeton, 13:49–52, 1902.

  48. T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer-Verlag, 2001.

  49. R. Herbrich. Learning Kernel Classifiers: Theory and Algorithms. MIT Press, 2004.

  50. R. Inokuchi and S. Miyamoto. LVQ clustering and SOM using a kernel function. In Proceedings of the IEEE International Conference on Fuzzy Systems, pages 367–373, 2004.

  51. T. Joachims. Making large-scale SVM learning practical. In Advances in Kernel Methods, pages 169–184. MIT Press, 1999.

  52. T. Joachims, N. Cristianini, and J. Shawe-Taylor. Composite kernels for hypertext classification. In Proceedings of the \(18^{th}\) International Conference on Machine Learning, pages 250–257. IEEE Press, 2001.

  53. R. Kannan, S. Vempala, and A. Vetta. On clusterings: Good, bad and spectral. In Proceedings of the 41\(^{st}\) Annual Symposium on Foundations of Computer Science, pages 367–380. IEEE Press, 2000.

  54. A. Karatzoglou, A. Smola, K. Hornik, and A. Zeileis. kernlab – an S4 package for kernel methods in R. Journal of Statistical Software, 11(9):1–20, 2004.

  55. S. Keerthi, S. Shevade, C. Bhattacharyya, and K. Murthy. Improvements to Platt's SMO algorithm for SVM classifier design. Technical report, Department of CSA, Bangalore, India, 1999.

  56. S. Keerthi, S. Shevade, C. Bhattacharyya, and K. Murthy. A fast iterative nearest point algorithm for support vector machine design. IEEE Transactions on Neural Networks, 11(1):124–136, 2000.

  57. B.W. Kernighan and S. Lin. An efficient heuristic procedure for partitioning graphs. Bell System Technical Journal, 49(1):291–307, 1970.

  58. G.A. Korn and T.M. Korn. Mathematical Handbook for Scientists and Engineers. McGraw-Hill, 1968.

  59. R. Krishnapuram and J.M. Keller. A possibilistic approach to clustering. IEEE Transactions on Fuzzy Systems, 1(2):98–110, 1993.

  60. R. Krishnapuram and J.M. Keller. The possibilistic c-means algorithm: insights and recommendations. IEEE Transactions on Fuzzy Systems, 4(3):385–393, 1996.

  61. H.W. Kuhn and A.W. Tucker. Nonlinear programming. In Proceedings of the \(2^{nd}\) Berkeley Symposium on Mathematical Statistics and Probability, pages 481–492. University of California Press, 1951.

  62. J.-L. Lagrange. Mécanique analytique. Chez la Veuve Desaint, 1788.

  63. D. Lee. An improved cluster labeling method for support vector clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(3):461–464, 2005.

  64. C. Leslie, E. Eskin, A. Cohen, J. Weston, and W.S. Noble. Mismatch string kernels for discriminative protein classification. Bioinformatics, 20(4):467–476, 2004.

  65. D. Luenberger. Linear and Nonlinear Programming. Addison-Wesley, 1984.

  66. D. Macdonald and C. Fyfe. The kernel self-organizing map. In Fourth International Conference on Knowledge-Based Intelligent Engineering Systems and Allied Technologies, pages 317–320, 2000.

  67. D.J.C. MacKay. A practical Bayesian framework for backpropagation networks. Neural Computation, 4(3):448–472, 1992.

  68. O.L. Mangasarian. Linear and non-linear separation of patterns by linear programming. Operations Research, 13(3):444–452, 1965.

  69. O.L. Mangasarian and D. Musicant. Lagrangian support vector regression. Technical report, Computer Sciences Department, University of Wisconsin, Madison, Wisconsin, June 2000.

  70. G. Matheron. Principles of geostatistics. Economic Geology, 58:1246–1266, 1963.

  71. M. Meila and J. Shi. Spectral methods for clustering. In Advances in Neural Information Processing Systems 12, pages 873–879. MIT Press, 2000.

  72. S. Mika, G. Rätsch, J. Weston, B. Schölkopf, and K.R. Müller. Fisher discriminant analysis with kernels. In Proceedings of the IEEE Neural Networks for Signal Processing Workshop, pages 41–48. IEEE Press, 2001.

  73. M.L. Minsky and S.A. Papert. Perceptrons. MIT Press, 1969.

  74. J. Moody and C. Darken. Fast learning in networks of locally-tuned processing units. Neural Computation, 1(2):281–294, 1989.

  75. R. Neal. Bayesian Learning for Neural Networks. Springer-Verlag, 1996.

  76. A.Y. Ng, M.I. Jordan, and Y. Weiss. On spectral clustering: Analysis and an algorithm. In Advances in Neural Information Processing Systems 14, pages 849–856. MIT Press, 2002.

  77. E. Osuna, R. Freund, and F. Girosi. An improved training algorithm for support vector machines. In Neural Networks for Signal Processing VII, Proceedings of the 1997 IEEE Workshop, pages 276–285. IEEE Press, 1997.

  78. E. Osuna and F. Girosi. Reducing the run-time complexity in support vector machines. In Advances in Kernel Methods, pages 271–284. MIT Press, 1999.

  79. A. Paccanaro, C. Chennubhotla, J.A. Casbon, and M.A.S. Saqi. Spectral clustering of protein sequences. In Proceedings of the International Joint Conference on Neural Networks, pages 3083–3088. IEEE Press, 2003.

  80. J.C. Platt. Fast training of support vector machines using sequential minimal optimization. In Advances in Kernel Methods, pages 185–208. MIT Press, 1999.

  81. J.C. Platt, N. Cristianini, and J. Shawe-Taylor. Large margin DAGs for multiclass classification. In Advances in Neural Information Processing Systems 12, pages 547–553. MIT Press, 2000.

  82. T. Poggio and F. Girosi. Networks for approximation and learning. Proceedings of the IEEE, 78(9):1481–1497, 1990.

  83. M. Pontil and A. Verri. Support vector machines for 3-D object recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(6):637–646, 1998.

  84. M.J.D. Powell. Radial basis functions for multivariable interpolation: A review. In Algorithms for Approximation, pages 143–167. Clarendon Press, 1987.

  85. A.K. Qin and P.N. Suganthan. Kernel neural gas algorithms with application to cluster analysis. In Proceedings of the 17th International Conference on Pattern Recognition (ICPR 2004), pages 617–620. IEEE Press, 2004.

  86. C.E. Rasmussen and C.K.I. Williams. Gaussian Processes for Machine Learning. MIT Press, 2006.

  87. K. Rose. Deterministic annealing for clustering, compression, classification, regression, and related optimization problems. Proceedings of the IEEE, 86(11):2210–2239, 1998.

  88. R. Rosipal and M. Girolami. An expectation maximization approach to nonlinear component analysis. Neural Computation, 13(3):505–510, 2001.

  89. V. Roth, J. Laub, M. Kawanabe, and J.M. Buhmann. Optimal cluster preserving embedding of nonmetric proximity data. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(12):1540–1551, 2003.

  90. B. Schölkopf and A.J. Smola. Learning with Kernels. MIT Press, 2002.

  91. B. Schölkopf, A.J. Smola, and K.R. Müller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10(5):1299–1319, 1998.

  92. B. Schölkopf, A.J. Smola, and K.R. Müller. Nonlinear component analysis as a kernel eigenvalue problem. Technical report, Max Planck Institut für Biologische Kybernetik, 1998.

  93. B. Schölkopf, R.C. Williamson, A.J. Smola, J. Shawe-Taylor, and J. Platt. Support vector method for novelty detection. In Advances in Neural Information Processing Systems 12, pages 526–532. MIT Press, 2000.

  94. J. Shawe-Taylor and N. Cristianini. Kernel Methods for Pattern Analysis. Cambridge University Press, 2004.

  95. J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):888–905, 2000.

  96. D.M.J. Tax and R.P.W. Duin. Support vector domain description. Pattern Recognition Letters, 20(11–13):1191–1199, 1999.

  97. A.N. Tikhonov. On solving ill-posed problems and the method of regularization. Dokl. Acad. Nauk USSR, 153:501–504, 1963.

  98. A.N. Tikhonov and V.Y. Arsenin. Solutions of Ill-Posed Problems. W.H. Winston, 1977.

  99. I. Tsochantaridis, T. Hofmann, T. Joachims, and Y. Altun. Support vector machine learning for interdependent and structured output spaces. In Proceedings of ICML04. ACM Press, 2004.

  100. C.J. Twining and C.J. Taylor. The use of kernel principal component analysis to model data distributions. Pattern Recognition, 36(1):217–227, 2003.

  101. V.N. Vapnik. The Nature of Statistical Learning Theory. Springer-Verlag, 1995.

  102. V.N. Vapnik. Statistical Learning Theory. John Wiley, 1998.

  103. V.N. Vapnik and A.Ya. Chervonenkis. A note on one class of perceptrons. Automation and Remote Control, 25:103–109, 1964.

  104. V.N. Vapnik and A. Lerner. Pattern recognition using generalized portrait method. Automation and Remote Control, 24:774–780, 1963.

  105. S. Vishwanathan and A.J. Smola. Fast kernels for string and tree matching. In Advances in Neural Information Processing Systems 15, pages 569–576. MIT Press, 2003.

  106. U. von Luxburg, M. Belkin, and O. Bousquet. Consistency of spectral clustering. Technical report, Max Planck Institut für Biologische Kybernetik, 2004.

  107. U. von Luxburg, M. Belkin, and O. Bousquet. Limits of spectral clustering. In Advances in Neural Information Processing Systems 17. MIT Press, 2005.

  108. D. Wagner and F. Wagner. Between min cut and graph bisection. In Proceedings of Mathematical Foundations of Computer Science (MFCS 1993), pages 744–750, 1993.

  109. G. Wahba. Spline Models for Observational Data. SIAM, 1990.

  110. J. Weston, A. Gammerman, M. Stitson, V. Vapnik, V. Vovk, and C. Watkins. Support vector density estimation. In Advances in Kernel Methods, pages 293–306. MIT Press, 1999.

  111. J. Weston and C. Watkins. Multi-class support vector machines. In Proceedings of ESANN99, pages 219–224. D. Facto Press, 1999.

  112. C.K.I. Williams and D. Barber. Bayesian classification with Gaussian processes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(12):1342–1351, 1998.

  113. W.H. Wolberg and O. Mangasarian. Multisurface method of pattern separation for medical diagnosis applied to breast cytology. Proceedings of the National Academy of Sciences, U.S.A., 87:9193–9196, 1990.

  114. Z.D. Wu, W.X. Xie, and J.P. Yu. Fuzzy c-means clustering algorithm based on kernel method. In Proceedings of the Fifth International Conference on Computational Intelligence and Multimedia Applications, ICCIMA 2003, pages 49–54. IEEE, 2003.

  115. J. Yang, V. Estivill-Castro, and S.K. Chalup. Support vector clustering through proximity graph modelling. In Neural Information Processing 2002, ICONIP ’02, pages 898–903, 2002.

  116. S.X. Yu and J. Shi. Multiclass spectral clustering. In ICCV ’03: Proceedings of the Ninth IEEE Conference on Computer Vision. IEEE Computer Society, 2003.

  117. D.-Q. Zhang and S.-C. Chen. Fuzzy clustering using kernel method. In The 2002 International Conference on Control and Automation, pages 162–163, 2002.

  118. D.-Q. Zhang and S.-C. Chen. Kernel based fuzzy and possibilistic c-means clustering. In Proceedings of the Fifth International Conference on Artificial Neural Networks, ICANN 2003, pages 122–125, 2003.

  119. D.-Q. Zhang and S.-C. Chen. A novel kernelized fuzzy c-means algorithm with applications in image segmentation. Artificial Intelligence in Medicine, 32(1):37–50, 2004.

Author information

Correspondence to Francesco Camastra.

Problems

9.1

Consider the function \(K: X \times X \rightarrow \mathbb {R}\), where \(X \subseteq \mathbb {R}^n\). Prove that if \(K(\mathbf {x}, \mathbf {y}) = \varPhi ( \mathbf {x}) \cdot \varPhi (\mathbf {y})\) then \(K(\cdot )\) is a Mercer kernel.
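As a hint on the central step: symmetry of \(K\) is immediate from the symmetry of the inner product, and positive semidefiniteness of every Gram matrix follows from the identity

$$\begin{aligned} \sum _{i=1}^{m}\sum _{j=1}^{m} c_i c_j K(\mathbf {x}_i, \mathbf {x}_j) = \sum _{i=1}^{m}\sum _{j=1}^{m} c_i c_j \, \varPhi (\mathbf {x}_i) \cdot \varPhi (\mathbf {x}_j) = \left\Vert \sum _{i=1}^{m} c_i \varPhi (\mathbf {x}_i) \right\Vert ^2 \ge 0 \end{aligned}$$

which holds for any \(\mathbf {x}_1, \ldots , \mathbf {x}_m \in X\) and any \(c_1, \ldots , c_m \in \mathbb {R}\).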

9.2

Prove that the Cauchy kernel \(C(\mathbf {x}, \mathbf {y})= \alpha (1 + \Vert \mathbf {x}- \mathbf {y}\Vert ^2)^{-1}\) is positive definite for \(\alpha > 0\). (Hint: Read Appendix D).

9.3

Prove that the Epanechnikov kernel, defined by

$$\begin{aligned} E(\mathbf {x},\mathbf {y})= 0.75\,(1- \Vert \mathbf {x}- \mathbf {y} \Vert ^2)\,\mathbf {I}(\Vert \mathbf {x}- \mathbf {y} \Vert \le 1) \end{aligned}$$
(9.251)

is conditionally positive definite. (Hint: Read Appendix D).

9.4

Prove that the optimal hyperplane is unique.

9.5

Consider the SMO algorithm for classification. What is the minimum number of Lagrange multipliers that can be optimized in an iteration? Explain your answer.

9.6

Consider the SMO algorithm for classification. Show that, in the case of an unconstrained maximum, we obtain the following update rule

$$\begin{aligned} \alpha _2(t+1)= \alpha _2(t) -\frac{y_2(E_1-E_2)}{2K(\mathbf {x}_1,\mathbf {x}_2)- K(\mathbf {x}_1, \mathbf {x}_1) - K(\mathbf {x}_2, \mathbf {x}_2)} \end{aligned}$$
(9.252)

where \(E_i = f(\mathbf {x}_i) - y_i \).
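For a numerical sanity check of Eq. (9.252), a minimal Python sketch follows; the Gaussian kernel is an illustrative choice, and the clipping of \(\alpha _2\) to the box constraint \([0, C]\) required by the full SMO algorithm is omitted:

```python
import numpy as np

def rbf(u, v, sigma=1.0):
    """Gaussian (RBF) kernel; any Mercer kernel could be used instead."""
    return np.exp(-np.linalg.norm(u - v) ** 2 / (2.0 * sigma ** 2))

def alpha2_update(alpha2, y2, E1, E2, x1, x2, kernel=rbf):
    """Unconstrained maximum of the SMO subproblem, Eq. (9.252).

    E_i = f(x_i) - y_i is the prediction error on example i; the
    denominator eta = 2 K(x1,x2) - K(x1,x1) - K(x2,x2) is negative for
    a strictly positive definite kernel whenever x1 != x2.
    """
    eta = 2.0 * kernel(x1, x2) - kernel(x1, x1) - kernel(x2, x2)
    return alpha2 - y2 * (E1 - E2) / eta
```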

9.7

Consider the data set A of the Santa Fe time series competition. Using a public domain SVM regression package and the four preceding values of the time series as input, predict the current value of the time series. The data set A can be downloaded from http://www-psych.stanford.edu/~andreas/Time-Series/SantaFe.html. Implement a Gaussian process for regression and repeat the exercise, replacing the SVM with the Gaussian process. Discuss the results.
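A possible starting point, sketched with scikit-learn's SVR standing in for a public domain SVM package; the file name A.dat, the 80/20 split, and the hyperparameter values are illustrative assumptions:

```python
import numpy as np
from sklearn.svm import SVR

def make_lagged(series, lags=4):
    """Build (X, y) pairs: the four preceding values predict the next one."""
    X = np.array([series[t - lags:t] for t in range(lags, len(series))])
    y = series[lags:]
    return X, y

series = np.loadtxt("A.dat")      # local copy of data set A (assumed file name)
X, y = make_lagged(series, lags=4)
split = int(0.8 * len(X))         # train on the first 80%, predict the rest
svr = SVR(kernel="rbf", C=10.0, epsilon=0.01).fit(X[:split], y[:split])
pred = svr.predict(X[split:])
print("RMSE:", np.sqrt(np.mean((pred - y[split:]) ** 2)))
```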

9.8

Using the o-v-r method and a public domain SVM binary classifier (e.g., SVMLight or SVMTorch), test a multiclass SVM on the Iris data [38], which can be downloaded from ftp.ics.uci.edu/pub/machine-learning-databases/iris. Repeat the same experiment replacing the o-v-r method with the o-v-o strategy. Discuss the results.
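For orientation, the comparison takes a few lines if scikit-learn is substituted for SVMLight or SVMTorch; this sketch uses the library's bundled copy of the Iris data rather than the FTP archive:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
for name, clf in [("o-v-r", OneVsRestClassifier(SVC(kernel="rbf", C=1.0))),
                  ("o-v-o", OneVsOneClassifier(SVC(kernel="rbf", C=1.0)))]:
    scores = cross_val_score(clf, X, y, cv=5)   # 5-fold cross-validation accuracy
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```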

9.9

Implement kernel PCA and test it on a dataset (e.g., the Iris data). Use the Gaussian as the Mercer kernel and verify Twining and Taylor's result [100], that is, that for large values of the variance the kernel PCA eigenspectrum tends to the PCA eigenspectrum.
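A minimal NumPy sketch of kernel PCA with a Gaussian kernel (the double-centering step is the easiest part to get wrong):

```python
import numpy as np

def kernel_pca(X, sigma, n_components):
    """Kernel PCA with a Gaussian kernel on the rows of X."""
    sq = np.sum(X ** 2, axis=1)
    K = np.exp(-(sq[:, None] + sq[None, :] - 2.0 * X @ X.T) / (2.0 * sigma ** 2))
    n = X.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    Kc = J @ K @ J                        # Gram matrix of centered feature vectors
    w, V = np.linalg.eigh(Kc)             # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:n_components]
    return w[idx], V[:, idx]
```

For \(\sigma \) much larger than the data scale, \(\exp (-d/2\sigma ^2) \approx 1 - d/2\sigma ^2\), so after centering the Gram matrix becomes proportional to that of linear PCA; sweeping \(\sigma \) upward should therefore reproduce the behaviour reported in [100].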

9.10

Consider one-class SVM. Prove that there are no bounded support vectors when the regularization constant \(C\) is equal to 1.

9.11

Implement kernel K-Means and test your implementation on a dataset (e.g., the Iris data). Verify that when you choose the inner product as the Mercer kernel, you obtain the same results as batch K-Means.
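A minimal sketch, assuming NumPy, that works directly on a precomputed Gram matrix; the distance to a cluster mean in feature space can be written with kernel values only, so passing the plain inner-product Gram matrix K = X @ X.T should reproduce batch K-Means:

```python
import numpy as np

def kernel_kmeans(K, k, n_iter=100, seed=0):
    """Kernel K-Means on a precomputed n x n Gram matrix K."""
    n = K.shape[0]
    labels = np.random.default_rng(seed).integers(k, size=n)
    for _ in range(n_iter):
        dist = np.full((n, k), np.inf)
        for c in range(k):
            idx = np.flatnonzero(labels == c)
            if idx.size == 0:
                continue                  # empty cluster: leave at infinity
            # ||phi(x) - m_c||^2 = K(x,x) - 2 mean_j K(x,j) + mean_ij K(i,j)
            dist[:, c] = (np.diag(K)
                          - 2.0 * K[:, idx].mean(axis=1)
                          + K[np.ix_(idx, idx)].mean())
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                         # assignments stable: converged
        labels = new_labels
    return labels
```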

9.12

Implement the Ng-Jordan algorithm using a mathematical toolbox. Test your implementation on the Iris data. Compare your results with those reported in [12].
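A sketch of the Ng-Jordan algorithm [76], assuming NumPy and borrowing scikit-learn's KMeans for the final step; the scale parameter sigma is an illustrative assumption to be tuned:

```python
import numpy as np
from sklearn.cluster import KMeans

def ng_jordan(X, k, sigma):
    """Spectral clustering as in Ng, Jordan, and Weiss [76]."""
    sq = np.sum(X ** 2, axis=1)
    A = np.exp(-(sq[:, None] + sq[None, :] - 2.0 * X @ X.T) / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)                          # affinity matrix with A_ii = 0
    d = A.sum(axis=1)
    L = A / np.sqrt(np.outer(d, d))                   # D^{-1/2} A D^{-1/2}
    _, V = np.linalg.eigh(L)
    Y = V[:, -k:]                                     # k leading eigenvectors as columns
    Y = Y / np.linalg.norm(Y, axis=1, keepdims=True)  # normalize the rows to unit length
    return KMeans(n_clusters=k, n_init=10).fit_predict(Y)
```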


Copyright information

© 2015 Springer-Verlag London

About this chapter

Cite this chapter

Camastra, F., Vinciarelli, A. (2015). Kernel Methods. In: Machine Learning for Audio, Image and Video Analysis. Advanced Information and Knowledge Processing. Springer, London. https://doi.org/10.1007/978-1-4471-6735-8_9


  • DOI: https://doi.org/10.1007/978-1-4471-6735-8_9

  • Published:

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-4471-6734-1

  • Online ISBN: 978-1-4471-6735-8

  • eBook Packages: Computer Science (R0)
