
Compactness Hypothesis, Potential Functions, and Rectifying Linear Space in Machine Learning

Chapter in: Braverman Readings in Machine Learning. Key Ideas from Inception to Current State. Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11100).

Abstract

Emmanuel Braverman was one of the very few thinkers who, during his extremely short life, managed to seed several seemingly quite different areas of science. This paper overviews one of the knowledge areas he essentially shaped in the 1960s, namely, Machine Learning. Later, Vladimir Vapnik proposed a more engineering-oriented name for this knowledge area, Estimation of Dependencies Based on Empirical Data. We shall treat these titles as synonyms. The aim of the paper is to briefly trace how three notions introduced by Braverman formed the core of the contemporary Machine Learning doctrine. These notions are: (1) the compactness hypothesis, (2) the potential function, and (3) the rectifying linear space to which the former two have led. There is little new in this paper. Almost all the constructions we are going to speak about have been published by numerous scientists. The novelty is, perhaps, only in that all these issues are systematically considered together as immediate consequences of Braverman's basic principles.


Notes

  1. The latter developments are a particular case of a more general relational approach to dependence estimation, which allows for asymmetric comparison functions [16, 17].

  2. Otherwise, we would be forced to assume that the distance is a metric and that the respective metric space is separable, i.e., contains a countable everywhere dense subset. All the mathematical constructions would become much more complicated without any gain in generality of the resulting class of dependence models.

References

  1. Braverman, E.M.: Experiments on machine learning to recognize visual patterns. Autom. Remote Control 23, 315–327 (1962). Translated from Russian: Avtomat. i Telemekh. 23, 349–364 (1962)
  2. Arkadʹev, A.G., Braverman, E.M.: Computers and Pattern Recognition. Thompson Book Company, Washington (1967). 115 p.
  3. Vapnik, V.: Statistical Learning Theory. Wiley, New York (1998)
  4. Vapnik, V.: Estimation of Dependences Based on Empirical Data. Springer, New York (1982). https://doi.org/10.1007/0-387-34239-7
  5. Duin, R.P.W.: Compactness and complexity of pattern recognition problems. In: Proceedings of the International Symposium on Pattern Recognition "In Memoriam Pierre Devijver", Brussels, Belgium, 12 February 1999, Royal Military Academy, pp. 124–128 (1999)
  6. Aizerman, M., Braverman, E., Rozonoer, L.: Theoretical foundations of the potential function method in pattern recognition learning. Autom. Remote Control 25, 917–936 (1964)
  7. Mercer, J.: Functions of positive and negative type and their connection with the theory of integral equations. Philos. Trans. Roy. Soc. A 209, 415–446 (1909)
  8. Goldfarb, L.: A unified approach to pattern recognition. Pattern Recogn. 17, 575–582 (1984)
  9. Goldfarb, L.: A new approach to pattern recognition. In: Progress in Pattern Recognition, vol. 2, pp. 241–402. Elsevier Science Publishers BV (1985)
  10. Pękalska, E., Duin, R.P.W.: Dissimilarity representations allow for building good classifiers. Pattern Recogn. Lett. 23(8), 943–956 (2002)
  11. Pękalska, E., Duin, R.P.W.: The Dissimilarity Representation for Pattern Recognition: Foundations and Applications. World Scientific Publishing Co. Inc., River Edge (2005)
  12. Haasdonk, B., Pękalska, E.: Indefinite kernel Fisher discriminant. In: Proceedings of the 19th International Conference on Pattern Recognition, Tampa, USA, 8–11 December 2008
  13. Duin, R.P.W., Pękalska, E.: Non-Euclidean dissimilarities: causes and informativeness. In: Hancock, E.R., Wilson, R.C., Windeatt, T., Ulusoy, I., Escolano, F. (eds.) SSPR/SPR 2010. LNCS, vol. 6218, pp. 871–880. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-14980-1_31
  14. Haasdonk, B.: Feature space interpretation of SVMs with indefinite kernels. TPAMI 25, 482–492 (2005)
  15. Pękalska, E., Harol, A., Duin, R.P.W., Spillmann, B., Bunke, H.: Non-Euclidean or non-metric measures can be informative. In: Yeung, D.-Y., Kwok, J.T., Fred, A., Roli, F., de Ridder, D. (eds.) SSPR/SPR 2006. LNCS, vol. 4109, pp. 871–880. Springer, Heidelberg (2006). https://doi.org/10.1007/11815921_96
  16. Duin, R., Pękalska, E., de Ridder, D.: Relational discriminant analysis. Pattern Recogn. Lett. 20, 1175–1181 (1999)
  17. Balcan, M.-F., Blum, A., Srebro, N.: A theory of learning with similarity functions. Mach. Learn. 72, 89–112 (2008)
  18. Nelder, J., Wedderburn, R.: Generalized linear models. J. Roy. Stat. Soc. Ser. A (Gen.) 135(3), 370–384 (1972)
  19. McCullagh, P., Nelder, J.: Generalized Linear Models, 2nd edn. Chapman and Hall, London (1989). 511 p.
  20. Mottl, V., Krasotkina, O., Seredin, O., Muchnik, I.: Principles of multi-kernel data mining. In: Perner, P., Imiya, A. (eds.) MLDM 2005. LNCS (LNAI), vol. 3587, pp. 52–61. Springer, Heidelberg (2005). https://doi.org/10.1007/11510888_6
  21. Tatarchuk, A., Urlov, E., Mottl, V., Windridge, D.: A support kernel machine for supervised selective combining of diverse pattern-recognition modalities. In: El Gayar, N., Kittler, J., Roli, F. (eds.) MCS 2010. LNCS, vol. 5997, pp. 165–174. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-12127-2_17
  22. Gönen, M., Alpaydın, E.: Multiple kernel learning algorithms. J. Mach. Learn. Res. 12, 2211–2268 (2011)
  23. Schölkopf, B., Smola, A.: Learning with Kernels. MIT Press, Cambridge (2001)
  24. Deza, M., Deza, E.: Encyclopedia of Distances. Springer, Heidelberg (2006). https://doi.org/10.1007/978-3-642-00234-2
  25. Azizov, T.Y., Iokhvidov, I.S.: Linear Operators in Spaces with an Indefinite Metric. Wiley, Chichester (1989)
  26. Langer, H.: Krein space. In: Hazewinkel, M. (ed.) Encyclopaedia of Mathematics. Springer, Netherlands (1994)
  27. Ong, C.S., Mary, X., Canu, S., Smola, A.: Learning with non-positive kernels. In: Proceedings of the Twenty-First International Conference on Machine Learning, ICML 2004, Banff, Alberta, Canada, 4–8 July 2004
  28. Bugrov, S., Nikolsky, S.M.: Fundamentals of Linear Algebra and Analytical Geometry. Mir, Moscow (1982)
  29. Vapnik, V.: The Nature of Statistical Learning Theory. Information Science and Statistics. Springer, New York (2000). https://doi.org/10.1007/978-1-4757-3264-1
  30. Guyon, I., Vapnik, V.N., Boser, B.E., Bottou, L., Solla, S.A.: Structural risk minimization for character recognition. In: Advances in Neural Information Processing Systems, vol. 4. Morgan Kaufmann, Denver (1992)
  31. Wilson, J.R., Lorenz, K.A.: Short history of the logistic regression model. In: Modeling Binary Correlated Responses Using SAS, SPSS and R. IBSS, vol. 9, pp. 17–23. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-23805-0_2
  32. Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20, 273–297 (1995)
  33. Tikhonov, A.N.: On the stability of inverse problems. Dokl. Akad. Nauk SSSR 39(5), 195–198 (1943)
  34. Tikhonov, A.N.: Solution of incorrectly formulated problems and the regularization method. Sov. Math. 4, 1035–1038 (1963)
  35. Tikhonov, A.N., Arsenin, V.Y.: Solution of Ill-Posed Problems. Winston & Sons, Washington (1977)
  36. Hoerl, A.E., Kennard, D.J.: Application of ridge analysis to regression problems. Chem. Eng. Prog. 58, 54–59 (1962)
  37. Vinod, H.D., Ullah, A.: Recent Advances in Regression Methods. Statistics: Textbooks and Monographs, vol. 41. Marcel Dekker Inc., New York (1981)
  38. Mottl, V., Dvoenko, S., Seredin, O., Kulikowski, C., Muchnik, I.: Featureless pattern recognition in an imaginary Hilbert space and its application to protein fold classification. In: Perner, P. (ed.) MLDM 2001. LNCS (LNAI), vol. 2123, pp. 322–336. Springer, Heidelberg (2001). https://doi.org/10.1007/3-540-44596-X_26
  39. Frank, I.E., Friedman, J.H.: A statistical view of some chemometrics regression tools. Technometrics 35, 109–148 (1993)
  40. Fu, W.J.: Penalized regression: the bridge versus the LASSO. J. Comput. Graph. Stat. 7, 397–416 (1998)
  41. Mottl, V., Seredin, O., Krasotkina, O., Muchnik, I.: Fusion of Euclidean metrics in featureless data analysis: an equivalent of the classical problem of feature selection. Pattern Recogn. Image Anal. 15(1), 83–86 (2005)
  42. Mottl, V., Seredin, O., Krasotkina, O., Muchnik, I.: Kernel fusion and feature selection in machine learning. In: Proceedings of the 8th IASTED International Conference on Intelligent Systems and Control, Cambridge, USA, 31 October–2 November 2005, pp. 477–482
  43. Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. J. Roy. Stat. Soc. Ser. B 67, 301–320 (2005)
  44. Wang, L., Zhu, J., Zou, H.: The doubly regularized support vector machine. Statistica Sinica 16, 589–615 (2006)
  45. Tibshirani, R.J.: Regression shrinkage and selection via the LASSO. J. Roy. Stat. Soc. Ser. B 58, 267–288 (1996)
  46. Tibshirani, R.J.: The LASSO method for variable selection in the Cox model. Stat. Med. 16, 385–395 (1997)
  47. Tatarchuk, A., Mottl, V., Eliseyev, A., Windridge, D.: Selectivity supervision in combining pattern-recognition modalities by feature- and kernel-selective support vector machines. In: Proceedings of the 19th International Conference on Pattern Recognition, ICPR 2008, vol. 1–6, pp. 2336–2339 (2008)
  48. Fan, J., Li, R.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. Theor. Methods 96(456), 1348–1360 (2001)
  49. Krasotkina, O., Mottl, V.: A Bayesian approach to sparse Cox regression in high-dimensional survival analysis. In: Proceedings of the 11th International Conference on Machine Learning and Data Mining (MLDM 2015), Hamburg, Germany, 20–23 July 2015, pp. 425–437
  50. Krasotkina, O., Mottl, V.: A Bayesian approach to sparse learning-to-rank for search engine optimization. In: Proceedings of the 11th International Conference on Machine Learning and Data Mining (MLDM 2015), Hamburg, Germany, 20–23 July 2015, pp. 382–394
  51. Tatarchuk, A., Sulimova, V., Windridge, D., Mottl, V., Lange, M.: Supervised selective combining pattern recognition modalities and its application to signature verification by fusing on-line and off-line kernels. In: Benediktsson, J.A., Kittler, J., Roli, F. (eds.) MCS 2009. LNCS, vol. 5519, pp. 324–334. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-02326-2_33
  52. Razin, N., et al.: Application of the multi-modal relevance vector machine to the problem of protein secondary structure prediction. In: Shibuya, T., Kashima, H., Sese, J., Ahmad, S. (eds.) PRIB 2012. LNCS, vol. 7632, pp. 153–165. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-34123-6_14
  53. Tatarchuk, A., Sulimova, V., Torshin, I., Mottl, V., Windridge, D.: Supervised selective kernel fusion for membrane protein prediction. In: Comin, M., Käll, L., Marchiori, E., Ngom, A., Rajapakse, J. (eds.) PRIB 2014. LNCS, vol. 8626, pp. 98–109. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-09192-1_9
  54. Berg, C., Christensen, J.P.R., Ressel, P.: Harmonic Analysis on Semigroups: Theory of Positive Definite and Related Functions. Springer, New York (1984). https://doi.org/10.1007/978-1-4612-1128-0


Acknowledgements

We would like to acknowledge support from grants of the Russian Foundation for Basic Research 14-07-00527, 16-57-52042, 17-07-00436, 17-07-00993, 18-07-01087, 18-07-00942, and from Tula State University within the framework of the scientific project № 2017-62PUBL.

Author information

Correspondence to Vadim Mottl.

Appendix. The Proofs of Theorems

Proof of Theorem 1.

Let \( \upomega^{\prime}, \upomega^{\prime\prime}, \upomega^{\prime\prime\prime} \in \Omega \) be three arbitrary objects. Due to (10), \( h \ge \mathop{\sup}\limits_{\widetilde{\upomega}^{\prime} = \widetilde{\upomega}^{\prime\prime} = \widetilde{\upomega}^{\prime\prime\prime} \in \Omega} \left\{ \uprho(\widetilde{\upomega}^{\prime}, \widetilde{\upomega}^{\prime\prime\prime}) - \left[ \uprho(\widetilde{\upomega}^{\prime}, \widetilde{\upomega}^{\prime\prime}) + \uprho(\widetilde{\upomega}^{\prime\prime}, \widetilde{\upomega}^{\prime\prime\prime}) \right] \right\} = 0 \), i.e., \( h \ge 0 \), so that \( r(\upalpha, \upbeta) = \uprho(\upalpha, \upbeta) + h \ge 0 \) is nonnegative. Since \( r(\upomega^{\prime}, \upomega^{\prime\prime}) = \uprho(\upomega^{\prime}, \upomega^{\prime\prime}) + h \), we have \( r(\upomega^{\prime}, \upomega^{\prime\prime}) + r(\upomega^{\prime\prime}, \upomega^{\prime\prime\prime}) - r(\upomega^{\prime}, \upomega^{\prime\prime\prime}) = \left[ \uprho(\upomega^{\prime}, \upomega^{\prime\prime}) + h \right] + \left[ \uprho(\upomega^{\prime\prime}, \upomega^{\prime\prime\prime}) + h \right] - \left[ \uprho(\upomega^{\prime}, \upomega^{\prime\prime\prime}) + h \right] = \uprho(\upomega^{\prime}, \upomega^{\prime\prime}) + \uprho(\upomega^{\prime\prime}, \upomega^{\prime\prime\prime}) - \uprho(\upomega^{\prime}, \upomega^{\prime\prime\prime}) + h \). By virtue of (10), \( \uprho(\upomega^{\prime}, \upomega^{\prime\prime}) + \uprho(\upomega^{\prime\prime}, \upomega^{\prime\prime\prime}) - \uprho(\upomega^{\prime}, \upomega^{\prime\prime\prime}) \ge -h \), hence \( \uprho(\upomega^{\prime}, \upomega^{\prime\prime}) + \uprho(\upomega^{\prime\prime}, \upomega^{\prime\prime\prime}) - \uprho(\upomega^{\prime}, \upomega^{\prime\prime\prime}) + h \ge -h + h = 0 \), whence it follows that \( r(\upomega^{\prime}, \upomega^{\prime\prime}) + r(\upomega^{\prime\prime}, \upomega^{\prime\prime\prime}) - r(\upomega^{\prime}, \upomega^{\prime\prime\prime}) \ge 0 \). ■
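
This correction is easy to check numerically. Below is a minimal Python sketch (not part of the original paper) that treats \( h \), in the spirit of (10), as the largest triangle-inequality violation over all triples of a finite distance matrix and verifies that shifting every off-diagonal distance by \( h \) removes the violation; the matrix `rho` and the helper `triangle_defect` are purely illustrative.

```python
import numpy as np
from itertools import permutations

def triangle_defect(rho):
    """Largest violation rho(i,k) - rho(i,j) - rho(j,k) over all ordered triples."""
    n = rho.shape[0]
    return max((rho[i, k] - rho[i, j] - rho[j, k]
                for i, j, k in permutations(range(n), 3)), default=0.0)

# An illustrative symmetric "distance" matrix that violates the triangle inequality.
rho = np.array([[0.0, 1.0, 5.0],
                [1.0, 0.0, 1.0],
                [5.0, 1.0, 0.0]])

h = max(triangle_defect(rho), 0.0)       # h = 3 for this matrix
r = rho + h * (1.0 - np.eye(3))          # shift all off-diagonal distances by h

print(triangle_defect(rho), triangle_defect(r))  # 3.0 and 0.0: r satisfies the triangle inequality
```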

Proof of Theorem 2.

Let \( \left\{ \upomega_{1}, \ldots, \upomega_{M} \right\} \subset \Omega \) be a finite collection of objects within a distance space, let \( \upphi = \upomega_{k} \) be chosen as the center, and let \( {\mathbf{K}}_{\upphi} = {\mathbf{K}}_{\upomega_{k}} = \left[ K_{\upomega_{k}}(\upomega_{i}, \upomega_{j}),\; i, j = 1, \ldots, M \right] \) be the matrix formed by the similarity function (11). Let, further, another object be assigned as a new center \( \upphi^{\prime} = \upomega_{l} \), and let \( {\mathbf{K}}_{\upphi^{\prime}} = {\mathbf{K}}_{\upomega_{l}} = \left[ K_{\upomega_{l}}(\upomega_{i}, \upomega_{j}),\; i, j = 1, \ldots, M \right] \) be the respective similarity matrix. In accordance with (14),

$$ K_{\upphi^{\prime}}(\upomega_{i}, \upomega_{j}) = K_{\upomega_{l}}(\upomega_{i}, \upomega_{j}) = K_{\upomega_{k}}(\upomega_{i}, \upomega_{j}) - K_{\upomega_{k}}(\upomega_{i}, \upomega_{l}) - K_{\upomega_{k}}(\upomega_{j}, \upomega_{l}) + K_{\upomega_{k}}(\upomega_{l}, \upomega_{l}). $$

Let the notation \( {\mathbf{E}}_{\upomega_{k} \to \upomega_{l}} \) stand for the \( (M \times M) \) matrix in which all the elements are zeros except the \( l \)th row of units \( (a_{li} = 1,\; i = 1, \ldots, M) \) and one additional unit element \( a_{kl} = 1 \):

$$ {\mathbf{E}}_{\upomega_{k} \to \upomega_{l}} = \begin{pmatrix} 0 & \cdots & 0 & \cdots & 0 & \cdots & 0 \\ \vdots & & \vdots & & \vdots & & \vdots \\ 0 & \cdots & 0 & \cdots & 1 & \cdots & 0 \\ \vdots & & \vdots & & \vdots & & \vdots \\ 1 & \cdots & 1 & \cdots & 1 & \cdots & 1 \\ \vdots & & \vdots & & \vdots & & \vdots \\ 0 & \cdots & 0 & \cdots & 0 & \cdots & 0 \end{pmatrix} \begin{matrix} 1 \\ \vdots \\ k \\ \vdots \\ l \\ \vdots \\ M \end{matrix} $$

(the rows are indexed \( 1, \ldots, M \) on the right; the single unit in row \( k \) stands in column \( l \), and row \( l \) consists entirely of units). It is clear that the matrices \( {\mathbf{S}}_{\upomega_{k} \to \upomega_{l}} = {\mathbf{I}} - {\mathbf{E}}_{\upomega_{k} \to \upomega_{l}} \) are nondegenerate.

Let us consider two quadratic forms \( {\mathbf{x}}^{T} {\mathbf{K}}_{{\upomega_{k} }} {\mathbf{x}} \) and \( {\mathbf{y}}^{T} {\mathbf{K}}_{{\upomega_{l} }} {\mathbf{y}} \), \( {\mathbf{x}},{\mathbf{y}} \in {\mathbf{\mathbb{R}}}^{M} \). Here

$$ {\mathbf{y}}^{T} {\mathbf{K}}_{{\upomega_{l} }} {\mathbf{y}} = {\mathbf{y}}^{T} {\mathbf{S}}_{{\upomega_{k} \to\upomega_{l} }}^{T} {\mathbf{K}}_{{\upomega_{k} }} {\mathbf{S}}_{{\upomega_{k} \to\upomega_{l} }} {\mathbf{y}} = \left( {{\mathbf{S}}_{{\upomega_{k} \to\upomega_{l} }} {\mathbf{y}}} \right)^{T} {\mathbf{K}}_{{\upomega_{k} }} \left( {{\mathbf{S}}_{{\upomega_{k} \to\upomega_{l} }} {\mathbf{y}}} \right). $$

As we see, the quadratic forms coincide after the one-to-one substitution \( {\mathbf{x}} = {\mathbf{S}}_{{\upomega_{k} \to\upomega_{l} }} {\mathbf{y}} \). In accordance with Sylvester’s law of inertia for quadratic forms, the numbers of positive, negative, and zero eigenvalues of matrices \( {\mathbf{K}}_{{\upomega_{k} }} \) and \( {\mathbf{K}}_{{\upomega_{l} }} \) coincide, so, their signatures coincide, too. ■
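
The invariance of the signature under a change of the center can be observed numerically. The sketch below assumes that the similarity function (11) has the centered form \( K_{\upphi}(\upomega_{i}, \upomega_{j}) = (1/2)\left[ \uprho^{2}(\upomega_{i}, \upphi) + \uprho^{2}(\upomega_{j}, \upphi) - \uprho^{2}(\upomega_{i}, \upomega_{j}) \right] \), which is the form appearing in the proof of Theorem 3; the distorted Euclidean distance matrix serves only as an illustrative non-Euclidean example.

```python
import numpy as np

def centered_similarity(rho2, k):
    """K_phi(w_i, w_j) = 0.5*(rho^2(w_i, phi) + rho^2(w_j, phi) - rho^2(w_i, w_j)), phi = w_k."""
    return 0.5 * (rho2[:, [k]] + rho2[[k], :] - rho2)

def signature(K, tol=1e-9):
    lam = np.linalg.eigvalsh(K)
    return int((lam > tol).sum()), int((lam < -tol).sum()), int((np.abs(lam) <= tol).sum())

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
D[0, 1] = D[1, 0] = 3.0 * D[0, 1]        # distort one distance so the metric is no longer Euclidean
rho2 = D ** 2

# The (positive, negative, zero) eigenvalue counts coincide for different centers (Theorem 2).
print(signature(centered_similarity(rho2, k=0)))
print(signature(centered_similarity(rho2, k=4)))
```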

Proof of Theorem 3.

The proof is based on the following lemma [54, p. 74].

Lemma 3.1.

The function \( K_{\upphi}(\upomega^{\prime}, \upomega^{\prime\prime}) = \tilde{K}(\upomega^{\prime}, \upomega^{\prime\prime}) - \tilde{K}(\upomega^{\prime}, \upphi) - \tilde{K}(\upomega^{\prime\prime}, \upphi) + \tilde{K}(\upphi, \upphi) \), \( \upomega^{\prime}, \upomega^{\prime\prime}, \upphi \in \Omega \), is a kernel if and only if \( \tilde{K}(\upvartheta^{\prime}, \upvartheta^{\prime\prime}) \), \( \upvartheta^{\prime}, \upvartheta^{\prime\prime} \in \Omega \), is a conditional kernel, i.e., the matrix \( \left[ \tilde{K}(\upvartheta_{k}, \upvartheta_{l}),\; k, l = 1, \ldots, M \right] \) is conditionally positive definite (30) for any finite set \( \{ \upvartheta_{1}, \ldots, \upvartheta_{M} \} \).

Proof of the Theorem.

In the notation of this paper, the conditional kernel is defined by the proto-Euclidean metric as \( \tilde{K}(\upvartheta^{\prime}, \upvartheta^{\prime\prime}) = -\uprho^{2}(\upvartheta^{\prime}, \upvartheta^{\prime\prime}) \). Substituting this equality into the assertion of Lemma 3.1 yields \( K_{\upphi}(\upomega^{\prime}, \upomega^{\prime\prime}) = -\uprho^{2}(\upomega^{\prime}, \upomega^{\prime\prime}) + \uprho^{2}(\upomega^{\prime}, \upphi) + \uprho^{2}(\upomega^{\prime\prime}, \upphi) - \uprho^{2}(\upphi, \upphi) \). Since the last summand is zero and positive definiteness is preserved under multiplication by the positive factor \( 1/2 \), the function \( K_{\upphi}(\upomega_{i}, \upomega_{j}) = (1/2)\left[ \uprho^{2}(\upomega_{i}, \upphi) + \uprho^{2}(\upomega_{j}, \upphi) - \uprho^{2}(\upomega_{i}, \upomega_{j}) \right] \) is a kernel. ■
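
A quick numerical illustration of this statement: when the metric is genuinely Euclidean (and hence proto-Euclidean), the centered similarity matrix coincides with a Gram matrix and has no negative eigenvalues. The point set and the choice of the center in the sketch below are arbitrary and not taken from the paper.

```python
import numpy as np

# For a Euclidean (hence proto-Euclidean) metric, K_phi equals the Gram matrix of the
# vectors x_i - x_phi, so it must have no negative eigenvalues.
rng = np.random.default_rng(1)
X = rng.normal(size=(8, 4))                                    # 8 illustrative points in R^4
rho2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)   # squared Euclidean distances

phi = 2                                                        # arbitrary choice of the center object
K = 0.5 * (rho2[:, [phi]] + rho2[[phi], :] - rho2)

print(np.linalg.eigvalsh(K).min())                   # non-negative up to rounding error
print(np.allclose(K, (X - X[phi]) @ (X - X[phi]).T)) # True: K is the centered Gram matrix
```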

Proof of Theorem 4.

In accordance with (88), in the case (89) we have (the 0/0 indeterminacy obtained in the first chain of equalities is then resolved by L'Hôpital's rule):

$$ \begin{aligned} & \mathop {\lim }\limits_{{\upalpha \to 0}}\uprho^{2} (\upomega^{\prime } ,\upomega^{\prime \prime } |\upalpha) = \mathop {\lim }\limits_{{\upalpha \to 0}} \frac{{\left[ {1 - \exp \left( { -\upalpha \uprho ^{2} (\upomega^{\prime } ,\upomega^{\prime \prime } )} \right)} \right] -\upalpha\left[ {1 - \exp \left( { -\upalpha \uprho ^{2} (\upomega^{\prime } ,\upomega^{\prime \prime } )} \right)} \right]}}{\upalpha} = \\ & \mathop {\lim }\limits_{{\upalpha \to 0}} \left\{ {\frac{{1 - \exp \left( { -\upalpha \uprho ^{2} (\upomega^{\prime } ,\upomega^{\prime \prime } )} \right)}}{\upalpha} - \left[ {1 - \exp \left( { -\upalpha \uprho ^{2} (\upomega^{\prime } ,\upomega^{\prime \prime } )} \right)} \right]} \right\} = \mathop {\lim }\limits_{{\upalpha \to 0}} \frac{{1 - \exp \left( { -\upalpha \uprho ^{2} (\upomega^{\prime } ,\upomega^{\prime \prime } )} \right)}}{\upalpha} - \\ & \mathop {\lim }\limits_{{\upalpha \to 0}} \left[ {1 - \exp \left( { -\upalpha \uprho ^{2} (\upomega^{\prime } ,\upomega^{\prime \prime } )} \right)} \right] = \mathop {\lim }\limits_{{\upalpha \to 0}} \frac{{1 - \exp \left( { -\upalpha \uprho ^{2} (\upomega^{\prime } ,\upomega^{\prime \prime } )} \right)}}{\upalpha} = \frac{0}{0}, \\ \end{aligned} $$
$$ \begin{aligned} & \mathop {\lim }\limits_{{\upalpha \to 0}}\uprho^{2} (\upomega^{\prime } ,\upomega^{\prime \prime } |\upalpha) = \mathop {\lim }\limits_{{\upalpha \to 0}} {\kern 1pt} \frac{{({\partial }/{\partial }\,\upalpha)\left[ {1 - \exp \left( { -\upalpha \uprho ^{2} (\upomega^{\prime } ,\upomega^{\prime \prime } )} \right)} \right]}}{{({\partial }/{\partial }\,\upalpha)\upalpha}} = \mathop {\lim }\limits_{{\upalpha \to 0}} \frac{{\partial }}{{{\partial }\,\upalpha}}\left[ {1 - \exp \left( { -\upalpha \uprho ^{2} (\upomega^{\prime } ,\upomega^{\prime \prime } )} \right)} \right]{\kern 1pt} = \\ & - \mathop {\lim }\limits_{{\upalpha \to 0}} \frac{{\partial }}{{{\partial }\,\upalpha}}\exp \left( { -\upalpha \uprho ^{2} (\upomega^{\prime } ,\upomega^{\prime \prime } )} \right) = - {\kern 1pt} \mathop {\lim }\limits_{{\upalpha \to 0}} \left[ {\left( { -\uprho^{2} (\upomega^{\prime } ,\upomega^{\prime \prime } )} \right)\exp \left( { -\upalpha \uprho ^{2} (\upomega^{\prime } ,\upomega^{\prime \prime } )} \right)} \right] =\uprho^{2} (\upomega^{\prime } ,\upomega^{\prime \prime } ). \\ \end{aligned} $$

Equality (90) immediately follows from (88). ■
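
The \( \upalpha \to 0 \) limit can be observed numerically under the assumption that (88), which is not reproduced in this excerpt, has the form \( \uprho^{2}(\upomega^{\prime}, \upomega^{\prime\prime}|\upalpha) = \frac{1+\upalpha}{\upalpha}\left( 1 - \exp(-\upalpha\,\uprho^{2}(\upomega^{\prime}, \upomega^{\prime\prime})) \right) \); this form is consistent with the expansion used below in the proof of Theorem 9.

```python
import numpy as np

def rho2_alpha(rho2, alpha):
    # Assumed parametric family (consistent with the expansion used in the proof of Theorem 9):
    # rho^2(w', w'' | alpha) = (1 + alpha)/alpha * (1 - exp(-alpha * rho^2(w', w''))).
    return (1.0 + alpha) / alpha * (1.0 - np.exp(-alpha * rho2))

rho2 = 2.37                                 # an arbitrary squared distance
for alpha in (1.0, 1e-1, 1e-2, 1e-4, 1e-6):
    print(alpha, rho2_alpha(rho2, alpha))   # tends to rho2 = 2.37 as alpha -> 0
```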

Proof of Theorem 5.

Let \( \upomega_{1} ,\upomega_{2} ,\upomega_{3} \in\Omega \) be three objects, \( \uprho_{12} =\uprho(\upomega_{1} ,\upomega_{2} ) \), \( \uprho_{23} =\uprho(\upomega_{2} ,\upomega_{3} ) \), \( \uprho_{13} =\uprho(\upomega_{1} ,\upomega_{3} ) \), and let \( \uprho_{12} \le\uprho_{23} \). Under notations (88) \( \uprho(\upomega^{\prime } ,\upomega^{\prime \prime } |\upalpha) =\uprho_{\upalpha} \left( {\uprho(\upomega^{\prime } ,\upomega^{\prime \prime } )} \right) \) and \( \uprho(\upomega^{\prime } ,\upomega^{\prime \prime } ) =\uprho \), the function \( \uprho_{\upalpha} (\uprho) \) is concave and increasing \( ({\partial }/{\partial }\,\uprho)\,\uprho_{\upalpha} (\uprho) > 0 \). Thus, the following inequalities hold:

$$ \begin{aligned} &\uprho_{\upalpha} (\uprho_{12} +\uprho_{23} ) \le\uprho_{\upalpha} (\uprho_{23} ) + \left[ {({\partial }/{\partial }\,\uprho)\uprho_{\upalpha} (\uprho)} \right]_{{\,\uprho_{23} }}\uprho_{12} ,\;\;\left[ {({\partial }/{\partial }\,\uprho)\uprho_{\upalpha} (\uprho)} \right]_{{\,\uprho_{23} }} \le \left[ {({\partial }/{\partial }\,\uprho)\uprho_{\upalpha} (\uprho)} \right]_{{\,\uprho_{12} }} , \\ & \left[ {({\partial }/{\partial }\,\uprho)\uprho_{\upalpha} (\uprho)} \right]_{{\,\uprho_{12} }}\uprho_{12} \le\uprho_{\upalpha} (\uprho_{12} ). \\ \end{aligned} $$

Here all the derivatives are positive, so \( \uprho_{\upalpha}(\uprho_{12} + \uprho_{23}) \le \uprho_{\upalpha}(\uprho_{12}) + \uprho_{\upalpha}(\uprho_{23}) \). Since the original metric satisfies the triangle inequality \( \uprho_{13} \le \uprho_{12} + \uprho_{23} \) and \( \uprho_{\upalpha} \) is increasing, we have \( \uprho_{\upalpha}(\uprho_{13}) \le \uprho_{\upalpha}(\uprho_{12}) + \uprho_{\upalpha}(\uprho_{23}) \). ■
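
Since (88) is not reproduced in this excerpt, the mechanism of the proof — any concave, increasing transform vanishing at zero preserves the triangle inequality — can be illustrated with generic stand-ins for \( \uprho_{\upalpha} \), such as the square root, \( 1 - e^{-t} \), or \( \arctan \).

```python
import numpy as np
from itertools import permutations

def max_triangle_violation(d):
    n = d.shape[0]
    return max(d[i, k] - d[i, j] - d[j, k] for i, j, k in permutations(range(n), 3))

# A random Euclidean metric on a few points.
rng = np.random.default_rng(2)
X = rng.normal(size=(7, 3))
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

# Illustrative concave, increasing transforms with f(0) = 0, used as stand-ins for rho_alpha.
for f in (np.sqrt, lambda t: 1.0 - np.exp(-t), np.arctan):
    print(max_triangle_violation(f(D)) <= 1e-12)   # True: the triangle inequality is preserved
```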

Proof of Theorem 6.

It is enough to prove that for any finite set of objects \( \left\{ \upomega_{1}, \ldots, \upomega_{n} \right\} \subset \Omega \) and any center \( \upphi \in \Omega \) the matrix

$$ {\mathbf{K}}_{\upphi} (\upalpha) = \frac{1}{2}\left[ {\uprho^{2} (\upomega_{i} ,\upphi|\upalpha) +\uprho^{2} (\upomega_{j} ,\upphi|\upalpha) -\uprho^{2} (\upomega_{i} ,\upomega_{j} |\upalpha),\;i,j = 1, \ldots ,n} \right] $$

is positive definite if matrix

$$ {\mathbf{K}}_{\upphi} = \frac{1}{2}\left[ {\uprho^{2} (\upomega_{i} ,\upphi) +\uprho^{2} (\upomega_{j} ,\upphi) -\uprho^{2} (\upomega_{i} ,\upomega_{j} ),\;i,j = 1, \ldots ,n} \right] $$
(103)

is positive definite.

In accordance with Lemma 3.1 (proof of Theorem 3), it is necessary and sufficient for the positive definiteness of the matrix \( {\mathbf{K}}_{\upphi}(\upalpha) \) that the matrix

$$ {\mathbf{B}}(\upalpha) = \left[ { -\uprho^{2} (\upomega_{i} ,\upomega_{j} |\upalpha),\;i,j = 1, \ldots ,n} \right] $$

be conditionally positive definite.

At the same time, due to (85), \( \uprho^{2} (\upomega_{i} ,\upomega_{j} |\upalpha){ \propto }1 - \exp \left( { -\upalpha \uprho ^{2} (\upomega_{i} ,\upomega_{j} )} \right) \), so,

$$ {\mathbf{B}}(\upalpha) \propto {\mathbf{C}}(\upalpha) - {\mathbf{D}}, \quad {\mathbf{D}} = \left( {\begin{array}{*{20}c} 1 & 1 & \cdots & 1 \\ 1 & 1 & \cdots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & \cdots & 1 \\ \end{array} } \right) = {\mathbf{11}}^{\text{T}} , $$
(104)
$$ {\mathbf{C}}(\upalpha) = \left[ {\exp \left( { -\upalpha \uprho ^{2} (\upomega_{i} ,\upomega_{j} )} \right),\;i,j = 1, \ldots ,n} \right]. $$
(105)

Here the matrix \( {\mathbf{C}}(\upalpha) \) is always positive definite for any proto-Euclidean metric \( \uprho(\upomega^{\prime}, \upomega^{\prime\prime}) \) by virtue of Mercer's theorem (Sect. 1.4), i.e., \( {\varvec{\upbeta}}^{\text{T}} {\mathbf{C}}(\upalpha){\varvec{\upbeta}} > 0 \) if \( {\varvec{\upbeta}} \ne {\mathbf{0}} \in \mathbb{R}^{n} \).

Let us consider the quadratic function \( q({\varvec{\upbeta}}|\upalpha) = {\varvec{\upbeta}}^{\text{T}} {\mathbf{B}}(\upalpha){\varvec{\upbeta}} = {\varvec{\upbeta}}^{\text{T}} \left( {{\mathbf{C}}(\upalpha) - {\mathbf{D}}} \right){\varvec{\upbeta}} = {\varvec{\upbeta}}^{\text{T}} \left( {{\mathbf{C}}(\upalpha) - {\mathbf{11}}^{\text{T}} } \right){\varvec{\upbeta}} = {\varvec{\upbeta}}^{\text{T}} {\mathbf{C}}(\upalpha){\varvec{\upbeta}} - ({\varvec{\upbeta}}^{\text{T}} {\mathbf{1}})({\mathbf{1}}^{\text{T}} {\varvec{\upbeta}}) \) on the hyperplane \( {\mathbf{1}}^{\text{T}} {\varvec{\upbeta}} = 0 \). It is clear that \( \left. {q({\varvec{\upbeta}}|\upalpha)} \right|_{{{\mathbf{1}}^{\text{T}} {\varvec{\upbeta}} = 0}} = {\varvec{\upbeta}}^{\text{T}} {\mathbf{C}}(\upalpha){\varvec{\upbeta}} > 0 \). Thus, the matrix \( {\mathbf{B}}(\upalpha) \) is conditionally positive definite. ■
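
Both facts used above are easy to check numerically: on a Euclidean metric, the matrix \( {\mathbf{C}}(\upalpha) \) of values \( \exp(-\upalpha\,\uprho^{2}(\upomega_{i}, \upomega_{j})) \) is a Gaussian kernel matrix and is positive definite, while \( {\mathbf{B}}(\upalpha) \propto {\mathbf{C}}(\upalpha) - {\mathbf{11}}^{\text{T}} \) is positive on the hyperplane \( {\mathbf{1}}^{\text{T}}{\varvec{\upbeta}} = 0 \). The data and the value of alpha in the sketch below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(6, 2))
rho2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)   # a Euclidean (proto-Euclidean) metric

alpha = 0.7
C = np.exp(-alpha * rho2)               # C(alpha): a Gaussian (RBF) kernel matrix, positive definite
B = C - np.ones_like(C)                 # B(alpha) is proportional to C(alpha) - 11^T, see (104)

print(np.linalg.eigvalsh(C).min() > 0)  # True

# Conditional positive definiteness of B(alpha): test on vectors from the hyperplane 1^T beta = 0.
for _ in range(5):
    beta = rng.normal(size=6)
    beta -= beta.mean()                 # project onto 1^T beta = 0
    print(beta @ B @ beta > 0)          # True
```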

Proof of Theorem 7.

The proof is facilitated by the following two lemmas.

Lemma 7.1.

The matrix \( {\mathbf{B}} \) has the main eigenvalues \( \uplambda_{1} = \ldots = \uplambda_{n-1} = 1 \) of multiplicity \( n-1 \), the last eigenvalue \( \uplambda_{n} = -(n-1) \) of multiplicity 1, main eigenvectors \( {\mathbf{z}}_{i},\; i = 1, \ldots, n-1 \), with zero sums of elements \( {\mathbf{1}}^{\text{T}} {\mathbf{z}}_{i} = 0,\; i = 1, \ldots, n-1 \), and the last eigenvector \( {\mathbf{z}}_{n} = {\mathbf{1}} \in \mathbb{R}^{n} \).

$$ {\mathbf{B}} = \left( {\begin{array}{*{20}c} 0 & { - {\kern 1pt} 1} & \cdots & { - {\kern 1pt} 1} \\ { - {\kern 1pt} 1} & 0 & \cdots & { - {\kern 1pt} 1} \\ \cdots & \cdots & \ddots & \cdots \\ { - {\kern 1pt} 1} & { - {\kern 1pt} 1} & \cdots & 0 \\ \end{array} } \right) $$
(106)

Indeed, the main eigenvalues and eigenvectors meet the equalities

$$ \left( {\begin{array}{*{20}c} 0 & { - 1} & \cdots & { - 1} \\ { - 1} & 0 & \cdots & { - 1} \\ \cdots & \cdots & \ddots & \cdots \\ { - 1} & { - 1} & \cdots & 0 \\ \end{array} } \right)\left( {\begin{array}{*{20}c} {z_{1} } \\ {z_{2} } \\ \vdots \\ {z_{n} } \\ \end{array} } \right) = \left( {\begin{array}{*{20}c} { - \sum\nolimits_{j = 1,j \ne 1}^{n} {z_{j} } } \\ { - \sum\nolimits_{j = 1,j \ne 2}^{n} {z_{j} } } \\ \vdots \\ { - \sum\nolimits_{j = 1,j \ne n}^{n} {z_{j} } } \\ \end{array} } \right) = \left( {\begin{array}{*{20}c} {z_{1} } \\ {z_{2} } \\ \vdots \\ {z_{n} } \\ \end{array} } \right),\;\;\left( {\begin{array}{*{20}c} {z_{1} = - \sum\nolimits_{j = 1,j \ne 1}^{n} {z_{j} } = - \sum\nolimits_{j = 1}^{n} {z_{j} } + z_{1} } \\ {z_{2} = - \sum\nolimits_{j = 1,j \ne 2}^{n} {z_{j} } = - \sum\nolimits_{j = 1}^{n} {z_{j} } + z_{2} } \\ \vdots \\ {z_{n} = - \sum\nolimits_{j = 1,j \ne n}^{n} {z_{j} } = - \sum\nolimits_{j = 1}^{n} {z_{j} } + z_{n} } \\ \end{array} } \right), $$
$$ \sum\limits_{l = 1}^{n} {z_{l} } = - n\sum\limits_{j = 1}^{n} {z_{j} } + \sum\limits_{l = 1}^{n} {z_{l} } ,\;\; n\sum\limits_{j = 1}^{n} {z_{j} } = \sum\limits_{l = 1}^{n} {z_{l} } - \sum\limits_{l = 1}^{n} {z_{l} } = 0,\;{\text{i}}.{\text{e}}.,\;{\mathbf{1}}^{\text{T}} {\mathbf{z}}_{i} = 0,\;i = 1, \ldots ,n - 1. $$

The respective equalities for the last eigenvalue and eigenvector have the form

$$ \left( {\begin{array}{*{20}c} 0 & { - {\kern 1pt} 1} & \cdots & { - {\kern 1pt} 1} \\ { - {\kern 1pt} 1} & 0 & \cdots & { - {\kern 1pt} 1} \\ \cdots & \cdots & \ddots & \cdots \\ { - {\kern 1pt} 1} & { - {\kern 1pt} 1} & \cdots & 0 \\ \end{array} } \right)\left( {\begin{array}{*{20}c} {z_{1} } \\ {z_{2} } \\ \vdots \\ {z_{n} } \\ \end{array} } \right) = - (n - 1)\left( {\begin{array}{*{20}c} {z_{1} } \\ {z_{2} } \\ \vdots \\ {z_{n} } \\ \end{array} } \right),\;z_{1} = z_{2} = \ldots = z_{n} ,\;{\text{i}}.{\text{e}}.\;{\mathbf{z}}_{n} = {\mathbf{1}}\, \in \,{\mathbf{\mathbb{R}}}^{n} ,{\text{QED}}. $$

Lemma 7.2.

Quadratic form \( q({\varvec{\upbeta}}) = {\varvec{\upbeta}}^{\text{T}} {\mathbf{B}}\,{\varvec{\upbeta}} \) is positive \( q({\varvec{\upbeta}}) > 0 \), when \( {\mathbf{1}}^{\text{T}} {\varvec{\upbeta}} = 0 \) and \( {\varvec{\upbeta}} \ne {\mathbf{0}} \in {\mathbf{\mathbb{R}}}^{n} \), hence, matrix \( {\mathbf{B}} \) (106) is conditionally positive definite.

Indeed, due to Lemma 7.1, the matrix \( {\mathbf{B}} \) has the spectral decomposition \( {\mathbf{B}} = \sum\nolimits_{i = 1}^{n} \uplambda_{i} {\mathbf{u}}_{i} {\mathbf{u}}_{i}^{\text{T}} \) over an orthonormal eigenvector basis \( {\mathbf{u}}_{1}, \ldots, {\mathbf{u}}_{n} \), where \( {\mathbf{u}}_{n} = (1/\sqrt{n})\,{\mathbf{1}} \) and the main eigenvectors satisfy \( {\mathbf{1}}^{\text{T}} {\mathbf{u}}_{i} = 0 \), hence \( {\mathbf{B}} = \sum\nolimits_{i = 1}^{n - 1} {\mathbf{u}}_{i} {\mathbf{u}}_{i}^{\text{T}} - \frac{n - 1}{n} {\mathbf{11}}^{\text{T}} \). Let us consider the quadratic form \( q({\varvec{\upbeta}}) = {\varvec{\upbeta}}^{\text{T}} {\mathbf{B}}\,{\varvec{\upbeta}} = \sum\nolimits_{i = 1}^{n - 1} ({\mathbf{u}}_{i}^{\text{T}} {\varvec{\upbeta}})^{2} - \frac{n - 1}{n} ({\mathbf{1}}^{\text{T}} {\varvec{\upbeta}})^{2} \). On the hyperplane \( {\mathbf{1}}^{\text{T}} {\varvec{\upbeta}} = 0 \), it takes the values \( \left. q({\varvec{\upbeta}}) \right|_{{\mathbf{1}}^{\text{T}} {\varvec{\upbeta}} = 0} = \sum\nolimits_{i = 1}^{n - 1} ({\mathbf{u}}_{i}^{\text{T}} {\varvec{\upbeta}})^{2} \). Since the eigenvectors \( {\mathbf{u}}_{1}, \ldots, {\mathbf{u}}_{n - 1} \) span this hyperplane, at least one of the products \( {\mathbf{u}}_{i}^{\text{T}} {\varvec{\upbeta}} \) is nonzero for any \( {\varvec{\upbeta}} \ne {\mathbf{0}} \) with \( {\mathbf{1}}^{\text{T}} {\varvec{\upbeta}} = 0 \), i.e., \( \left. q({\varvec{\upbeta}}) \right|_{{\mathbf{1}}^{\text{T}} {\varvec{\upbeta}} = 0} > 0 \), QED.
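
Since \( {\mathbf{B}} \) (106) is simply \( {\mathbf{I}} - {\mathbf{11}}^{\text{T}} \), Lemmas 7.1 and 7.2 can be verified directly for a small \( n \); the sketch below is an illustration, not part of the original proof.

```python
import numpy as np

n = 6
B = np.eye(n) - np.ones((n, n))         # zero diagonal, -1 off the diagonal: the matrix (106)

lam, Z = np.linalg.eigh(B)
print(np.round(lam, 10))                # [-(n-1), 1, 1, 1, 1, 1]: eigenvalues as in Lemma 7.1

# The eigenvector of -(n-1) is proportional to the all-ones vector; the others are orthogonal to it.
print(np.round(Z.T @ np.ones(n), 10))   # first entry is +/- sqrt(n), the rest are zero

# Lemma 7.2: B is positive on the hyperplane 1^T beta = 0.
rng = np.random.default_rng(4)
beta = rng.normal(size=n)
beta -= beta.mean()
print(beta @ B @ beta > 0)              # True
```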

Now we are ready to prove Theorem 7.

We have to prove that matrix \( {\mathbf{B}}(\upalpha) \) is conditionally positive definite if \( \upalpha \) is large enough. Let us consider the quadratic form

$$ {\mathbf{B}}(\upalpha) = \left( {\begin{array}{*{20}c} 0 & { -\uprho^{2} (\upomega_{1} ,\upomega_{2} |\upalpha)} & \cdots & { -\uprho^{2} (\upomega_{1} ,\upomega_{n} |\upalpha)} \\ { -\uprho^{2} (\upomega_{2} ,\upomega_{1} |\upalpha)} & 0 & \cdots & { -\uprho^{2} (\upomega_{2} ,\upomega_{n} |\upalpha)} \\ \cdots & \cdots & \ddots & \cdots \\ { -\uprho^{2} (\upomega_{n} ,\upomega_{1} |\upalpha)} & { -\uprho^{2} (\upomega_{n} ,\upomega_{2} |\upalpha)} & \cdots & 0 \\ \end{array} } \right) $$

\( q({\varvec{\upbeta}}|\upalpha) = {\varvec{\upbeta}}^{\text{T}} {\mathbf{B}}(\upalpha){\varvec{\upbeta}} \) on the intersection of the hyperplane \( {\mathbf{1}}^{\text{T}} {\varvec{\upbeta}} = 0 \) and the hypersphere \( {\varvec{\upbeta}}^{\text{T}} {\varvec{\upbeta}} = 1 \). Since \( q({\varvec{\upbeta}}|\upalpha) \) is a continuous function of \( \upalpha \) and \( \mathop {\lim }\limits_{\upalpha \to \infty } {\mathbf{B}}(\upalpha) = {\mathbf{B}} \), we have \( \mathop {\lim }\limits_{\upalpha \to \infty } q({\varvec{\upbeta}}|\upalpha) = \mathop {\lim }\limits_{\upalpha \to \infty } {\varvec{\upbeta}}^{\text{T}} {\mathbf{B}}(\upalpha){\varvec{\upbeta}} = {\varvec{\upbeta}}^{\text{T}} {\mathbf{B}}\,{\varvec{\upbeta}} = q({\varvec{\upbeta}}) \). In accordance with Lemma 7.2, the matrix \( {\mathbf{B}} \) (106) is conditionally positive definite, i.e., \( q({\varvec{\upbeta}}) > 0 \) on the above compact intersection; hence, there exists \( \upalpha_{0} \) such that \( \left. q({\varvec{\upbeta}}|\upalpha) \right|_{{\mathbf{1}}^{\text{T}} {\varvec{\upbeta}} = 0} > 0 \) for all \( {\varvec{\upbeta}} \ne {\mathbf{0}} \in \mathbb{R}^{n} \), i.e., \( {\mathbf{B}}(\upalpha) \) is conditionally positive definite if \( \upalpha > \upalpha_{0} \). ■

Proof of Theorem 8.

Let the training assembly consist of four objects \( N = 4 \):

$$ \begin{aligned} &\uprho(\upomega_{1} ,\upomega_{2} ) =\uprho(\upomega_{3} ,\upomega_{4} ) = \sqrt 2 ,\;\uprho(\upomega_{1} ,\upomega_{3} ) =\uprho(\upomega_{1} ,\upomega_{4} ) =\uprho(\upomega_{2} ,\upomega_{3} ) =\uprho(\upomega_{2} ,\upomega_{4} ) = 1, \\ & y_{1} = y_{2} = 1,\;y_{3} = y_{4} = - 1. \\ \end{aligned} $$
(107)

Let us try to find a decision rule (68) that correctly classifies all the objects:

$$ \left\{ \begin{aligned} \underbrace {{\left( { -\uprho^{2} (\upomega_{1} ,\upomega_{1} )} \right)}}_{0}a_{1} {\kern 1pt} + {\kern 1pt} \underbrace {{\left( { -\uprho^{2} (\upomega_{2} ,\upomega_{1} )} \right)}}_{ - 2}a_{2} {\kern 1pt} + {\kern 1pt} \underbrace {{\left( { -\uprho^{2} (\upomega_{3} ,\upomega_{1} )} \right)}}_{ - 1}a_{3} + \underbrace {{\left( { -\uprho^{2} (\upomega_{4} ,\upomega_{1} )} \right)}}_{ - 1}a_{4} + b > 0, \hfill \\ \underbrace {{\left( { -\uprho^{2} (\upomega_{1} ,\upomega_{2} )} \right)}}_{ - 2}a_{1} + \underbrace {{\left( { -\uprho^{2} (\upomega_{2} ,\upomega_{2} )} \right)}}_{0}a_{2} + \underbrace {{\left( { -\uprho^{2} (\upomega_{3} ,\upomega_{2} )} \right)}}_{ - 1}a_{3} + \underbrace {{\left( { -\uprho^{2} (\upomega_{4} ,\upomega_{2} )} \right)}}_{ - 1}a_{4} + b > 0, \hfill \\ \underbrace {{\left( { -\uprho^{2} (\upomega_{1} ,\upomega_{3} )} \right)}}_{ - 1}a_{1} + \underbrace {{\left( { -\uprho^{2} (\upomega_{2} ,\upomega_{3} )} \right)}}_{ - 1}a_{2} + \underbrace {{\left( { -\uprho^{2} (\upomega_{3} ,\upomega_{3} )} \right)}}_{0}a_{3} + \underbrace {{\left( { -\uprho^{2} (\upomega_{4} ,\upomega_{3} )} \right)}}_{ - 2}a_{4} + b < 0, \hfill \\ \underbrace {{\left( { -\uprho^{2} (\upomega_{1} ,\upomega_{4} )} \right)}}_{ - 1}a_{1} + \underbrace {{\left( { -\uprho^{2} (\upomega_{2} ,\upomega_{4} )} \right)}}_{ - 1}a_{2} {\kern 1pt} + {\kern 1pt} \underbrace {{\left( { -\uprho^{2} (\upomega_{3} ,\upomega_{4} )} \right)}}_{ - 2}a_{3} + \underbrace {{\left( { -\uprho^{2} (\upomega_{4} ,\upomega_{4} )} \right)}}_{0}a_{4} + b < 0. \hfill \\ \end{aligned} \right. $$
(108)

Then, the numbers \( (a_{1} ,a_{2} ,a_{3} ,a_{4} ,b) \) have to meet the inequalities

$$ \left\{ \begin{aligned} - 2a_{2} - a_{3} - a_{4} + b > 0, \hfill \\ - 2a_{1} - a_{3} - a_{4} + b > 0, \hfill \\ - a_{1} - a_{2} - 2a_{4} + b < 0, \hfill \\ - a_{1} - a_{2} - 2a_{3} + b < 0, \hfill \\ \end{aligned} \right. \;{\text{or,}}\;{\text{what}}\;{\text{is}}\,{\text{equivalent,}}\;\;\left\{ \begin{aligned} \left\{ \begin{aligned} - 2a_{2} - a_{3} - a_{4} + b > 0, \hfill \\ - 2a_{1} - a_{3} - a_{4} + b > 0, \hfill \\ \end{aligned} \right. \hfill \\ \left\{ \begin{aligned} a_{1} + a_{2} + 2a_{4} - b > 0, \hfill \\ a_{1} + a_{2} + 2a_{3} - b > 0. \hfill \\ \end{aligned} \right. \hfill \\ \end{aligned} \right. $$
(109)

Adding the first two inequalities and dividing by two results in the inequality \( -(a_{1} + a_{2}) - a_{3} - a_{4} + b > 0 \), i.e., \( a_{1} + a_{2} + a_{3} + a_{4} - b < 0 \), and the same operation applied to the last two inequalities gives \( a_{1} + a_{2} + a_{3} + a_{4} - b > 0 \). These two consequences are incompatible, hence the inequalities (109) are incompatible, too. Thus, a decision rule of kind (69) that would correctly classify (108) the assembly (107) does not exist. ■
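
The incompatibility of (109) can also be confirmed by a small linear program (here via scipy.optimize.linprog): the largest slack t that can be granted simultaneously to the four strict inequalities is not positive. The encoding below, with variables (a1, ..., a4, b) plus an auxiliary slack variable t and box bounds that merely keep the homogeneous problem bounded, is an illustrative check rather than part of the original proof.

```python
import numpy as np
from scipy.optimize import linprog

# Variables: (a1, a2, a3, a4, b, t).  We maximize the common slack t of the four strict
# inequalities (109); each row below encodes  t - (left-hand side of an inequality) <= 0.
A_ub = np.array([
    [ 0,  2,  1,  1, -1, 1],   # t - (-2*a2 - a3 - a4 + b) <= 0
    [ 2,  0,  1,  1, -1, 1],   # t - (-2*a1 - a3 - a4 + b) <= 0
    [-1, -1,  0, -2,  1, 1],   # t - ( a1 + a2 + 2*a4 - b) <= 0
    [-1, -1, -2,  0,  1, 1],   # t - ( a1 + a2 + 2*a3 - b) <= 0
], dtype=float)
b_ub = np.zeros(4)
c = np.array([0, 0, 0, 0, 0, -1.0])      # minimize -t, i.e. maximize t
bounds = [(-1, 1)] * 6                   # the system is homogeneous, so a box keeps the LP bounded

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
print(res.x[-1] <= 1e-9)                 # True: no positive slack exists, so (109) is infeasible
```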

Proof of Theorem 9.

Let us consider an arbitrary object of the training set \( \upomega_{j} \in\Omega \). In accordance with (69), the decision rule (101) can be represented as

$$ d(\upomega_{j} |a_{1} , \ldots ,a_{N} ,b,\upalpha) = \frac{{1 +\upalpha}}{\upalpha}\left[ {a_{j} + \sum\limits_{i = 1,\;i \ne j}^{n} {\exp \left( { -\upalpha \uprho ^{2} (\upomega_{j} ,\upomega_{i} )} \right)a_{i} } } \right] + b. $$

Indeed,

$$ \begin{aligned} & \sum\limits_{i = 1}^{n} {\left( { -\uprho^{2} (\upomega_{j} ,\upomega_{i} |\upalpha)} \right)a_{i} } + b = \frac{{1 +\upalpha}}{\upalpha}\sum\limits_{i = 1}^{n} {\left[ {\exp \left( { -\upalpha \uprho ^{2} (\upomega_{j} ,\upomega_{i} )} \right) - 1} \right]a_{i} } + b = \\ & \quad \quad \quad \frac{1 + \alpha }{\alpha }\sum\limits_{i = 1}^{n} {\exp \left( { -\upalpha \uprho ^{2} (\upomega_{j} ,\upomega_{i} )} \right)a_{i} } - \frac{{1 +\upalpha}}{\upalpha}\underbrace {{\sum\limits_{i = 1}^{n} {a_{i} } }}_{ = 0} + b = \\ & \quad \quad \quad \quad \quad \quad \quad \quad \frac{{1 +\upalpha}}{\upalpha}\underbrace {{\exp \left( { -\upalpha \uprho ^{2} (\upomega_{j} ,\upomega_{j} )} \right)}}_{ = 1}a_{j} + \frac{{1 +\upalpha}}{\upalpha}\sum\limits_{i = 1,\;i \ne j}^{n} {\exp \left( { -\upalpha \uprho ^{2} (\upomega_{j} ,\upomega_{i} )} \right)a_{i} } + b. \\ \end{aligned} $$

Since \( \mathop {\lim }\limits_{{\upalpha \to {\infty }}} \left[ {(1 +\upalpha)/\upalpha} \right] = 1 \), we have

$$ \mathop {\lim }\limits_{{\upalpha \to {\infty }}} d(\upomega_{j} |a_{1} , \ldots ,a_{N} ,b,\upalpha) = a_{j} + \sum\limits_{i = 1,\;i \ne j}^{n} {\underbrace {{\mathop {\lim }\limits_{{\upalpha \to \infty }} \exp \left( { -\upalpha \uprho ^{2} (\upomega_{j} ,\upomega_{i} )} \right)}}_{ = 0}a_{i} } + b = a_{j} + b. $$

Let \( (a_{j} ,\;j = 1, \ldots ,N,b) \) be the parameter vector such that \( a_{j} > 0 \) if \( y_{j} = 1 \), \( a_{j} < 0 \) if \( y_{j} = - 1 \), and \( b = 0 \). Then

$$ \mathop {\lim }\limits_{{\upalpha \to {\infty }}} d(\upomega_{j} |a_{1} , \ldots ,a_{N} ,b,\upalpha) = a_{j} \left\{ {\begin{array}{*{20}l} { > 0,\;y_{j} = 1,} \hfill \\ { < 0,\;y_{j} = - 1,} \hfill \\ \end{array} } \right. $$

i.e., for all sufficiently large \( \upalpha \), the parameter vector \( (a_{j} ,\;j = 1, \ldots ,N,\;b) \) correctly separates the entire training set. ■
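
For the four-object configuration (107) from the proof of Theorem 8, this behaviour is easy to observe numerically. The sketch below assumes the parametric form \( -\uprho^{2}(\upomega^{\prime}, \upomega^{\prime\prime}|\upalpha) = \frac{1+\upalpha}{\upalpha}\left( \exp(-\upalpha\,\uprho^{2}(\upomega^{\prime}, \upomega^{\prime\prime})) - 1 \right) \) used in the manipulations above; the decision rule itself, (68)/(69)/(101), is not reproduced in this excerpt.

```python
import numpy as np

# The four objects (107): rho(w1,w2) = rho(w3,w4) = sqrt(2), all cross-class distances equal 1.
rho2 = np.array([[0, 2, 1, 1],
                 [2, 0, 1, 1],
                 [1, 1, 0, 2],
                 [1, 1, 2, 0]], dtype=float)
y = np.array([1, 1, -1, -1])
a, b = y.astype(float), 0.0              # a_j > 0 for y_j = +1, a_j < 0 for y_j = -1, sum(a) = 0

for alpha in (0.1, 1.0, 10.0):
    # Assumed parametric form, as in the manipulations above:
    # -rho^2(w', w'' | alpha) = (1 + alpha)/alpha * (exp(-alpha * rho^2(w', w'')) - 1)
    minus_rho2_alpha = (1.0 + alpha) / alpha * (np.exp(-alpha * rho2) - 1.0)
    d = minus_rho2_alpha @ a + b         # decision values d(w_j | a_1,...,a_N, b, alpha)
    print(alpha, np.sign(d) == y)        # correct classification; Theorem 9 guarantees it for large alpha
```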

Copyright information

© 2018 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Mottl, V., Seredin, O., Krasotkina, O. (2018). Compactness Hypothesis, Potential Functions, and Rectifying Linear Space in Machine Learning. In: Rozonoer, L., Mirkin, B., Muchnik, I. (eds) Braverman Readings in Machine Learning. Key Ideas from Inception to Current State. Lecture Notes in Computer Science (LNAI), vol 11100. Springer, Cham. https://doi.org/10.1007/978-3-319-99492-5_3

  • DOI: https://doi.org/10.1007/978-3-319-99492-5_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-99491-8

  • Online ISBN: 978-3-319-99492-5
