
Compactness Hypothesis, Potential Functions, and Rectifying Linear Space in Machine Learning

Chapter in: Braverman Readings in Machine Learning. Key Ideas from Inception to Current State. Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11100).

Abstract

Emmanuel Braverman was one of the very few thinkers who, during his extremely short life, managed to seed several seemingly quite different areas of science. This paper overviews one of the knowledge areas he essentially shaped in the 1960s, namely, Machine Learning. Later, Vladimir Vapnik proposed a more engineering-oriented name for this knowledge area, Estimation of Dependencies Based on Empirical Data. We shall treat these titles as synonyms. The aim of the paper is to briefly trace how three notions introduced by Braverman formed the core of the contemporary Machine Learning doctrine. These notions are: (1) the compactness hypothesis, (2) the potential function, and (3) the rectifying linear space to which the former two have led. There is little new in this paper. Almost all the constructions we are going to speak about have been published by numerous scientists. The novelty is, perhaps, only in that all these issues are systematically considered together as immediate consequences of Braverman's basic principles.


Notes

  1. The latter developments are a particular case of a more general relational approach to dependence estimation, which allows for asymmetric comparison functions [16, 17].

  2. Otherwise, we would be forced to assume that the distance is a metric and that the respective metric space is separable, i.e., contains a countable everywhere dense subset. All the mathematical constructions would become much more complicated without any gain in generality of the resulting class of dependence models.

References

  1. Braverman, E.M.: Experiments on machine learning to recognize visual patterns. Autom. Remote Control 23, 315–327 (1962). Translated from Russian: Avtomat. i Telemekh. 23, 349–364 (1962)
  2. Arkadʹev, A.G., Braverman, E.M.: Computers and Pattern Recognition. Thompson Book Company, Washington (1967). 115 p.
  3. Vapnik, V.: Statistical Learning Theory. Wiley, New York (1998)
  4. Vapnik, V.: Estimation of Dependences Based on Empirical Data. Springer, New York (1982). https://doi.org/10.1007/0-387-34239-7
  5. Duin, R.P.W.: Compactness and complexity of pattern recognition problems. In: Proceedings of the International Symposium on Pattern Recognition "In Memoriam Pierre Devijver", Brussels, Belgium, 12 February 1999, Royal Military Academy, pp. 124–128 (1999)
  6. Aizerman, M., Braverman, E., Rozonoer, L.: Theoretical foundations of the potential function method in pattern recognition learning. Autom. Remote Control 25, 917–936 (1964)
  7. Mercer, J.: Functions of positive and negative type and their connection with the theory of integral equations. Philos. Trans. Roy. Soc. A 209, 415–446 (1909)
  8. Goldfarb, L.: A unified approach to pattern recognition. Pattern Recogn. 17, 575–582 (1984)
  9. Goldfarb, L.: A new approach to pattern recognition. In: Progress in Pattern Recognition, vol. 2, pp. 241–402. Elsevier Science Publishers BV (1985)
  10. Pękalska, E., Duin, R.P.W.: Dissimilarity representations allow for building good classifiers. Pattern Recogn. Lett. 23(8), 943–956 (2002)
  11. Pękalska, E., Duin, R.P.W.: The Dissimilarity Representation for Pattern Recognition: Foundations and Applications. World Scientific Publishing Co. Inc., River Edge (2005)
  12. Haasdonk, B., Pękalska, E.: Indefinite kernel Fisher discriminant. In: Proceedings of the 19th International Conference on Pattern Recognition, Tampa, USA, 8–11 December 2008
  13. Duin, R.P.W., Pękalska, E.: Non-Euclidean dissimilarities: causes and informativeness. In: Hancock, E.R., Wilson, R.C., Windeatt, T., Ulusoy, I., Escolano, F. (eds.) SSPR/SPR 2010. LNCS, vol. 6218, pp. 871–880. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-14980-1_31
  14. Haasdonk, B.: Feature space interpretation of SVMs with indefinite kernels. TPAMI 25, 482–492 (2005)
  15. Pękalska, E., Harol, A., Duin, R.P.W., Spillmann, B., Bunke, H.: Non-Euclidean or non-metric measures can be informative. In: Yeung, D.-Y., Kwok, J.T., Fred, A., Roli, F., de Ridder, D. (eds.) SSPR/SPR 2006. LNCS, vol. 4109, pp. 871–880. Springer, Heidelberg (2006). https://doi.org/10.1007/11815921_96
  16. Duin, R., Pękalska, E., de Ridder, D.: Relational discriminant analysis. Pattern Recogn. Lett. 20, 1175–1181 (1999)
  17. Balcan, M.-F., Blum, A., Srebro, N.: A theory of learning with similarity functions. Mach. Learn. 72, 89–112 (2008)
  18. Nelder, J., Wedderburn, R.: Generalized linear models. J. Roy. Stat. Soc. Ser. A (Gen.) 135(3), 370–384 (1972)
  19. McCullagh, P., Nelder, J.: Generalized Linear Models, 2nd edn. Chapman and Hall, London (1989). 511 p.
  20. Mottl, V., Krasotkina, O., Seredin, O., Muchnik, I.: Principles of multi-kernel data mining. In: Perner, P., Imiya, A. (eds.) MLDM 2005. LNCS (LNAI), vol. 3587, pp. 52–61. Springer, Heidelberg (2005). https://doi.org/10.1007/11510888_6
  21. Tatarchuk, A., Urlov, E., Mottl, V., Windridge, D.: A support kernel machine for supervised selective combining of diverse pattern-recognition modalities. In: El Gayar, N., Kittler, J., Roli, F. (eds.) MCS 2010. LNCS, vol. 5997, pp. 165–174. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-12127-2_17
  22. Gönen, M., Alpaydın, E.: Multiple kernel learning algorithms. J. Mach. Learn. Res. 12, 2211–2268 (2011)
  23. Schölkopf, B., Smola, A.: Learning with Kernels. MIT Press, Cambridge (2001)
  24. Deza, M., Deza, E.: Encyclopedia of Distances. Springer, Heidelberg (2006). https://doi.org/10.1007/978-3-642-00234-2
  25. Azizov, T.Y., Iokhvidov, I.S.: Linear Operators in Spaces with an Indefinite Metric. Wiley, Chichester (1989)
  26. Langer, H.: Krein space. In: Hazewinkel, M. (ed.) Encyclopaedia of Mathematics. Springer, Netherlands (1994)
  27. Ong, C.S., Mary, X., Canu, S., Smola, A.: Learning with non-positive kernels. In: Proceedings of the Twenty-First International Conference on Machine Learning, ICML 2004, Banff, Alberta, Canada, 4–8 July 2004
  28. Bugrov, S., Nikolsky, S.M.: Fundamentals of Linear Algebra and Analytical Geometry. Mir, Moscow (1982)
  29. Vapnik, V.: The Nature of Statistical Learning Theory. Information Science and Statistics. Springer, New York (2000). https://doi.org/10.1007/978-1-4757-3264-1
  30. Guyon, I., Vapnik, V.N., Boser, B.E., Bottou, L., Solla, S.A.: Structural risk minimization for character recognition. In: Advances in Neural Information Processing Systems, vol. 4. Morgan Kaufmann, Denver (1992)
  31. Wilson, J.R., Lorenz, K.A.: Short history of the logistic regression model. In: Modeling Binary Correlated Responses Using SAS, SPSS and R. IBSS, vol. 9, pp. 17–23. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-23805-0_2
  32. Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20, 273–297 (1995)
  33. Tikhonov, A.N.: On the stability of inverse problems. Dokl. Akad. Nauk SSSR 39(5), 195–198 (1943)
  34. Tikhonov, A.N.: Solution of incorrectly formulated problems and the regularization method. Sov. Math. 4, 1035–1038 (1963)
  35. Tikhonov, A.N., Arsenin, V.Y.: Solution of Ill-Posed Problems. Winston & Sons, Washington (1977)
  36. Hoerl, A.E., Kennard, D.J.: Application of ridge analysis to regression problems. Chem. Eng. Prog. 58, 54–59 (1962)
  37. Vinod, H.D., Ullah, A.: Recent Advances in Regression Methods. Statistics: Textbooks and Monographs, vol. 41. Marcel Dekker Inc., New York (1981)
  38. Mottl, V., Dvoenko, S., Seredin, O., Kulikowski, C., Muchnik, I.: Featureless pattern recognition in an imaginary Hilbert space and its application to protein fold classification. In: Perner, P. (ed.) MLDM 2001. LNCS (LNAI), vol. 2123, pp. 322–336. Springer, Heidelberg (2001). https://doi.org/10.1007/3-540-44596-X_26
  39. Frank, I.E., Friedman, J.H.: A statistical view of some chemometrics regression tools. Technometrics 35, 109–148 (1993)
  40. Fu, W.J.: Penalized regression: the bridge versus the LASSO. J. Comput. Graph. Stat. 7, 397–416 (1998)
  41. Mottl, V., Seredin, O., Krasotkina, O., Muchnik, I.: Fusion of Euclidean metrics in featureless data analysis: an equivalent of the classical problem of feature selection. Pattern Recogn. Image Anal. 15(1), 83–86 (2005)
  42. Mottl, V., Seredin, O., Krasotkina, O., Muchnik, I.: Kernel fusion and feature selection in machine learning. In: Proceedings of the 8th IASTED International Conference on Intelligent Systems and Control, Cambridge, USA, 31 October–2 November 2005, pp. 477–482
  43. Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. J. Roy. Stat. Soc. Ser. B 67, 301–320 (2005)
  44. Wang, L., Zhu, J., Zou, H.: The doubly regularized support vector machine. Statistica Sinica 16, 589–615 (2006)
  45. Tibshirani, R.J.: Regression shrinkage and selection via the LASSO. J. Roy. Stat. Soc. Ser. B 58, 267–288 (1996)
  46. Tibshirani, R.J.: The LASSO method for variable selection in the Cox model. Stat. Med. 16, 385–395 (1997)
  47. Tatarchuk, A., Mottl, V., Eliseyev, A., Windridge, D.: Selectivity supervision in combining pattern-recognition modalities by feature- and kernel-selective support vector machines. In: Proceedings of the 19th International Conference on Pattern Recognition, ICPR 2008, vol. 1–6, pp. 2336–2339 (2008)
  48. Fan, J., Li, R.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. Theor. Methods 96(456), 1348–1360 (2001)
  49. Krasotkina, O., Mottl, V.: A Bayesian approach to sparse Cox regression in high-dimensional survival analysis. In: Proceedings of the 11th International Conference on Machine Learning and Data Mining (MLDM 2015), Hamburg, Germany, 20–23 July 2015, pp. 425–437
  50. Krasotkina, O., Mottl, V.: A Bayesian approach to sparse learning-to-rank for search engine optimization. In: Proceedings of the 11th International Conference on Machine Learning and Data Mining (MLDM 2015), Hamburg, Germany, 20–23 July 2015, pp. 382–394
  51. Tatarchuk, A., Sulimova, V., Windridge, D., Mottl, V., Lange, M.: Supervised selective combining pattern recognition modalities and its application to signature verification by fusing on-line and off-line kernels. In: Benediktsson, J.A., Kittler, J., Roli, F. (eds.) MCS 2009. LNCS, vol. 5519, pp. 324–334. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-02326-2_33
  52. Razin, N., et al.: Application of the multi-modal relevance vector machine to the problem of protein secondary structure prediction. In: Shibuya, T., Kashima, H., Sese, J., Ahmad, S. (eds.) PRIB 2012. LNCS, vol. 7632, pp. 153–165. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-34123-6_14
  53. Tatarchuk, A., Sulimova, V., Torshin, I., Mottl, V., Windridge, D.: Supervised selective kernel fusion for membrane protein prediction. In: Comin, M., Käll, L., Marchiori, E., Ngom, A., Rajapakse, J. (eds.) PRIB 2014. LNCS, vol. 8626, pp. 98–109. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-09192-1_9
  54. Berg, C., Christensen, J.P.R., Ressel, P.: Harmonic Analysis on Semigroups: Theory of Positive Definite and Related Functions. Springer, New York (1984). https://doi.org/10.1007/978-1-4612-1128-0


Acknowledgements

We would like to acknowledge support from grants of the Russian Foundation for Basic Research 14-07-00527, 16-57-52042, 17-07-00436, 17-07-00993, 18-07-01087, 18-07-00942, and from Tula State University within the framework of the scientific project № 2017-62PUBL.

Author information

Correspondence to Vadim Mottl.

Appendix. The Proofs of Theorems

Proof of Theorem 1.

Let \( \upomega^{\prime}, \upomega^{\prime\prime}, \upomega^{\prime\prime\prime} \in \Omega \) be three arbitrary objects. Due to (10), \( h \ge \mathop{\sup}\limits_{\widetilde{\upomega}^{\prime} = \widetilde{\upomega}^{\prime\prime} = \widetilde{\upomega}^{\prime\prime\prime} \in \Omega} \left\{ \uprho(\widetilde{\upomega}^{\prime}, \widetilde{\upomega}^{\prime\prime\prime}) - \left[ \uprho(\widetilde{\upomega}^{\prime}, \widetilde{\upomega}^{\prime\prime}) + \uprho(\widetilde{\upomega}^{\prime\prime}, \widetilde{\upomega}^{\prime\prime\prime}) \right] \right\} = 0 \), i.e., \( h \ge 0 \), so that \( r(\upalpha, \upbeta) = \uprho(\upalpha, \upbeta) + h \ge 0 \) is nonnegative. Since \( r(\upomega^{\prime}, \upomega^{\prime\prime}) = \uprho(\upomega^{\prime}, \upomega^{\prime\prime}) + h \), we have \( r(\upomega^{\prime}, \upomega^{\prime\prime}) + r(\upomega^{\prime\prime}, \upomega^{\prime\prime\prime}) - r(\upomega^{\prime}, \upomega^{\prime\prime\prime}) = \left[ \uprho(\upomega^{\prime}, \upomega^{\prime\prime}) + h \right] + \left[ \uprho(\upomega^{\prime\prime}, \upomega^{\prime\prime\prime}) + h \right] - \left[ \uprho(\upomega^{\prime}, \upomega^{\prime\prime\prime}) + h \right] = \uprho(\upomega^{\prime}, \upomega^{\prime\prime}) + \uprho(\upomega^{\prime\prime}, \upomega^{\prime\prime\prime}) - \uprho(\upomega^{\prime}, \upomega^{\prime\prime\prime}) + h \). By virtue of (10), \( \uprho(\upomega^{\prime}, \upomega^{\prime\prime}) + \uprho(\upomega^{\prime\prime}, \upomega^{\prime\prime\prime}) - \uprho(\upomega^{\prime}, \upomega^{\prime\prime\prime}) \ge -h \), hence \( \uprho(\upomega^{\prime}, \upomega^{\prime\prime}) + \uprho(\upomega^{\prime\prime}, \upomega^{\prime\prime\prime}) - \uprho(\upomega^{\prime}, \upomega^{\prime\prime\prime}) + h \ge -h + h = 0 \), whence it follows that \( r(\upomega^{\prime}, \upomega^{\prime\prime}) + r(\upomega^{\prime\prime}, \upomega^{\prime\prime\prime}) - r(\upomega^{\prime}, \upomega^{\prime\prime\prime}) \ge 0 \). ■
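
This correction is easy to check numerically. Below is a minimal Python sketch (not part of the original paper) that treats \( h \), in the spirit of (10), as the largest triangle-inequality violation over all triples of a finite distance matrix and verifies that shifting every off-diagonal distance by \( h \) removes the violation; the matrix `rho` and the helper `triangle_defect` are purely illustrative.

```python
import numpy as np
from itertools import permutations

def triangle_defect(rho):
    """Largest violation rho(i,k) - rho(i,j) - rho(j,k) over all ordered triples."""
    n = rho.shape[0]
    return max((rho[i, k] - rho[i, j] - rho[j, k]
                for i, j, k in permutations(range(n), 3)), default=0.0)

# An illustrative symmetric "distance" matrix that violates the triangle inequality.
rho = np.array([[0.0, 1.0, 5.0],
                [1.0, 0.0, 1.0],
                [5.0, 1.0, 0.0]])

h = max(triangle_defect(rho), 0.0)       # h = 3 for this matrix
r = rho + h * (1.0 - np.eye(3))          # shift all off-diagonal distances by h

print(triangle_defect(rho), triangle_defect(r))  # 3.0 and 0.0: r satisfies the triangle inequality
```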

Proof of Theorem 2.

Let \( \left\{ \upomega_{1}, \ldots, \upomega_{M} \right\} \subset \Omega \) be a finite collection of objects within a distance space, let \( \upphi = \upomega_{k} \) be chosen as the center, and let \( {\mathbf{K}}_{\upphi} = {\mathbf{K}}_{\upomega_{k}} = \left[ K_{\upomega_{k}}(\upomega_{i}, \upomega_{j}),\; i, j = 1, \ldots, M \right] \) be the matrix formed by the similarity function (11). Let, further, another object be assigned as a new center \( \upphi^{\prime} = \upomega_{l} \), and let \( {\mathbf{K}}_{\upphi^{\prime}} = {\mathbf{K}}_{\upomega_{l}} = \left[ K_{\upomega_{l}}(\upomega_{i}, \upomega_{j}),\; i, j = 1, \ldots, M \right] \) be the respective similarity matrix. In accordance with (14),

$$ K_{\upphi^{\prime}}(\upomega_{i}, \upomega_{j}) = K_{\upomega_{l}}(\upomega_{i}, \upomega_{j}) = K_{\upomega_{k}}(\upomega_{i}, \upomega_{j}) - K_{\upomega_{k}}(\upomega_{i}, \upomega_{l}) - K_{\upomega_{k}}(\upomega_{j}, \upomega_{l}) + K_{\upomega_{k}}(\upomega_{l}, \upomega_{l}). $$

Let the notation \( {\mathbf{E}}_{\upomega_{k} \to \upomega_{l}} \) stand for the \( (M \times M) \) matrix in which all the elements are zeros except the \( l \)th row of units \( (a_{li} = 1,\; i = 1, \ldots, M) \) and one additional unit element \( a_{kl} = 1 \):

$$ {\mathbf{E}}_{\upomega_{k} \to \upomega_{l}} = \begin{pmatrix} 0 & \cdots & 0 & \cdots & 0 & \cdots & 0 \\ \vdots & & \vdots & & \vdots & & \vdots \\ 0 & \cdots & 0 & \cdots & 1 & \cdots & 0 \\ \vdots & & \vdots & & \vdots & & \vdots \\ 1 & \cdots & 1 & \cdots & 1 & \cdots & 1 \\ \vdots & & \vdots & & \vdots & & \vdots \\ 0 & \cdots & 0 & \cdots & 0 & \cdots & 0 \end{pmatrix} \begin{matrix} 1 \\ \vdots \\ k \\ \vdots \\ l \\ \vdots \\ M \end{matrix} $$

(the rows are indexed \( 1, \ldots, M \) on the right; the single unit in row \( k \) stands in column \( l \), and row \( l \) consists entirely of units). It is clear that the matrices \( {\mathbf{S}}_{\upomega_{k} \to \upomega_{l}} = {\mathbf{I}} - {\mathbf{E}}_{\upomega_{k} \to \upomega_{l}} \) are nondegenerate.

Let us consider two quadratic forms \( {\mathbf{x}}^{T} {\mathbf{K}}_{{\upomega_{k} }} {\mathbf{x}} \) and \( {\mathbf{y}}^{T} {\mathbf{K}}_{{\upomega_{l} }} {\mathbf{y}} \), \( {\mathbf{x}},{\mathbf{y}} \in {\mathbf{\mathbb{R}}}^{M} \). Here

$$ {\mathbf{y}}^{T} {\mathbf{K}}_{{\upomega_{l} }} {\mathbf{y}} = {\mathbf{y}}^{T} {\mathbf{S}}_{{\upomega_{k} \to\upomega_{l} }}^{T} {\mathbf{K}}_{{\upomega_{k} }} {\mathbf{S}}_{{\upomega_{k} \to\upomega_{l} }} {\mathbf{y}} = \left( {{\mathbf{S}}_{{\upomega_{k} \to\upomega_{l} }} {\mathbf{y}}} \right)^{T} {\mathbf{K}}_{{\upomega_{k} }} \left( {{\mathbf{S}}_{{\upomega_{k} \to\upomega_{l} }} {\mathbf{y}}} \right). $$

As we see, the quadratic forms coincide after the one-to-one substitution \( {\mathbf{x}} = {\mathbf{S}}_{{\upomega_{k} \to\upomega_{l} }} {\mathbf{y}} \). In accordance with Sylvester’s law of inertia for quadratic forms, the numbers of positive, negative, and zero eigenvalues of matrices \( {\mathbf{K}}_{{\upomega_{k} }} \) and \( {\mathbf{K}}_{{\upomega_{l} }} \) coincide, so, their signatures coincide, too. ■
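
The invariance of the signature under a change of the center can be observed numerically. The sketch below assumes that the similarity function (11) has the centered form \( K_{\upphi}(\upomega_{i}, \upomega_{j}) = (1/2)\left[ \uprho^{2}(\upomega_{i}, \upphi) + \uprho^{2}(\upomega_{j}, \upphi) - \uprho^{2}(\upomega_{i}, \upomega_{j}) \right] \), which is the form appearing in the proof of Theorem 3; the distorted Euclidean distance matrix serves only as an illustrative non-Euclidean example.

```python
import numpy as np

def centered_similarity(rho2, k):
    """K_phi(w_i, w_j) = 0.5*(rho^2(w_i, phi) + rho^2(w_j, phi) - rho^2(w_i, w_j)), phi = w_k."""
    return 0.5 * (rho2[:, [k]] + rho2[[k], :] - rho2)

def signature(K, tol=1e-9):
    lam = np.linalg.eigvalsh(K)
    return int((lam > tol).sum()), int((lam < -tol).sum()), int((np.abs(lam) <= tol).sum())

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
D[0, 1] = D[1, 0] = 3.0 * D[0, 1]        # distort one distance so the metric is no longer Euclidean
rho2 = D ** 2

# The (positive, negative, zero) eigenvalue counts coincide for different centers (Theorem 2).
print(signature(centered_similarity(rho2, k=0)))
print(signature(centered_similarity(rho2, k=4)))
```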

Proof of Theorem 3.

The proof is based on the following lemma [54, p. 74].

Lemma 3.1.

The function \( K_{\upphi}(\upomega^{\prime}, \upomega^{\prime\prime}) = \tilde{K}(\upomega^{\prime}, \upomega^{\prime\prime}) - \tilde{K}(\upomega^{\prime}, \upphi) - \tilde{K}(\upomega^{\prime\prime}, \upphi) + \tilde{K}(\upphi, \upphi) \), \( \upomega^{\prime}, \upomega^{\prime\prime}, \upphi \in \Omega \), is a kernel if and only if \( \tilde{K}(\upvartheta^{\prime}, \upvartheta^{\prime\prime}) \), \( \upvartheta^{\prime}, \upvartheta^{\prime\prime} \in \Omega \), is a conditional kernel, i.e., the matrix \( \left[ \tilde{K}(\upvartheta_{k}, \upvartheta_{l}),\; k, l = 1, \ldots, M \right] \) is conditionally positive definite (30) for any finite set \( \{ \upvartheta_{1}, \ldots, \upvartheta_{M} \} \).

Proof of the Theorem.

In the notation of this paper, the conditional kernel is defined by the proto-Euclidean metric as \( \tilde{K}(\upvartheta^{\prime}, \upvartheta^{\prime\prime}) = -\uprho^{2}(\upvartheta^{\prime}, \upvartheta^{\prime\prime}) \). Substituting this equality into the assertion of Lemma 3.1 yields \( K_{\upphi}(\upomega^{\prime}, \upomega^{\prime\prime}) = -\uprho^{2}(\upomega^{\prime}, \upomega^{\prime\prime}) + \uprho^{2}(\upomega^{\prime}, \upphi) + \uprho^{2}(\upomega^{\prime\prime}, \upphi) - \uprho^{2}(\upphi, \upphi) \). Since the last summand is zero and positive definiteness is preserved under multiplication by the positive factor \( 1/2 \), the function \( K_{\upphi}(\upomega_{i}, \upomega_{j}) = (1/2)\left[ \uprho^{2}(\upomega_{i}, \upphi) + \uprho^{2}(\upomega_{j}, \upphi) - \uprho^{2}(\upomega_{i}, \upomega_{j}) \right] \) is a kernel. ■
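
A quick numerical illustration of this statement: when the metric is genuinely Euclidean (and hence proto-Euclidean), the centered similarity matrix coincides with a Gram matrix and has no negative eigenvalues. The point set and the choice of the center in the sketch below are arbitrary and not taken from the paper.

```python
import numpy as np

# For a Euclidean (hence proto-Euclidean) metric, K_phi equals the Gram matrix of the
# vectors x_i - x_phi, so it must have no negative eigenvalues.
rng = np.random.default_rng(1)
X = rng.normal(size=(8, 4))                                    # 8 illustrative points in R^4
rho2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)   # squared Euclidean distances

phi = 2                                                        # arbitrary choice of the center object
K = 0.5 * (rho2[:, [phi]] + rho2[[phi], :] - rho2)

print(np.linalg.eigvalsh(K).min())                   # non-negative up to rounding error
print(np.allclose(K, (X - X[phi]) @ (X - X[phi]).T)) # True: K is the centered Gram matrix
```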

Proof of Theorem 4.

In accordance with (88), in the case (89) we have (the 0/0 indeterminacy obtained in the first chain of equalities is then resolved by L'Hôpital's rule):

$$ \begin{aligned} & \mathop {\lim }\limits_{{\upalpha \to 0}}\uprho^{2} (\upomega^{\prime } ,\upomega^{\prime \prime } |\upalpha) = \mathop {\lim }\limits_{{\upalpha \to 0}} \frac{{\left[ {1 - \exp \left( { -\upalpha \uprho ^{2} (\upomega^{\prime } ,\upomega^{\prime \prime } )} \right)} \right] -\upalpha\left[ {1 - \exp \left( { -\upalpha \uprho ^{2} (\upomega^{\prime } ,\upomega^{\prime \prime } )} \right)} \right]}}{\upalpha} = \\ & \mathop {\lim }\limits_{{\upalpha \to 0}} \left\{ {\frac{{1 - \exp \left( { -\upalpha \uprho ^{2} (\upomega^{\prime } ,\upomega^{\prime \prime } )} \right)}}{\upalpha} - \left[ {1 - \exp \left( { -\upalpha \uprho ^{2} (\upomega^{\prime } ,\upomega^{\prime \prime } )} \right)} \right]} \right\} = \mathop {\lim }\limits_{{\upalpha \to 0}} \frac{{1 - \exp \left( { -\upalpha \uprho ^{2} (\upomega^{\prime } ,\upomega^{\prime \prime } )} \right)}}{\upalpha} - \\ & \mathop {\lim }\limits_{{\upalpha \to 0}} \left[ {1 - \exp \left( { -\upalpha \uprho ^{2} (\upomega^{\prime } ,\upomega^{\prime \prime } )} \right)} \right] = \mathop {\lim }\limits_{{\upalpha \to 0}} \frac{{1 - \exp \left( { -\upalpha \uprho ^{2} (\upomega^{\prime } ,\upomega^{\prime \prime } )} \right)}}{\upalpha} = \frac{0}{0}, \\ \end{aligned} $$
$$ \begin{aligned} & \mathop {\lim }\limits_{{\upalpha \to 0}}\uprho^{2} (\upomega^{\prime } ,\upomega^{\prime \prime } |\upalpha) = \mathop {\lim }\limits_{{\upalpha \to 0}} {\kern 1pt} \frac{{({\partial }/{\partial }\,\upalpha)\left[ {1 - \exp \left( { -\upalpha \uprho ^{2} (\upomega^{\prime } ,\upomega^{\prime \prime } )} \right)} \right]}}{{({\partial }/{\partial }\,\upalpha)\upalpha}} = \mathop {\lim }\limits_{{\upalpha \to 0}} \frac{{\partial }}{{{\partial }\,\upalpha}}\left[ {1 - \exp \left( { -\upalpha \uprho ^{2} (\upomega^{\prime } ,\upomega^{\prime \prime } )} \right)} \right]{\kern 1pt} = \\ & - \mathop {\lim }\limits_{{\upalpha \to 0}} \frac{{\partial }}{{{\partial }\,\upalpha}}\exp \left( { -\upalpha \uprho ^{2} (\upomega^{\prime } ,\upomega^{\prime \prime } )} \right) = - {\kern 1pt} \mathop {\lim }\limits_{{\upalpha \to 0}} \left[ {\left( { -\uprho^{2} (\upomega^{\prime } ,\upomega^{\prime \prime } )} \right)\exp \left( { -\upalpha \uprho ^{2} (\upomega^{\prime } ,\upomega^{\prime \prime } )} \right)} \right] =\uprho^{2} (\upomega^{\prime } ,\upomega^{\prime \prime } ). \\ \end{aligned} $$

Equality (90) immediately follows from (88). ■
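
The \( \upalpha \to 0 \) limit can be observed numerically under the assumption that (88), which is not reproduced in this excerpt, has the form \( \uprho^{2}(\upomega^{\prime}, \upomega^{\prime\prime}|\upalpha) = \frac{1+\upalpha}{\upalpha}\left( 1 - \exp(-\upalpha\,\uprho^{2}(\upomega^{\prime}, \upomega^{\prime\prime})) \right) \); this form is consistent with the expansion used below in the proof of Theorem 9.

```python
import numpy as np

def rho2_alpha(rho2, alpha):
    # Assumed parametric family (consistent with the expansion used in the proof of Theorem 9):
    # rho^2(w', w'' | alpha) = (1 + alpha)/alpha * (1 - exp(-alpha * rho^2(w', w''))).
    return (1.0 + alpha) / alpha * (1.0 - np.exp(-alpha * rho2))

rho2 = 2.37                                 # an arbitrary squared distance
for alpha in (1.0, 1e-1, 1e-2, 1e-4, 1e-6):
    print(alpha, rho2_alpha(rho2, alpha))   # tends to rho2 = 2.37 as alpha -> 0
```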

Proof of Theorem 5.

Let \( \upomega_{1} ,\upomega_{2} ,\upomega_{3} \in\Omega \) be three objects, \( \uprho_{12} =\uprho(\upomega_{1} ,\upomega_{2} ) \), \( \uprho_{23} =\uprho(\upomega_{2} ,\upomega_{3} ) \), \( \uprho_{13} =\uprho(\upomega_{1} ,\upomega_{3} ) \), and let \( \uprho_{12} \le\uprho_{23} \). Under notations (88) \( \uprho(\upomega^{\prime } ,\upomega^{\prime \prime } |\upalpha) =\uprho_{\upalpha} \left( {\uprho(\upomega^{\prime } ,\upomega^{\prime \prime } )} \right) \) and \( \uprho(\upomega^{\prime } ,\upomega^{\prime \prime } ) =\uprho \), the function \( \uprho_{\upalpha} (\uprho) \) is concave and increasing \( ({\partial }/{\partial }\,\uprho)\,\uprho_{\upalpha} (\uprho) > 0 \). Thus, the following inequalities hold:

$$ \begin{aligned} &\uprho_{\upalpha} (\uprho_{12} +\uprho_{23} ) \le\uprho_{\upalpha} (\uprho_{23} ) + \left[ {({\partial }/{\partial }\,\uprho)\uprho_{\upalpha} (\uprho)} \right]_{{\,\uprho_{23} }}\uprho_{12} ,\;\;\left[ {({\partial }/{\partial }\,\uprho)\uprho_{\upalpha} (\uprho)} \right]_{{\,\uprho_{23} }} \le \left[ {({\partial }/{\partial }\,\uprho)\uprho_{\upalpha} (\uprho)} \right]_{{\,\uprho_{12} }} , \\ & \left[ {({\partial }/{\partial }\,\uprho)\uprho_{\upalpha} (\uprho)} \right]_{{\,\uprho_{12} }}\uprho_{12} \le\uprho_{\upalpha} (\uprho_{12} ). \\ \end{aligned} $$

Here all the derivatives are positive, so \( \uprho_{\upalpha}(\uprho_{12} + \uprho_{23}) \le \uprho_{\upalpha}(\uprho_{12}) + \uprho_{\upalpha}(\uprho_{23}) \). Since the original metric satisfies the triangle inequality \( \uprho_{13} \le \uprho_{12} + \uprho_{23} \) and \( \uprho_{\upalpha} \) is increasing, we have \( \uprho_{\upalpha}(\uprho_{13}) \le \uprho_{\upalpha}(\uprho_{12}) + \uprho_{\upalpha}(\uprho_{23}) \). ■
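
Since (88) is not reproduced in this excerpt, the mechanism of the proof — any concave, increasing transform vanishing at zero preserves the triangle inequality — can be illustrated with generic stand-ins for \( \uprho_{\upalpha} \), such as the square root, \( 1 - e^{-t} \), or \( \arctan \).

```python
import numpy as np
from itertools import permutations

def max_triangle_violation(d):
    n = d.shape[0]
    return max(d[i, k] - d[i, j] - d[j, k] for i, j, k in permutations(range(n), 3))

# A random Euclidean metric on a few points.
rng = np.random.default_rng(2)
X = rng.normal(size=(7, 3))
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

# Illustrative concave, increasing transforms with f(0) = 0, used as stand-ins for rho_alpha.
for f in (np.sqrt, lambda t: 1.0 - np.exp(-t), np.arctan):
    print(max_triangle_violation(f(D)) <= 1e-12)   # True: the triangle inequality is preserved
```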

Proof of Theorem 6.

It is enough to prove that for any finite set of objects \( \left\{ \upomega_{1}, \ldots, \upomega_{n} \right\} \subset \Omega \) and any center \( \upphi \in \Omega \) the matrix

$$ {\mathbf{K}}_{\upphi} (\upalpha) = \frac{1}{2}\left[ {\uprho^{2} (\upomega_{i} ,\upphi|\upalpha) +\uprho^{2} (\upomega_{j} ,\upphi|\upalpha) -\uprho^{2} (\upomega_{i} ,\upomega_{j} |\upalpha),\;i,j = 1, \ldots ,n} \right] $$

is positive definite if matrix

$$ {\mathbf{K}}_{\upphi} = \frac{1}{2}\left[ {\uprho^{2} (\upomega_{i} ,\upphi) +\uprho^{2} (\upomega_{j} ,\upphi) -\uprho^{2} (\upomega_{i} ,\upomega_{j} ),\;i,j = 1, \ldots ,n} \right] $$
(103)

is positive definite.

In accordance with Lemma 3.1 (proof of Theorem 3), it is necessary and sufficient for the positive definiteness of the matrix \( {\mathbf{K}}_{\upphi}(\upalpha) \) that the matrix

$$ {\mathbf{B}}(\upalpha) = \left[ { -\uprho^{2} (\upomega_{i} ,\upomega_{j} |\upalpha),\;i,j = 1, \ldots ,n} \right] $$

be conditionally positive definite.

At the same time, due to (85), \( \uprho^{2} (\upomega_{i} ,\upomega_{j} |\upalpha){ \propto }1 - \exp \left( { -\upalpha \uprho ^{2} (\upomega_{i} ,\upomega_{j} )} \right) \), so,

$$ {\mathbf{B}}(\upalpha) \propto {\mathbf{C}}(\upalpha) - {\mathbf{D}}, \quad {\mathbf{D}} = \left( {\begin{array}{*{20}c} 1 & 1 & \cdots & 1 \\ 1 & 1 & \cdots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & \cdots & 1 \\ \end{array} } \right) = {\mathbf{11}}^{\text{T}} , $$
(104)
$$ {\mathbf{C}}(\upalpha) = \left[ {\exp \left( { -\upalpha \uprho ^{2} (\upomega_{i} ,\upomega_{j} )} \right),\;i,j = 1, \ldots ,n} \right]. $$
(105)

Here the matrix \( {\mathbf{C}}(\upalpha) \) is always positive definite for any proto-Euclidean metric \( \uprho(\upomega^{\prime}, \upomega^{\prime\prime}) \) by virtue of Mercer's theorem (Sect. 1.4), i.e., \( {\varvec{\upbeta}}^{\text{T}} {\mathbf{C}}(\upalpha){\varvec{\upbeta}} > 0 \) if \( {\varvec{\upbeta}} \ne {\mathbf{0}} \in \mathbb{R}^{n} \).

Let us consider the quadratic function \( q({\varvec{\upbeta}}|\upalpha) = {\varvec{\upbeta}}^{\text{T}} {\mathbf{B}}(\upalpha){\varvec{\upbeta}} = {\varvec{\upbeta}}^{\text{T}} \left( {{\mathbf{C}}(\upalpha) - {\mathbf{D}}} \right){\varvec{\upbeta}} = {\varvec{\upbeta}}^{\text{T}} \left( {{\mathbf{C}}(\upalpha) - {\mathbf{11}}^{\text{T}} } \right){\varvec{\upbeta}} = {\varvec{\upbeta}}^{\text{T}} {\mathbf{C}}(\upalpha){\varvec{\upbeta}} - ({\varvec{\upbeta}}^{\text{T}} {\mathbf{1}})({\mathbf{1}}^{\text{T}} {\varvec{\upbeta}}) \) on the hyperplane \( {\mathbf{1}}^{\text{T}} {\varvec{\upbeta}} = 0 \). It is clear that \( \left. {q({\varvec{\upbeta}}|\upalpha)} \right|_{{{\mathbf{1}}^{\text{T}} {\varvec{\upbeta}} = 0}} = {\varvec{\upbeta}}^{\text{T}} {\mathbf{C}}(\upalpha){\varvec{\upbeta}} > 0 \). Thus, the matrix \( {\mathbf{B}}(\upalpha) \) is conditionally positive definite. ■
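
Both facts used above are easy to check numerically: on a Euclidean metric, the matrix \( {\mathbf{C}}(\upalpha) \) of values \( \exp(-\upalpha\,\uprho^{2}(\upomega_{i}, \upomega_{j})) \) is a Gaussian kernel matrix and is positive definite, while \( {\mathbf{B}}(\upalpha) \propto {\mathbf{C}}(\upalpha) - {\mathbf{11}}^{\text{T}} \) is positive on the hyperplane \( {\mathbf{1}}^{\text{T}}{\varvec{\upbeta}} = 0 \). The data and the value of alpha in the sketch below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(6, 2))
rho2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)   # a Euclidean (proto-Euclidean) metric

alpha = 0.7
C = np.exp(-alpha * rho2)               # C(alpha): a Gaussian (RBF) kernel matrix, positive definite
B = C - np.ones_like(C)                 # B(alpha) is proportional to C(alpha) - 11^T, see (104)

print(np.linalg.eigvalsh(C).min() > 0)  # True

# Conditional positive definiteness of B(alpha): test on vectors from the hyperplane 1^T beta = 0.
for _ in range(5):
    beta = rng.normal(size=6)
    beta -= beta.mean()                 # project onto 1^T beta = 0
    print(beta @ B @ beta > 0)          # True
```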

Proof of Theorem 7.

The proof is facilitated by the following two lemmas.

Lemma 7.1.

The matrix \( {\mathbf{B}} \) has the main eigenvalues \( \uplambda_{1} = \ldots = \uplambda_{n-1} = 1 \) of multiplicity \( n-1 \), the last eigenvalue \( \uplambda_{n} = -(n-1) \) of multiplicity 1, main eigenvectors \( {\mathbf{z}}_{i},\; i = 1, \ldots, n-1 \), with zero sums of elements \( {\mathbf{1}}^{\text{T}} {\mathbf{z}}_{i} = 0,\; i = 1, \ldots, n-1 \), and the last eigenvector \( {\mathbf{z}}_{n} = {\mathbf{1}} \in \mathbb{R}^{n} \).

$$ {\mathbf{B}} = \left( {\begin{array}{*{20}c} 0 & { - {\kern 1pt} 1} & \cdots & { - {\kern 1pt} 1} \\ { - {\kern 1pt} 1} & 0 & \cdots & { - {\kern 1pt} 1} \\ \cdots & \cdots & \ddots & \cdots \\ { - {\kern 1pt} 1} & { - {\kern 1pt} 1} & \cdots & 0 \\ \end{array} } \right) $$
(106)

Indeed, the main eigenvalues and eigenvectors meet the equalities

$$ \left( {\begin{array}{*{20}c} 0 & { - 1} & \cdots & { - 1} \\ { - 1} & 0 & \cdots & { - 1} \\ \cdots & \cdots & \ddots & \cdots \\ { - 1} & { - 1} & \cdots & 0 \\ \end{array} } \right)\left( {\begin{array}{*{20}c} {z_{1} } \\ {z_{2} } \\ \vdots \\ {z_{n} } \\ \end{array} } \right) = \left( {\begin{array}{*{20}c} { - \sum\nolimits_{j = 1,j \ne 1}^{n} {z_{j} } } \\ { - \sum\nolimits_{j = 1,j \ne 2}^{n} {z_{j} } } \\ \vdots \\ { - \sum\nolimits_{j = 1,j \ne n}^{n} {z_{j} } } \\ \end{array} } \right) = \left( {\begin{array}{*{20}c} {z_{1} } \\ {z_{2} } \\ \vdots \\ {z_{n} } \\ \end{array} } \right),\;\;\left( {\begin{array}{*{20}c} {z_{1} = - \sum\nolimits_{j = 1,j \ne 1}^{n} {z_{j} } = - \sum\nolimits_{j = 1}^{n} {z_{j} } + z_{1} } \\ {z_{2} = - \sum\nolimits_{j = 1,j \ne 2}^{n} {z_{j} } = - \sum\nolimits_{j = 1}^{n} {z_{j} } + z_{2} } \\ \vdots \\ {z_{n} = - \sum\nolimits_{j = 1,j \ne n}^{n} {z_{j} } = - \sum\nolimits_{j = 1}^{n} {z_{j} } + z_{n} } \\ \end{array} } \right), $$
$$ \sum\limits_{l = 1}^{n} {z_{l} } = - n\sum\limits_{j = 1}^{n} {z_{j} } + \sum\limits_{l = 1}^{n} {z_{l} } ,\;\; n\sum\limits_{j = 1}^{n} {z_{j} } = \sum\limits_{l = 1}^{n} {z_{l} } - \sum\limits_{l = 1}^{n} {z_{l} } = 0,\;{\text{i}}.{\text{e}}.,\;{\mathbf{1}}^{\text{T}} {\mathbf{z}}_{i} = 0,\;i = 1, \ldots ,n - 1. $$

The respective equalities for the last eigenvalue and eigenvector have the form

$$ \left( {\begin{array}{*{20}c} 0 & { - {\kern 1pt} 1} & \cdots & { - {\kern 1pt} 1} \\ { - {\kern 1pt} 1} & 0 & \cdots & { - {\kern 1pt} 1} \\ \cdots & \cdots & \ddots & \cdots \\ { - {\kern 1pt} 1} & { - {\kern 1pt} 1} & \cdots & 0 \\ \end{array} } \right)\left( {\begin{array}{*{20}c} {z_{1} } \\ {z_{2} } \\ \vdots \\ {z_{n} } \\ \end{array} } \right) = - (n - 1)\left( {\begin{array}{*{20}c} {z_{1} } \\ {z_{2} } \\ \vdots \\ {z_{n} } \\ \end{array} } \right),\;z_{1} = z_{2} = \ldots = z_{n} ,\;{\text{i}}.{\text{e}}.\;{\mathbf{z}}_{n} = {\mathbf{1}}\, \in \,{\mathbf{\mathbb{R}}}^{n} ,{\text{QED}}. $$

Lemma 7.2.

Quadratic form \( q({\varvec{\upbeta}}) = {\varvec{\upbeta}}^{\text{T}} {\mathbf{B}}\,{\varvec{\upbeta}} \) is positive \( q({\varvec{\upbeta}}) > 0 \), when \( {\mathbf{1}}^{\text{T}} {\varvec{\upbeta}} = 0 \) and \( {\varvec{\upbeta}} \ne {\mathbf{0}} \in {\mathbf{\mathbb{R}}}^{n} \), hence, matrix \( {\mathbf{B}} \) (106) is conditionally positive definite.

Indeed, due to Lemma 7.1, the matrix \( {\mathbf{B}} \) has the spectral decomposition \( {\mathbf{B}} = \sum\nolimits_{i = 1}^{n} \uplambda_{i} {\mathbf{u}}_{i} {\mathbf{u}}_{i}^{\text{T}} \) over an orthonormal eigenvector basis \( {\mathbf{u}}_{1}, \ldots, {\mathbf{u}}_{n} \), where \( {\mathbf{u}}_{n} = (1/\sqrt{n})\,{\mathbf{1}} \) and the main eigenvectors satisfy \( {\mathbf{1}}^{\text{T}} {\mathbf{u}}_{i} = 0 \), hence \( {\mathbf{B}} = \sum\nolimits_{i = 1}^{n - 1} {\mathbf{u}}_{i} {\mathbf{u}}_{i}^{\text{T}} - \frac{n - 1}{n} {\mathbf{11}}^{\text{T}} \). Let us consider the quadratic form \( q({\varvec{\upbeta}}) = {\varvec{\upbeta}}^{\text{T}} {\mathbf{B}}\,{\varvec{\upbeta}} = \sum\nolimits_{i = 1}^{n - 1} ({\mathbf{u}}_{i}^{\text{T}} {\varvec{\upbeta}})^{2} - \frac{n - 1}{n} ({\mathbf{1}}^{\text{T}} {\varvec{\upbeta}})^{2} \). On the hyperplane \( {\mathbf{1}}^{\text{T}} {\varvec{\upbeta}} = 0 \), it takes the values \( \left. q({\varvec{\upbeta}}) \right|_{{\mathbf{1}}^{\text{T}} {\varvec{\upbeta}} = 0} = \sum\nolimits_{i = 1}^{n - 1} ({\mathbf{u}}_{i}^{\text{T}} {\varvec{\upbeta}})^{2} \). Since the eigenvectors \( {\mathbf{u}}_{1}, \ldots, {\mathbf{u}}_{n - 1} \) span this hyperplane, at least one of the products \( {\mathbf{u}}_{i}^{\text{T}} {\varvec{\upbeta}} \) is nonzero for any \( {\varvec{\upbeta}} \ne {\mathbf{0}} \) with \( {\mathbf{1}}^{\text{T}} {\varvec{\upbeta}} = 0 \), i.e., \( \left. q({\varvec{\upbeta}}) \right|_{{\mathbf{1}}^{\text{T}} {\varvec{\upbeta}} = 0} > 0 \), QED.
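
Since \( {\mathbf{B}} \) (106) is simply \( {\mathbf{I}} - {\mathbf{11}}^{\text{T}} \), Lemmas 7.1 and 7.2 can be verified directly for a small \( n \); the sketch below is an illustration, not part of the original proof.

```python
import numpy as np

n = 6
B = np.eye(n) - np.ones((n, n))         # zero diagonal, -1 off the diagonal: the matrix (106)

lam, Z = np.linalg.eigh(B)
print(np.round(lam, 10))                # [-(n-1), 1, 1, 1, 1, 1]: eigenvalues as in Lemma 7.1

# The eigenvector of -(n-1) is proportional to the all-ones vector; the others are orthogonal to it.
print(np.round(Z.T @ np.ones(n), 10))   # first entry is +/- sqrt(n), the rest are zero

# Lemma 7.2: B is positive on the hyperplane 1^T beta = 0.
rng = np.random.default_rng(4)
beta = rng.normal(size=n)
beta -= beta.mean()
print(beta @ B @ beta > 0)              # True
```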

Now we are ready to prove Theorem 7.

We have to prove that matrix \( {\mathbf{B}}(\upalpha) \) is conditionally positive definite if \( \upalpha \) is large enough. Let us consider the quadratic form

$$ {\mathbf{B}}(\upalpha) = \left( {\begin{array}{*{20}c} 0 & { -\uprho^{2} (\upomega_{1} ,\upomega_{2} |\upalpha)} & \cdots & { -\uprho^{2} (\upomega_{1} ,\upomega_{n} |\upalpha)} \\ { -\uprho^{2} (\upomega_{2} ,\upomega_{1} |\upalpha)} & 0 & \cdots & { -\uprho^{2} (\upomega_{2} ,\upomega_{n} |\upalpha)} \\ \cdots & \cdots & \ddots & \cdots \\ { -\uprho^{2} (\upomega_{n} ,\upomega_{1} |\upalpha)} & { -\uprho^{2} (\upomega_{n} ,\upomega_{2} |\upalpha)} & \cdots & 0 \\ \end{array} } \right) $$

\( q({\varvec{\upbeta}}|\upalpha) = {\varvec{\upbeta}}^{\text{T}} {\mathbf{B}}(\upalpha){\varvec{\upbeta}} \) on the intersection of the hyperplane \( {\mathbf{1}}^{\text{T}} {\varvec{\upbeta}} = 0 \) and the hypersphere \( {\varvec{\upbeta}}^{\text{T}} {\varvec{\upbeta}} = 1 \). Since \( q({\varvec{\upbeta}}|\upalpha) \) is a continuous function of \( \upalpha \) and \( \mathop {\lim }\limits_{\upalpha \to \infty } {\mathbf{B}}(\upalpha) = {\mathbf{B}} \), we have \( \mathop {\lim }\limits_{\upalpha \to \infty } q({\varvec{\upbeta}}|\upalpha) = \mathop {\lim }\limits_{\upalpha \to \infty } {\varvec{\upbeta}}^{\text{T}} {\mathbf{B}}(\upalpha){\varvec{\upbeta}} = {\varvec{\upbeta}}^{\text{T}} {\mathbf{B}}\,{\varvec{\upbeta}} = q({\varvec{\upbeta}}) \). In accordance with Lemma 7.2, the matrix \( {\mathbf{B}} \) (106) is conditionally positive definite, i.e., \( q({\varvec{\upbeta}}) > 0 \) on the above compact intersection; hence, there exists \( \upalpha_{0} \) such that \( \left. q({\varvec{\upbeta}}|\upalpha) \right|_{{\mathbf{1}}^{\text{T}} {\varvec{\upbeta}} = 0} > 0 \) for all \( {\varvec{\upbeta}} \ne {\mathbf{0}} \in \mathbb{R}^{n} \), i.e., \( {\mathbf{B}}(\upalpha) \) is conditionally positive definite if \( \upalpha > \upalpha_{0} \). ■

Proof of Theorem 8.

Let the training assembly consist of four objects \( N = 4 \):

$$ \begin{aligned} &\uprho(\upomega_{1} ,\upomega_{2} ) =\uprho(\upomega_{3} ,\upomega_{4} ) = \sqrt 2 ,\;\uprho(\upomega_{1} ,\upomega_{3} ) =\uprho(\upomega_{1} ,\upomega_{4} ) =\uprho(\upomega_{2} ,\upomega_{3} ) =\uprho(\upomega_{2} ,\upomega_{4} ) = 1, \\ & y_{1} = y_{2} = 1,\;y_{3} = y_{4} = - 1. \\ \end{aligned} $$
(107)

Let us try to find a decision rule (68) that correctly classifies all the objects:

$$ \left\{ \begin{aligned} \underbrace {{\left( { -\uprho^{2} (\upomega_{1} ,\upomega_{1} )} \right)}}_{0}a_{1} {\kern 1pt} + {\kern 1pt} \underbrace {{\left( { -\uprho^{2} (\upomega_{2} ,\upomega_{1} )} \right)}}_{ - 2}a_{2} {\kern 1pt} + {\kern 1pt} \underbrace {{\left( { -\uprho^{2} (\upomega_{3} ,\upomega_{1} )} \right)}}_{ - 1}a_{3} + \underbrace {{\left( { -\uprho^{2} (\upomega_{4} ,\upomega_{1} )} \right)}}_{ - 1}a_{4} + b > 0, \hfill \\ \underbrace {{\left( { -\uprho^{2} (\upomega_{1} ,\upomega_{2} )} \right)}}_{ - 2}a_{1} + \underbrace {{\left( { -\uprho^{2} (\upomega_{2} ,\upomega_{2} )} \right)}}_{0}a_{2} + \underbrace {{\left( { -\uprho^{2} (\upomega_{3} ,\upomega_{2} )} \right)}}_{ - 1}a_{3} + \underbrace {{\left( { -\uprho^{2} (\upomega_{4} ,\upomega_{2} )} \right)}}_{ - 1}a_{4} + b > 0, \hfill \\ \underbrace {{\left( { -\uprho^{2} (\upomega_{1} ,\upomega_{3} )} \right)}}_{ - 1}a_{1} + \underbrace {{\left( { -\uprho^{2} (\upomega_{2} ,\upomega_{3} )} \right)}}_{ - 1}a_{2} + \underbrace {{\left( { -\uprho^{2} (\upomega_{3} ,\upomega_{3} )} \right)}}_{0}a_{3} + \underbrace {{\left( { -\uprho^{2} (\upomega_{4} ,\upomega_{3} )} \right)}}_{ - 2}a_{4} + b < 0, \hfill \\ \underbrace {{\left( { -\uprho^{2} (\upomega_{1} ,\upomega_{4} )} \right)}}_{ - 1}a_{1} + \underbrace {{\left( { -\uprho^{2} (\upomega_{2} ,\upomega_{4} )} \right)}}_{ - 1}a_{2} {\kern 1pt} + {\kern 1pt} \underbrace {{\left( { -\uprho^{2} (\upomega_{3} ,\upomega_{4} )} \right)}}_{ - 2}a_{3} + \underbrace {{\left( { -\uprho^{2} (\upomega_{4} ,\upomega_{4} )} \right)}}_{0}a_{4} + b < 0. \hfill \\ \end{aligned} \right. $$
(108)

Then, the numbers \( (a_{1} ,a_{2} ,a_{3} ,a_{4} ,b) \) have to meet the inequalities

$$ \left\{ \begin{aligned} - 2a_{2} - a_{3} - a_{4} + b > 0, \hfill \\ - 2a_{1} - a_{3} - a_{4} + b > 0, \hfill \\ - a_{1} - a_{2} - 2a_{4} + b < 0, \hfill \\ - a_{1} - a_{2} - 2a_{3} + b < 0, \hfill \\ \end{aligned} \right. \;{\text{or,}}\;{\text{what}}\;{\text{is}}\,{\text{equivalent,}}\;\;\left\{ \begin{aligned} \left\{ \begin{aligned} - 2a_{2} - a_{3} - a_{4} + b > 0, \hfill \\ - 2a_{1} - a_{3} - a_{4} + b > 0, \hfill \\ \end{aligned} \right. \hfill \\ \left\{ \begin{aligned} a_{1} + a_{2} + 2a_{4} - b > 0, \hfill \\ a_{1} + a_{2} + 2a_{3} - b > 0. \hfill \\ \end{aligned} \right. \hfill \\ \end{aligned} \right. $$
(109)

Adding the first two inequalities and dividing by two results in the inequality \( -(a_{1} + a_{2}) - a_{3} - a_{4} + b > 0 \), i.e., \( a_{1} + a_{2} + a_{3} + a_{4} - b < 0 \), and the same operation applied to the last two inequalities gives \( a_{1} + a_{2} + a_{3} + a_{4} - b > 0 \). These two consequences are incompatible, hence the inequalities (109) are incompatible, too. Thus, a decision rule of kind (69) that would correctly classify (108) the assembly (107) does not exist. ■
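
The incompatibility of (109) can also be confirmed by a small linear program (here via scipy.optimize.linprog): the largest slack t that can be granted simultaneously to the four strict inequalities is not positive. The encoding below, with variables (a1, ..., a4, b) plus an auxiliary slack variable t and box bounds that merely keep the homogeneous problem bounded, is an illustrative check rather than part of the original proof.

```python
import numpy as np
from scipy.optimize import linprog

# Variables: (a1, a2, a3, a4, b, t).  We maximize the common slack t of the four strict
# inequalities (109); each row below encodes  t - (left-hand side of an inequality) <= 0.
A_ub = np.array([
    [ 0,  2,  1,  1, -1, 1],   # t - (-2*a2 - a3 - a4 + b) <= 0
    [ 2,  0,  1,  1, -1, 1],   # t - (-2*a1 - a3 - a4 + b) <= 0
    [-1, -1,  0, -2,  1, 1],   # t - ( a1 + a2 + 2*a4 - b) <= 0
    [-1, -1, -2,  0,  1, 1],   # t - ( a1 + a2 + 2*a3 - b) <= 0
], dtype=float)
b_ub = np.zeros(4)
c = np.array([0, 0, 0, 0, 0, -1.0])      # minimize -t, i.e. maximize t
bounds = [(-1, 1)] * 6                   # the system is homogeneous, so a box keeps the LP bounded

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
print(res.x[-1] <= 1e-9)                 # True: no positive slack exists, so (109) is infeasible
```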

Proof of Theorem 9.

Let us consider an arbitrary object of the training set \( \upomega_{j} \in\Omega \). In accordance with (69), the decision rule (101) can be represented as

$$ d(\upomega_{j} |a_{1} , \ldots ,a_{N} ,b,\upalpha) = \frac{{1 +\upalpha}}{\upalpha}\left[ {a_{j} + \sum\limits_{i = 1,\;i \ne j}^{n} {\exp \left( { -\upalpha \uprho ^{2} (\upomega_{j} ,\upomega_{i} )} \right)a_{i} } } \right] + b. $$

Indeed,

$$ \begin{aligned} & \sum\limits_{i = 1}^{n} {\left( { -\uprho^{2} (\upomega_{j} ,\upomega_{i} |\upalpha)} \right)a_{i} } + b = \frac{{1 +\upalpha}}{\upalpha}\sum\limits_{i = 1}^{n} {\left[ {\exp \left( { -\upalpha \uprho ^{2} (\upomega_{j} ,\upomega_{i} )} \right) - 1} \right]a_{i} } + b = \\ & \quad \quad \quad \frac{1 + \alpha }{\alpha }\sum\limits_{i = 1}^{n} {\exp \left( { -\upalpha \uprho ^{2} (\upomega_{j} ,\upomega_{i} )} \right)a_{i} } - \frac{{1 +\upalpha}}{\upalpha}\underbrace {{\sum\limits_{i = 1}^{n} {a_{i} } }}_{ = 0} + b = \\ & \quad \quad \quad \quad \quad \quad \quad \quad \frac{{1 +\upalpha}}{\upalpha}\underbrace {{\exp \left( { -\upalpha \uprho ^{2} (\upomega_{j} ,\upomega_{j} )} \right)}}_{ = 1}a_{j} + \frac{{1 +\upalpha}}{\upalpha}\sum\limits_{i = 1,\;i \ne j}^{n} {\exp \left( { -\upalpha \uprho ^{2} (\upomega_{j} ,\upomega_{i} )} \right)a_{i} } + b. \\ \end{aligned} $$

Since \( \mathop {\lim }\limits_{{\upalpha \to {\infty }}} \left[ {(1 +\upalpha)/\upalpha} \right] = 1 \), we have

$$ \mathop {\lim }\limits_{{\upalpha \to {\infty }}} d(\upomega_{j} |a_{1} , \ldots ,a_{N} ,b,\upalpha) = a_{j} + \sum\limits_{i = 1,\;i \ne j}^{n} {\underbrace {{\mathop {\lim }\limits_{{\upalpha \to \infty }} \exp \left( { -\upalpha \uprho ^{2} (\upomega_{j} ,\upomega_{i} )} \right)}}_{ = 0}a_{i} } + b = a_{j} + b. $$

Let \( (a_{j} ,\;j = 1, \ldots ,N,b) \) be the parameter vector such that \( a_{j} > 0 \) if \( y_{j} = 1 \), \( a_{j} < 0 \) if \( y_{j} = - 1 \), and \( b = 0 \). Then

$$ \mathop {\lim }\limits_{{\upalpha \to {\infty }}} d(\upomega_{j} |a_{1} , \ldots ,a_{N} ,b,\upalpha) = a_{j} \left\{ {\begin{array}{*{20}l} { > 0,\;y_{j} = 1,} \hfill \\ { < 0,\;y_{j} = - 1,} \hfill \\ \end{array} } \right. $$

i.e., for all sufficiently large \( \upalpha \), the parameter vector \( (a_{j} ,\;j = 1, \ldots ,N,\;b) \) correctly separates the entire training set. ■
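
For the four-object configuration (107) from the proof of Theorem 8, this behaviour is easy to observe numerically. The sketch below assumes the parametric form \( -\uprho^{2}(\upomega^{\prime}, \upomega^{\prime\prime}|\upalpha) = \frac{1+\upalpha}{\upalpha}\left( \exp(-\upalpha\,\uprho^{2}(\upomega^{\prime}, \upomega^{\prime\prime})) - 1 \right) \) used in the manipulations above; the decision rule itself, (68)/(69)/(101), is not reproduced in this excerpt.

```python
import numpy as np

# The four objects (107): rho(w1,w2) = rho(w3,w4) = sqrt(2), all cross-class distances equal 1.
rho2 = np.array([[0, 2, 1, 1],
                 [2, 0, 1, 1],
                 [1, 1, 0, 2],
                 [1, 1, 2, 0]], dtype=float)
y = np.array([1, 1, -1, -1])
a, b = y.astype(float), 0.0              # a_j > 0 for y_j = +1, a_j < 0 for y_j = -1, sum(a) = 0

for alpha in (0.1, 1.0, 10.0):
    # Assumed parametric form, as in the manipulations above:
    # -rho^2(w', w'' | alpha) = (1 + alpha)/alpha * (exp(-alpha * rho^2(w', w'')) - 1)
    minus_rho2_alpha = (1.0 + alpha) / alpha * (np.exp(-alpha * rho2) - 1.0)
    d = minus_rho2_alpha @ a + b         # decision values d(w_j | a_1,...,a_N, b, alpha)
    print(alpha, np.sign(d) == y)        # correct classification; Theorem 9 guarantees it for large alpha
```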

Copyright information

© 2018 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Mottl, V., Seredin, O., Krasotkina, O. (2018). Compactness Hypothesis, Potential Functions, and Rectifying Linear Space in Machine Learning. In: Rozonoer, L., Mirkin, B., Muchnik, I. (eds) Braverman Readings in Machine Learning. Key Ideas from Inception to Current State. Lecture Notes in Computer Science (LNAI), vol 11100. Springer, Cham. https://doi.org/10.1007/978-3-319-99492-5_3

  • DOI: https://doi.org/10.1007/978-3-319-99492-5_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-99491-8

  • Online ISBN: 978-3-319-99492-5
