Compactness Hypothesis, Potential Functions, and Rectifying Linear Space in Machine Learning

  • Vadim Mottl
  • Oleg Seredin
  • Olga Krasotkina
Chapter
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11100)

Abstract

Emmanuel Braverman was one of the very few thinkers who, during his extremely short life, managed to seed several seemingly quite different areas of science. This paper overviews one of the knowledge areas he essentially shaped in the 1960s, namely, Machine Learning. Later, Vladimir Vapnik proposed a more engineering-oriented name for this knowledge area – Estimation of Dependencies Based on Empirical Data. We shall treat these titles as synonyms. The aim of the paper is to briefly trace how three notions introduced by Braverman came to form the core of the contemporary Machine Learning doctrine. These notions are: (1) the compactness hypothesis, (2) the potential function, and (3) the rectifying linear space in which the former two have resulted. There is little new in this paper. Almost all the constructions we are going to speak about have been published by numerous scientists. The novelty is, perhaps, only in that all these issues are systematically considered together as immediate consequences of Braverman’s basic principles.

Keywords

Set of real-world objects · Pattern recognition · Numerical regression · Ordinal regression · Compactness hypothesis · Precedent-based learning · Distance representation modalities · Pseudo-Euclidean linear space · Regularized empirical risk minimization · Potential function · Distance transformation · Selective fusion of distances

Notes

Acknowledgements

We would like to acknowledge support from the Russian Foundation for Basic Research (grants 14-07-00527, 16-57-52042, 17-07-00436, 17-07-00993, 18-07-01087, and 18-07-00942) and from Tula State University within the framework of scientific project № 2017-62PUBL.


Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Tula State University, Tula, Russia
  2. Moscow State University, Moscow, Russia