Mathematical Models of Supervised Learning and Application to Medical Diagnosis

  • Roberta De Asmundis
  • Mario Rosario Guarracino
Part of the Fields Institute Communications book series (FIC, volume 63)


Supervised learning models are applicable in many fields of science and technology, such as economics, engineering and medicine. Among supervised learning algorithms, Support Vector Machines (SVMs) stand out for their accurate solutions and low training time. They are based on statistical learning theory and obtain the solution by minimizing a quadratic cost function. Combined with kernel methods, SVMs provide non-linear classification models, that is, separations that cannot be expressed as inequalities on linear combinations of the parameters. Some issues may reduce the effectiveness of these methods. For example, in multi-center clinical trials, experts from different institutions collect data on many patients. In this case, the techniques currently in use build the model from all the available data. Although such models fit the cases under consideration well, they do not give accurate answers in general. It is therefore necessary to identify a subset of the training set that contains all the available information and yields a model that still generalizes to new test data. Training sets may also vary over time, for example because data are added or modified as a result of new tests or new knowledge. In this case, current techniques cannot capture the changes and must restart the learning process from the beginning. Techniques that extract only the new knowledge contained in the data and build the learning model incrementally have the advantage of considering only the experiments that are actually useful, which speeds up the analysis. In this paper, we describe some solutions to these problems, supported by numerical experiments on the discrimination among different types of leukemia.
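
For reference, the quadratic cost function mentioned in the abstract is commonly written in the soft-margin form below. This is the standard textbook formulation from the SVM literature, given here as a sketch; it is not necessarily the exact variant used in the chapter. The notation is assumed for illustration: training points x_i with labels y_i in {-1, +1}, weight vector w, bias b, slack variables ξ_i, a trade-off parameter C > 0, dual coefficients α_i and a kernel function K.

```latex
% Soft-margin SVM: quadratic objective with linear constraints
\begin{aligned}
\min_{w,\,b,\,\xi} \quad & \frac{1}{2}\,\|w\|^{2} + C \sum_{i=1}^{m} \xi_i \\
\text{subject to} \quad & y_i \left( w^{\top} x_i + b \right) \ge 1 - \xi_i,
  \qquad \xi_i \ge 0, \qquad i = 1, \dots, m .
\end{aligned}

% Kernel decision function: non-linear in x through K
f(x) = \operatorname{sign}\!\left( \sum_{i=1}^{m} \alpha_i \, y_i \, K(x_i, x) + b \right)
```

With a linear kernel, K(x_i, x) = x_i^T x, the decision function reduces to a separating hyperplane. A non-linear kernel, such as a Gaussian one, gives the kind of separation that cannot be written as an inequality on a linear combination of the inputs.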


Keywords: Support Vector Machine; Acute Myeloid Leukemia; Generalized Eigenvalue Problem; Optimal Hyperplane; Acute Myeloid Leukemia Sample



Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • Roberta De Asmundis (1, corresponding author)
  • Mario Rosario Guarracino (2)
  1. Department of Statistical Sciences (DSS), University of Rome ‘La Sapienza’, Rome, Italy
  2. High Performance Computing and Networking Institute, Italian National Research Council, Naples, Italy
