Mathematical Models of Supervised Learning and Application to Medical Diagnosis

De Asmundis, Roberta; Guarracino, Mario Rosario

doi:10.1007/978-1-4614-4133-5_3

Part of the book series: Fields Institute Communications ((FIC,volume 63))

Abstract

Supervised learning models are applicable in many fields of science and technology, such as economics, engineering, and medicine. Among supervised learning algorithms are the so-called Support Vector Machines (SVM), which yield accurate solutions with low training time. They are based on statistical learning theory and obtain the solution by minimizing a quadratic-type cost function. In conjunction with kernel methods, SVM provide non-linear classification models, namely separations that cannot be expressed using inequalities on linear combinations of parameters. Some issues may reduce the effectiveness of these methods. For example, in multi-center clinical trials, experts from different institutions collect data on many patients. In this case, the techniques currently in use determine the model from all the available data. Although the resulting models are well suited to the cases under consideration, they do not provide accurate answers in general. It is therefore necessary to identify a subset of the training set that contains all the available information and yields a model that still generalizes to new testing data. Training sets may also vary over time, for example because data are added or modified as a result of new tests or new knowledge. In this case, current techniques cannot capture the changes and must restart the learning process from the beginning. Techniques that extract only the new knowledge contained in the data and build the learning model incrementally have the advantage of taking into account only the truly useful experiments, which speeds up the analysis. In this paper, we describe some solutions to these problems, supported by numerical experiments on the discrimination among different types of leukemia.

Mathematics Subject Classification (2010): Primary 68T10, Secondary 62H30
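
The abstract's point that kernels give separations not expressible as linear inequalities can be illustrated with a small sketch. It is not the chapter's method: it uses a kernel perceptron as a simpler stand-in for an SVM, a toy XOR data set instead of leukemia data, and hypothetical names and parameter values (`rbf_kernel`, `gamma=2.0`, `train_kernel_perceptron`) chosen only for illustration.

```python
import math

def rbf_kernel(a, b, gamma=2.0):
    # Gaussian (RBF) kernel; gamma is an illustrative choice, not from the chapter.
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq_dist)

def linear_kernel(a, b):
    # Dot product plus 1, so the resulting model is affine (it includes a bias).
    return sum(ai * bi for ai, bi in zip(a, b)) + 1.0

def decision_value(x, points, labels, alphas, kernel):
    # f(x) = sum_j alpha_j * y_j * K(x_j, x)
    return sum(a * y * kernel(p, x) for a, y, p in zip(alphas, labels, points))

def train_kernel_perceptron(points, labels, kernel, max_epochs=100):
    # Kernel perceptron: raise alpha_i each time training point i is misclassified.
    alphas = [0] * len(points)
    for _ in range(max_epochs):
        mistakes = 0
        for i, (x, y) in enumerate(zip(points, labels)):
            if y * decision_value(x, points, labels, alphas, kernel) <= 0:
                alphas[i] += 1
                mistakes += 1
        if mistakes == 0:  # every training point is classified correctly
            break
    return alphas

def predict(x, points, labels, alphas, kernel):
    return 1 if decision_value(x, points, labels, alphas, kernel) > 0 else -1

# XOR labels: no single line separates the two classes.
points = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [-1, 1, 1, -1]

rbf_alphas = train_kernel_perceptron(points, labels, rbf_kernel)
rbf_predictions = [predict(x, points, labels, rbf_alphas, rbf_kernel) for x in points]

linear_alphas = train_kernel_perceptron(points, labels, linear_kernel)
linear_predictions = [predict(x, points, labels, linear_alphas, linear_kernel) for x in points]

print("RBF kernel fits XOR:", rbf_predictions == labels)
print("Linear kernel fits XOR:", linear_predictions == labels)
```

With the RBF kernel the perceptron separates the four points, while the affine model cannot, whatever the number of epochs. An SVM would additionally maximize the margin by solving a quadratic program, which this sketch does not attempt.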

Author information

Authors and Affiliations

Department of Statistical Sciences (DSS), University of Rome ‘La Sapienza’, Rome, Italy
Roberta De Asmundis
High Performance Computing and Networking Institute, Italian National Research Council, Naples, Italy
Mario Rosario Guarracino

Corresponding author

Correspondence to Roberta De Asmundis.

Editor information

Editors and Affiliations

Department of Industrial & Systems Engin, University of Florida, Weil Hall 401, Gainesville, 32611, Florida, USA
Panos M. Pardalos
Department of Mathematics, University of Waterloo, University Avenue West 200, Waterloo, N2L 3G1, Ontario, Canada
Thomas F. Coleman
Department of Industrial Engineering, University of Central Florida, Central Florida Blvd 4000, Orlando, 32816, Florida, USA
Petros Xanthopoulos

About this chapter

Cite this chapter

De Asmundis, R., Guarracino, M.R. (2013). Mathematical Models of Supervised Learning and Application to Medical Diagnosis. In: Pardalos, P., Coleman, T., Xanthopoulos, P. (eds) Optimization and Data Analysis in Biomedical Informatics. Fields Institute Communications, vol 63. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-4133-5_3

DOI: https://doi.org/10.1007/978-1-4614-4133-5_3
Published: 20 July 2012
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-4132-8
Online ISBN: 978-1-4614-4133-5
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)