Abstract
A prerequisite of any machine learning or data mining application is a clearly defined target variable that the system will try to learn. In a supervised setting, we also need to know the value of this target variable for a set of training examples (here, patient records). In the case study presented in this chapter, the target values available for training are the ground-truth characterizations of coronary artery disease severity or, in a different scenario, of disease progression in the patients. We set either disease severity or disease progression as the target variable and then consider a two-class problem in which we aim to discriminate a group of patients characterized as “severely diseased” or “severely progressed” from a second group containing “mildly diseased” or “mildly progressed” patients, respectively. This mild/severe characterization is the actual value of the target variable for each patient.
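As a rough illustration of the two-class setup described above, the following Python sketch turns per-patient ground-truth characterizations into binary targets for either scenario. The record layout, the field names (`severity`, `progression`), the label strings, and the function `build_targets` are all hypothetical and chosen only for illustration; the chapter's actual data encoding is not stated here.

```python
from dataclasses import dataclass

# Hypothetical label strings; the real dataset's encoding is an assumption.
SEVERE, MILD = "severe", "mild"


@dataclass
class PatientRecord:
    patient_id: str
    severity: str      # ground-truth disease severity: "severe" or "mild"
    progression: str   # ground-truth disease progression: "severe" or "mild"


def build_targets(records, scenario="severity"):
    """Return one binary target per record: 1 = severe, 0 = mild.

    `scenario` selects which characterization acts as the target variable.
    """
    if scenario not in ("severity", "progression"):
        raise ValueError(f"unknown scenario: {scenario!r}")
    targets = []
    for record in records:
        value = getattr(record, scenario)
        if value not in (SEVERE, MILD):
            raise ValueError(f"unexpected label {value!r} for {record.patient_id}")
        targets.append(1 if value == SEVERE else 0)
    return targets


# Toy records, invented purely to exercise the function.
records = [
    PatientRecord("p1", severity=SEVERE, progression=MILD),
    PatientRecord("p2", severity=MILD, progression=MILD),
    PatientRecord("p3", severity=SEVERE, progression=SEVERE),
]

severity_targets = build_targets(records, scenario="severity")
progression_targets = build_targets(records, scenario="progression")
```

The same records yield a different target vector depending on the scenario, which mirrors the choice between disease severity and disease progression as the target variable.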
Copyright information
© 2013 Springer Science+Business Media New York
About this chapter
Cite this chapter
Kalogeratos, A., Chasanis, V., Rakocevic, G., Likas, A., Babovic, Z., Novakovic, M. (2013). Mining Clinical Data. In: Rakocevic, G., Djukic, T., Filipovic, N., Milutinović, V. (eds) Computational Medicine in Data Mining and Modeling. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8785-2_1
DOI: https://doi.org/10.1007/978-1-4614-8785-2_1
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-8784-5
Online ISBN: 978-1-4614-8785-2
eBook Packages: Computer Science, Computer Science (R0)