Mining Clinical Data

Kalogeratos, Argyris; Chasanis, V.; Rakocevic, G.; Likas, A.; Babovic, Z.; Novakovic, M.

doi:10.1007/978-1-4614-8785-2_1

Abstract

A prerequisite of any machine learning or data mining application is a clearly defined target variable that the system will try to learn. In a supervised setting, we also need to know the value of this target variable for a set of training examples (i.e., patient records). In the case study presented in this chapter, the target values available for training are the ground truth characterizations of coronary artery disease severity or, in a different scenario, of disease progression in the patients. We set either disease severity or disease progression as the target variable and consider a two-class problem in which we aim to discriminate a group of patients characterized as “severely diseased” or “severely progressed” from a second group of “mildly diseased” or “mildly progressed” patients, respectively. This mild/severe characterization is the actual value of the target variable for each patient.
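As a rough illustration of the two-class setup described above, the sketch below turns a ground-truth characterization into a binary target. The record fields, the numeric `severity_score`, and the threshold value are all hypothetical and chosen only for illustration; the abstract does not say how the mild/severe characterization is actually derived.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PatientRecord:
    """Hypothetical training example; field names are assumptions."""
    patient_id: str
    severity_score: float  # assumed numeric ground-truth characterization


def binarize_target(records: List[PatientRecord], threshold: float) -> List[int]:
    """Label each record 1 ("severe") or 0 ("mild").

    The threshold is an illustrative assumption, not a value from the chapter.
    """
    return [1 if r.severity_score >= threshold else 0 for r in records]


# Toy records, invented for this example only.
records = [
    PatientRecord("p1", 0.2),
    PatientRecord("p2", 0.9),
    PatientRecord("p3", 0.5),
]
labels = binarize_target(records, threshold=0.5)
```

The same function could serve the progression scenario by passing in a progression measure instead of a severity measure; in either case the resulting labels are what a supervised classifier would be trained to predict.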

Author information

Authors and Affiliations

Department of Computer Science, University of Ioannina, GR-45110, Ioannina, Greece
Argyris Kalogeratos, V. Chasanis & A. Likas
Mathematical Institute, Serbian Academy of Sciences and Arts, Belgrade, 11000, Serbia
G. Rakocevic
Innovation Center of the School of Electrical Engineering, University of Belgrade, Belgrade, 11000, Serbia
Z. Babovic & M. Novakovic

Corresponding author

Correspondence to Argyris Kalogeratos.

Editor information

Editors and Affiliations

Mathematical Institute, Serbian Academy of Sciences and Arts, Belgrade, Serbia
Goran Rakocevic
Faculty of Engineering, University of Kragujevac, Kragujevac, Serbia
Tijana Djukic
Faculty of Engineering, University of Kragujevac, Kragujevac, Serbia
Nenad Filipovic
School of Electrical Engineering, University of Belgrade, Belgrade, Serbia
Veljko Milutinović

About this chapter

Cite this chapter

Kalogeratos, A., Chasanis, V., Rakocevic, G., Likas, A., Babovic, Z., Novakovic, M. (2013). Mining Clinical Data. In: Rakocevic, G., Djukic, T., Filipovic, N., Milutinović, V. (eds) Computational Medicine in Data Mining and Modeling. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8785-2_1

DOI: https://doi.org/10.1007/978-1-4614-8785-2_1
Published: 19 September 2013
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-8784-5
Online ISBN: 978-1-4614-8785-2
eBook Packages: Computer Science, Computer Science (R0)
