Abstract

A prerequisite of any machine learning or data mining application is a clearly defined target variable that the system will try to learn. In a supervised setting, the value of this target variable must also be known for a set of training examples (here, patient records). In the case study presented in this chapter, the target values available for training are the ground-truth characterizations of either the severity of coronary artery disease or, in a different scenario, the progression of the disease in the patients. With either disease severity or disease progression as the target variable, we consider a two-class problem in which we aim to discriminate a group of patients characterized as “severely diseased” or “severely progressed” from a second group of “mildly diseased” or “mildly progressed” patients, respectively. This mild/severe characterization is the actual value of the target variable for each patient.
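As an illustration only, the two-class target described above could be derived from a per-patient ground-truth score as in the following minimal sketch. The patient identifiers, the score values, the 0.5 threshold, and the `binarize_target` helper are all hypothetical and are not taken from the chapter, which may define the mild/severe split differently.

```python
from typing import Dict


def binarize_target(scores: Dict[str, float], threshold: float) -> Dict[str, str]:
    """Map each patient's ground-truth score to a two-class label.

    Scores at or above the (assumed) threshold are labeled "severe",
    all others "mild".
    """
    return {
        patient_id: ("severe" if score >= threshold else "mild")
        for patient_id, score in scores.items()
    }


# Hypothetical ground-truth severity scores for four patient records.
severity_scores = {"p01": 0.8, "p02": 0.2, "p03": 0.55, "p04": 0.1}

# Hypothetical cut-off separating "mild" from "severe".
labels = binarize_target(severity_scores, threshold=0.5)

for patient_id, label in labels.items():
    print(patient_id, label)
```

The same helper would apply unchanged to the progression scenario by passing progression scores instead of severity scores; only the meaning of the two labels changes.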



Author information

Corresponding author

Correspondence to Argyris Kalogeratos.

Copyright information

© 2013 Springer Science+Business Media New York

About this chapter

Cite this chapter

Kalogeratos, A., Chasanis, V., Rakocevic, G., Likas, A., Babovic, Z., Novakovic, M. (2013). Mining Clinical Data. In: Rakocevic, G., Djukic, T., Filipovic, N., Milutinović, V. (eds) Computational Medicine in Data Mining and Modeling. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8785-2_1

  • DOI: https://doi.org/10.1007/978-1-4614-8785-2_1

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4614-8784-5

  • Online ISBN: 978-1-4614-8785-2

  • eBook Packages: Computer Science, Computer Science (R0)
