Abstract
Data mining techniques have been extensively applied in bioinformatics to analyze biomedical data. In this paper, we chose Rapid-I’s RapidMiner as our tool to discover a decision-tree-based diabetes prediction model from the Pima Indians Diabetes Data Set, which contains information on patients who did and did not develop diabetes. Following the data mining process, our discussion focuses on data preprocessing (attribute identification and selection, outlier removal, data normalization, and numerical discretization), visual data analysis, discovery of hidden relationships, and construction of the diabetes prediction model.
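The abstract names the preprocessing steps and the decision-tree model but not the exact methods; the authors carried them out in RapidMiner. As a hedged illustration only, the following Python sketch shows one common choice for each step on a single numeric attribute. It uses a z-score rule for outlier removal (the threshold `k` is an assumption), min-max normalization, equal-width discretization, and the entropy-based split criteria used to grow decision trees: information gain, and the gain ratio used by C4.5. All function names are hypothetical. The glucose-like values and class labels in the usage example are made up, are not drawn from the Pima data set, and do not reproduce the paper's model.

```python
import math
from collections import Counter


def remove_outliers(values, labels, k=2.0):
    """Keep (value, label) pairs whose value lies within k population standard
    deviations of the mean. The z-score rule and k=2.0 are illustrative
    assumptions, not the paper's settings."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        return list(values), list(labels)
    kept = [(v, y) for v, y in zip(values, labels) if abs(v - mean) <= k * std]
    return [v for v, _ in kept], [y for _, y in kept]


def min_max_normalize(values):
    """Rescale values linearly onto [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def equal_width_bins(values, n_bins):
    """Discretize numeric values into n_bins equal-width intervals (ids 0..n_bins-1)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value would fall into bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """Entropy reduction obtained by splitting `labels` on the discrete `feature`."""
    n = len(labels)
    remainder = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def gain_ratio(feature, labels):
    """C4.5-style gain ratio: information gain divided by the split information."""
    split_info = entropy(feature)
    if split_info == 0:
        return 0.0
    return information_gain(feature, labels) / split_info


if __name__ == "__main__":
    # Made-up glucose-like readings and labels (1 = developed diabetes); not Pima data.
    glucose = [85, 90, 100, 110, 150, 160, 170, 600]
    diabetic = [0, 0, 0, 0, 1, 1, 1, 1]

    values, labels = remove_outliers(glucose, diabetic)  # drops the 600 reading
    bins = equal_width_bins(min_max_normalize(values), n_bins=2)
    print(bins)  # [0, 0, 0, 0, 1, 1, 1]
    print(f"{information_gain(bins, labels):.3f}")  # 0.985
    print(f"{gain_ratio(bins, labels):.3f}")  # 1.000
```

Because the two toy bins separate the classes perfectly, the information gain equals the full class entropy (about 0.985 bits) and the gain ratio is 1. On real attributes a split is rarely this clean, and the outlier threshold and the number of bins are tuning choices.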
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Han, J., Rodriguez, J.C., Beheshti, M. (2009). Discovering Decision Tree Based Diabetes Prediction Model. In: Kim, Th., Fang, WC., Lee, C., Arnett, K.P. (eds) Advances in Software Engineering. ASEA 2008. Communications in Computer and Information Science, vol 30. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10242-4_9
DOI: https://doi.org/10.1007/978-3-642-10242-4_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-10241-7
Online ISBN: 978-3-642-10242-4
eBook Packages: Computer Science, Computer Science (R0)