Genetic Algorithm Based Methods for Identification of Health Risk Factors Aimed at Preventing Metabolic Syndrome

  • Topon Kumar Paul
  • Ken Ueno
  • Koichiro Iwata
  • Toshio Hayashi
  • Nobuyoshi Honda
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5361)


In recent years, metabolic syndrome has emerged as a major health concern because it increases the risk of developing lifestyle diseases such as diabetes, hypertension, and cardiovascular disease. Symptoms of metabolic syndrome include high blood pressure, decreased HDL cholesterol, and elevated triglycerides (TG). To prevent the development of metabolic syndrome, it is important both to predict the future values of these health risk factors accurately and to identify other factors in health checkup and lifestyle data that are closely related to them. In this paper, we propose a new framework, based on a genetic algorithm and its variants, for identifying these important health factors and predicting a person's future health risk with high accuracy. We show the effectiveness of the proposed system by applying it to health checkup and lifestyle data from Toshiba Corporation.


Keywords: feature selection; classification; unbalanced data; metabolic syndrome; fitness evaluation; RPMBGA+; AUC balanced
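
The abstract does not give the fitness function or the operators of the proposed framework, so the following is only a minimal sketch of the general idea: a plain genetic algorithm that searches over feature subsets and scores each subset with a measure that tolerates unbalanced classes. It is not RPMBGA+ and not the authors' method. The nearest-centroid classifier, the balanced-accuracy fitness (mean of sensitivity and specificity), and all function names and parameters (`balanced_accuracy`, `centroid_predict`, `select_features`, `pop_size`, `p_mut`) are assumptions made for illustration.

```python
import random

def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity, so a rare class counts equally."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(y_true)
    neg = len(y_true) - pos
    sens = tp / pos if pos else 0.0
    spec = tn / neg if neg else 0.0
    return (sens + spec) / 2

def centroid_predict(X, y, mask):
    """Nearest-centroid classifier on the features where mask[j] == 1.

    Illustrative stand-in for a real classifier; assumes both classes
    (0 and 1) occur in y.
    """
    cols = [j for j, m in enumerate(mask) if m]
    if not cols:
        return [0] * len(X)  # empty subset: predict the negative class

    def centroid(label):
        rows = [x for x, t in zip(X, y) if t == label]
        return [sum(r[j] for r in rows) / len(rows) for j in cols]

    c0, c1 = centroid(0), centroid(1)
    preds = []
    for x in X:
        d0 = sum((x[j] - a) ** 2 for j, a in zip(cols, c0))
        d1 = sum((x[j] - b) ** 2 for j, b in zip(cols, c1))
        preds.append(1 if d1 < d0 else 0)
    return preds

def fitness(mask, X, y):
    """Score a feature subset (hypothetical choice, evaluated on training data)."""
    return balanced_accuracy(y, centroid_predict(X, y, mask))

def select_features(X, y, pop_size=20, generations=30, p_mut=0.1, seed=0):
    """Plain GA over bit masks; pop_size should be at least 4."""
    rng = random.Random(seed)
    n = len(X[0])
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda m: fitness(m, X, y), reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection; best masks survive
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda m: fitness(m, X, y))
```

A call such as `select_features(X, y)` takes a list of numeric feature rows `X` and binary labels `y` and returns a 0/1 mask over the columns. In practice the fitness would be estimated on held-out data rather than on the training set, and a measure such as AUC could replace balanced accuracy.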





Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Topon Kumar Paul (1)
  • Ken Ueno (1)
  • Koichiro Iwata (2)
  • Toshio Hayashi (2)
  • Nobuyoshi Honda (2)

  1. Corporate Research & Development Center, Toshiba Corporation, Kawasaki, Japan
  2. Toshiba Corporation, Tokyo, Japan
