A Simple Genetic Algorithm for Biomarker Mining

  • Dusan Popovic
  • Alejandro Sifrim
  • Georgios A. Pavlopoulos
  • Yves Moreau
  • Bart De Moor
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7632)


We present a method for prognostic biomarker mining based on a genetic algorithm with a novel fitness function and a bagging-like model averaging scheme. We demonstrate it on publicly available gene expression data sets from colon cancer tissue specimens and assess the relevance of the discovered biomarkers through a qualitative analysis. Furthermore, we test the performance of the method on the cancer recurrence prediction task using two independent external validation sets. The results obtained are comparable to the top published performances of gene signatures developed specifically for colon cancer.
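The general idea of combining a genetic algorithm with k-nearest neighbours and a bagging-like aggregation can be illustrated with a minimal sketch. It is not the authors' implementation. The fitness function here (leave-one-out k-NN accuracy minus a per-feature penalty) is a placeholder assumption, since the abstract does not specify the novel fitness function. The aggregation step (counting how often each feature is selected across random subsamples) is likewise an assumed stand-in for the model averaging scheme. All function names and parameter values are hypothetical.

```python
import random

def knn_loo_accuracy(X, y, mask, k=3):
    """Leave-one-out k-NN accuracy using only features where mask[j] == 1."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty feature subset cannot classify anything
    correct = 0
    for i in range(len(X)):
        # Squared Euclidean distance from sample i to every other sample.
        dists = sorted(
            (sum((X[i][j] - X[m][j]) ** 2 for j in cols), y[m])
            for m in range(len(X)) if m != i
        )
        votes = [label for _, label in dists[:k]]
        pred = max(set(votes), key=votes.count)  # majority vote
        correct += pred == y[i]
    return correct / len(X)

def fitness(X, y, mask, penalty=0.01):
    # Placeholder (assumed) fitness: accuracy minus a small cost per feature.
    return knn_loo_accuracy(X, y, mask) - penalty * sum(mask)

def run_ga(X, y, rng, pop_size=20, generations=15, p_mut=0.05):
    """Evolve binary feature masks; requires at least two features."""
    n = len(X[0])
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=lambda m: fitness(X, y, m), reverse=True)
        parents = scored[: pop_size // 2]  # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Flip each bit independently with probability p_mut.
            child = [bit ^ (rng.random() < p_mut) for bit in child]
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda m: fitness(X, y, m))

def selection_frequencies(X, y, runs=10, frac=0.8, seed=0):
    """Run the GA on random subsamples and report how often each feature is kept."""
    rng = random.Random(seed)
    counts = [0] * len(X[0])
    size = max(2, int(frac * len(X)))
    for _ in range(runs):
        # Subsampling without replacement avoids duplicate samples leaking
        # into the leave-one-out estimate.
        idx = rng.sample(range(len(X)), size)
        Xs, ys = [X[i] for i in idx], [y[i] for i in idx]
        best = run_ga(Xs, ys, rng)
        counts = [c + b for c, b in zip(counts, best)]
    return [c / runs for c in counts]
```

Calling `selection_frequencies(X, y)` on a list of expression vectors `X` and class labels `y` returns one value in [0, 1] per feature. In this sketch, features selected in most runs would be the candidate biomarkers. A pure-Python implementation like this one is only practical for small toy data.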


Keywords: genetic algorithm, feature selection, biomarker discovery, gene expressions, colon cancer, gene signature, k-nearest neighbours, bagging



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Dusan Popovic (1)
  • Alejandro Sifrim (1)
  • Georgios A. Pavlopoulos (1)
  • Yves Moreau (1)
  • Bart De Moor (1)
  1. ESAT-SCD / IBBT-KU Leuven Future Health Department, Katholieke Universiteit Leuven, Leuven, Belgium