Introduction to the Development and Validation of Predictive Biomarker Models from High-Throughput Data Sets

  • Xutao Deng
  • Fabien Campagne
Part of the Methods in Molecular Biology book series (MIMB, volume 620)


High-throughput technologies can routinely assay biological or clinical samples and produce wide data sets where each sample is associated with tens of thousands of measurements. Such data sets can be mined to discover biomarkers and develop statistical models capable of predicting an endpoint of interest from data measured in the samples. The field of biomarker model development combines methods from statistics and machine learning to develop and evaluate predictive biomarker models. In this chapter, we discuss the computational steps involved in the development of biomarker models designed to predict information about individual samples and review approaches often used to implement each step. A practical example of biomarker model development in a large gene expression data set is presented. This example leverages BDVal, a suite of biomarker model development programs developed as an open-source project (see

Key words

Biomarker model development; gene expression; high-throughput measurement; microarray; machine learning; cross-validation; performance estimates; feature selection; BDVal



Copyright information

© Humana Press, a part of Springer Science+Business Media, LLC 2010

Authors and Affiliations

  • Xutao Deng (1)
  • Fabien Campagne (2)
  1. HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medical College, New York, USA
  2. Department of Physiology and Biophysics, HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medical College, New York, USA
