Collateral Missing Value Estimation: Robust Missing Value Estimation for Consequent Microarray Data Processing

  • Muhammad Shoaib B. Sehgal
  • Iqbal Gondal
  • Laurence Dooley
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3809)


Microarrays can probe thousands of genes at a time, which makes them a useful tool for a variety of applications ranging from diagnosis to drug discovery. However, microarray data often contain multiple missing gene expression values, and because these values are usually ignored, the subsequent analysis suffers. This paper analyzes how accurate estimation of missing values can lead to better subsequent gene selection and class prediction. Collateral Missing Value Estimation (CMVE), which exploits both local/global and positive/negative correlation values, demonstrates superior imputation performance compared to Bayesian Principal Component Analysis (BPCA) Impute and the K-Nearest Neighbour (KNN) algorithm when estimating missing values in the BRCA1, BRCA2 and Sporadic genetic mutation samples present in ovarian cancer. CMVE also consistently outperforms the BPCA, KNN and ZeroImpute techniques in terms of classification accuracy. Imputation is followed by gene selection using a fusion of the Between-Group to Within-Group Sum of Squares and Weighted Partial Least Squares, with the Ridge Partial Least Squares algorithm used as the class predictor.
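To make the pipeline in the abstract more concrete, the following is a minimal sketch of two of the baseline steps it names: ZeroImpute and KNN imputation, followed by ranking genes with the between-group to within-group sum of squares ratio. It is not the CMVE algorithm, and it does not include the Weighted Partial Least Squares fusion or the Ridge Partial Least Squares classifier. The function names (`zero_impute`, `knn_impute`, `bss_wss_ratio`), the genes-by-samples layout, the unweighted neighbour average, and the mean squared distance are all assumptions made for illustration.

```python
import numpy as np


def zero_impute(X):
    """ZeroImpute baseline: replace every missing entry (NaN) with 0."""
    X = np.asarray(X, dtype=float)
    return np.where(np.isnan(X), 0.0, X)


def knn_impute(X, k=3):
    """Simplified KNN imputation on a genes x samples matrix.

    Each missing entry is filled with the unweighted mean of the values
    that the k most similar genes have in the same sample. Similarity is
    the mean squared difference over samples observed in both genes.
    """
    X = np.asarray(X, dtype=float)
    out = X.copy()
    missing = np.isnan(X)
    for g in np.where(missing.any(axis=1))[0]:
        for j in np.where(missing[g])[0]:
            candidates = []
            for c in range(X.shape[0]):
                # A neighbour must be another gene with a value in sample j.
                if c == g or missing[c, j]:
                    continue
                shared = ~missing[g] & ~missing[c]
                if not shared.any():
                    continue
                dist = np.mean((X[g, shared] - X[c, shared]) ** 2)
                candidates.append((dist, X[c, j]))
            if candidates:
                candidates.sort(key=lambda pair: pair[0])
                out[g, j] = np.mean([value for _, value in candidates[:k]])
            else:
                # Assumption: fall back to zero when no neighbour exists.
                out[g, j] = 0.0
    return out


def bss_wss_ratio(X, y):
    """Per-gene ratio of between-group to within-group sum of squares.

    X is a complete genes x samples matrix and y holds one class label
    per sample. Larger ratios indicate genes that separate classes better.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=1)
    bss = np.zeros(X.shape[0])
    wss = np.zeros(X.shape[0])
    for label in np.unique(y):
        cols = X[:, y == label]
        class_mean = cols.mean(axis=1)
        bss += cols.shape[1] * (class_mean - overall_mean) ** 2
        wss += ((cols - class_mean[:, None]) ** 2).sum(axis=1)
    # Guard against division by zero for genes with no within-class spread.
    return bss / np.maximum(wss, 1e-12)


if __name__ == "__main__":
    # Toy data: 4 genes x 4 samples, one missing value, two classes.
    data = np.array([
        [1.0, 2.0, 5.0, 6.0],
        [1.0, 2.0, 1.0, 2.0],
        [6.0, 5.0, 2.0, 1.0],
        [1.0, np.nan, 5.0, 6.0],
    ])
    labels = np.array([0, 0, 1, 1])
    completed = knn_impute(data, k=1)
    ranking = np.argsort(bss_wss_ratio(completed, labels))[::-1]
    print("imputed matrix:\n", completed)
    print("genes ranked by BSS/WSS:", ranking)
```

In this sketch the imputed matrix, rather than the raw one, feeds the gene ranking, which mirrors the order of steps described above: a poor imputation changes the class means and spreads, and therefore the genes that are selected.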


Partial Little Square, Gene Selection, Generalize Regression Neural Network, Class Prediction, Classification Fusion
These keywords were added by machine and not by the authors.





Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Muhammad Shoaib B. Sehgal (1)
  • Iqbal Gondal (1)
  • Laurence Dooley (1)
  1. Faculty of IT, Monash University, Churchill, Australia
