Investigating the Class-Specific Relevance of Predictor Sets Obtained from DDP-Based Feature Selection Technique

  • Chia Huey Ooi
  • Madhu Chetty
  • Shyh Wei Teng
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4146)


Feature selection is crucial to tumor classification because of the high dimensionality of microarray datasets. Using the degree of differential prioritization (DDP) between relevance and antiredundancy, our proposed DDP-based feature selection technique can achieve better accuracies than those reported in previous studies while using fewer genes in the predictor set. Additionally, we discovered a strong correlation between the DDP parameter in our feature selection technique and the number of classes in the dataset. This leads us to ask whether the measure of relevance in our feature selection technique becomes less effective at capturing the class-specific relevance of each individual class as the number of classes increases. In this study, we analyze the class-specific relevance of the predictor sets found using our feature selection technique. The analysis ultimately lays the theoretical foundation for an improvement to our feature selection technique.
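The abstract does not spell out the DDP scoring function, so the following is only a minimal sketch under stated assumptions. It assumes a multiplicative predictor-set score V^α · U^(1−α), where V is the mean BSS/WSS (F-type) relevance of the selected genes, U is antiredundancy taken as 1 minus the mean absolute pairwise Pearson correlation, and α plays the role of the DDP. The greedy forward search, all function names, and the one-vs-rest `class_specific_relevance` helper are hypothetical illustrations, not the authors' code.

```python
import math
from statistics import mean


def f_statistic(values, labels):
    """BSS/WSS ratio of one gene's expression values across the sample classes."""
    overall = mean(values)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    means = {y: mean(g) for y, g in groups.items()}
    bss = sum(len(g) * (means[y] - overall) ** 2 for y, g in groups.items())
    wss = sum((v - means[y]) ** 2 for y, g in groups.items() for v in g)
    return bss / max(wss, 1e-12)  # epsilon avoids division by zero


def pearson(a, b):
    """Pearson correlation of two expression profiles (0.0 if either is constant)."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def ddp_score(subset, genes, relevance, alpha):
    """Assumed DDP-style score: relevance**alpha * antiredundancy**(1 - alpha)."""
    v = mean(relevance[i] for i in subset)
    pairs = [(i, j) for k, i in enumerate(subset) for j in subset[k + 1:]]
    if pairs:
        u = 1.0 - mean(abs(pearson(genes[i], genes[j])) for i, j in pairs)
    else:
        u = 1.0  # a single gene has no redundancy
    u = max(u, 0.0)  # clamp tiny negative values caused by rounding
    return v ** alpha * u ** (1.0 - alpha)


def select_genes(genes, labels, size, alpha):
    """Greedy forward search (an assumption): seed with the most relevant gene,
    then repeatedly add the gene that maximizes the score of the enlarged set."""
    relevance = [f_statistic(g, labels) for g in genes]
    chosen = [max(range(len(genes)), key=lambda i: relevance[i])]
    while len(chosen) < min(size, len(genes)):
        candidates = [i for i in range(len(genes)) if i not in chosen]
        chosen.append(max(candidates,
                          key=lambda i: ddp_score(chosen + [i], genes, relevance, alpha)))
    return chosen


def class_specific_relevance(subset, genes, labels, cls):
    """Hypothetical one-vs-rest relevance of a predictor set to class `cls` alone."""
    binary = [y == cls for y in labels]
    return mean(f_statistic(genes[i], binary) for i in subset)


# Toy data (made up for illustration): 3 genes x 6 samples, two classes.
labels = [0, 0, 0, 1, 1, 1]
genes = [
    [1.0, 1.2, 0.8, 5.0, 5.2, 4.8],  # strongly class-separating
    [1.0, 1.3, 0.8, 5.0, 5.1, 4.8],  # near-duplicate of gene 0
    [1.0, 3.0, 2.0, 4.0, 2.0, 3.0],  # weakly relevant, less redundant
]
print(select_genes(genes, labels, 2, alpha=1.0))  # → [0, 1]
print(select_genes(genes, labels, 2, alpha=0.5))  # → [0, 2]
```

With α = 1 the search ranks genes purely by relevance and adds the near-duplicate gene 1; with α = 0.5 antiredundancy outweighs the loss in relevance, so gene 2 is added instead. In a two-class toy set the one-vs-rest relevance coincides with the ordinary relevance; the two can diverge once there are more classes, which is the situation the study examines.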





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Chia Huey Ooi (1)
  • Madhu Chetty (1)
  • Shyh Wei Teng (1)

  1. Gippsland School of Information Technology, Monash University, Churchill, Australia
