A Study on the Importance of Differential Prioritization in Feature Selection Using Toy Datasets

Ooi, Chia Huey; Teng, Shyh Wei; Chetty, Madhu

doi:10.1007/978-3-540-88436-1_27

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 5265)

Included in the following conference series:

IAPR International Conference on Pattern Recognition in Bioinformatics

Abstract

Previous empirical work has shown the effectiveness of differential prioritization in feature selection prior to molecular classification. We now propose to determine the theoretical basis for the concept of differential prioritization through mathematical analyses of the characteristics of predictor sets found using different values of the DDP (degree of differential prioritization) from realistic toy datasets. Analytical measures such as the distance between classes are applied to these predictor sets. We demonstrate that the optimal value of the DDP is capable of forming a predictor set which consists of classes of features which are well separated and highly correlated to the target classes, a characteristic of a truly optimal predictor set. From these analyses, the necessity of adjusting the DDP based on the dataset of interest is confirmed in a mathematical manner, indicating that the DDP-based feature selection technique is superior to both simplistic rank-based selection and state-of-the-art equal-priorities scoring methods. Applying similar analyses to real-life multiclass microarray datasets, we obtain further proof of the theoretical significance of the DDP for practical applications.

Author information

Authors and Affiliations

Faculty of Information Technology, Monash University, Australia
Chia Huey Ooi, Shyh Wei Teng & Madhu Chetty

Editor information

Editors and Affiliations

Gippsland School of IT, Monash University, 3842, Churchill, Victoria, Australia
Madhu Chetty
University of Windsor, 401 Sunset Avenue, Windsor, N9B 3P4, Ontario, Canada
Alioune Ngom
National Institute of Biomedical Innovation, 7-6-8, Saito-Asagi, Ibaraki-shi, 5670085, Osaka, Japan
Shandar Ahmad

About this paper

Cite this paper

Ooi, C.H., Teng, S.W., Chetty, M. (2008). A Study on the Importance of Differential Prioritization in Feature Selection Using Toy Datasets. In: Chetty, M., Ngom, A., Ahmad, S. (eds) Pattern Recognition in Bioinformatics. PRIB 2008. Lecture Notes in Computer Science, vol 5265. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88436-1_27

DOI: https://doi.org/10.1007/978-3-540-88436-1_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-88434-7
Online ISBN: 978-3-540-88436-1
eBook Packages: Computer Science, Computer Science (R0)
