New Gene Subset Selection Approaches Based on Linear Separating Genes and Gene-Pairs
The concept of linear separability of gene expression data sets with respect to two classes has recently been studied in the literature. The problem is to efficiently find all pairs of genes that induce a linear separation of the data. It has been suggested that an underlying molecular mechanism relates the two genes of a separating pair to the phenotype under study, such as a specific cancer. In this paper, we study the Containment Angle (CA), defined on the unit circle for a linearly separating gene-pair (LS-pair), as an alternative to the paired t-test ranking function for gene selection. Using the CA, we also show empirically that a given classifier's error is related to the degree of linear separability of a given data set. Finally, we propose gene subset selection methods that select only among linearly separating genes (LS-genes) and LS-pairs, ranking LS-pairs by the CA and LS-genes by a separate ranking function. On many data sets, our methods give better results in terms of subset size and classification accuracy than a well-performing existing method.
Keywords: Linearly Separating Features; Gene Expression; Microarray; Gene Selection; Feature Ranking; Filtering; Subset Selection
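The abstract does not spell out how LS-pairs are detected or how the CA is computed, so what follows is only a minimal sketch under stated assumptions. It relies on a standard geometric fact. Two finite point sets P and N are strictly separable by a line exactly when some direction w satisfies w·(p − n) > 0 for every p in P and n in N. Equivalently, all normalized difference vectors p − n must fit inside an open half of the unit circle.

The sketch assumes the Containment Angle is the angle of the smallest arc of the unit circle that contains these unit vectors. Under that assumption, a gene pair is an LS-pair exactly when its CA is below π, and a smaller CA leaves a wider range of separating directions. The paper's exact definition may differ. The LS-gene score (class gap divided by the gene's range) and the greedy rule in `select_genes` are hypothetical illustrations, not the authors' ranking function or selection method. All function names are likewise hypothetical.

```python
import math
from itertools import combinations


def containment_angle(pos, neg):
    """Assumed CA: smallest arc (radians) of the unit circle containing every
    normalized difference vector p - n, for p in pos and n in neg.
    pos/neg hold (x, y) points: one gene pair's expression values per sample.
    Returns math.inf when a positive and a negative sample coincide."""
    angles = []
    for px, py in pos:
        for nx, ny in neg:
            dx, dy = px - nx, py - ny
            if dx == 0 and dy == 0:
                return math.inf  # identical points: no line can separate them
            angles.append(math.atan2(dy, dx) % (2 * math.pi))
    angles.sort()
    # The largest empty gap between consecutive directions (with wrap-around)
    # is the part of the circle not needed to cover all the points.
    gaps = [b - a for a, b in zip(angles, angles[1:])]
    gaps.append(angles[0] + 2 * math.pi - angles[-1])
    return 2 * math.pi - max(gaps)


def is_ls_pair(pos, neg):
    # Strictly separable by a line iff all p - n fit in an open half-circle.
    return containment_angle(pos, neg) < math.pi


def ls_gene_score(pos_vals, neg_vals):
    """Hypothetical LS-gene score: gap between the two classes divided by the
    gene's total range. Returns None if the gene alone does not separate them."""
    gap = max(min(pos_vals) - max(neg_vals), min(neg_vals) - max(pos_vals))
    if gap <= 0:
        return None
    values = pos_vals + neg_vals
    return gap / (max(values) - min(values))


def select_genes(samples, labels, k):
    """Hypothetical greedy selection among LS-genes and LS-pairs only.
    samples: list of per-sample expression lists; labels: 1 or 0 per sample."""
    n_genes = len(samples[0])
    pos = [s for s, y in zip(samples, labels) if y == 1]
    neg = [s for s, y in zip(samples, labels) if y == 0]

    genes = []
    for g in range(n_genes):
        score = ls_gene_score([s[g] for s in pos], [s[g] for s in neg])
        if score is not None:
            genes.append((score, g))
    genes.sort(reverse=True)  # larger normalized gap first

    pairs = []
    for i, j in combinations(range(n_genes), 2):
        ca = containment_angle([(s[i], s[j]) for s in pos],
                               [(s[i], s[j]) for s in neg])
        if ca < math.pi:
            pairs.append((ca, i, j))
    pairs.sort()  # smaller containment angle first

    # Take LS-genes first, then fill up with genes from the best LS-pairs.
    selected = []
    for _, g in genes:
        if len(selected) < k:
            selected.append(g)
    for _, i, j in pairs:
        for g in (i, j):
            if g not in selected and len(selected) < k:
                selected.append(g)
    return selected
```

Consider `samples = [[5, 1, 2], [6, 2, 3], [1, 2, 1], [2, 3, 2]]` with `labels = [1, 1, 0, 0]`:

- Gene 0 alone is an LS-gene.
- Genes 1 and 2 separate the classes only as a pair, with a CA of π/2.
- `select_genes(samples, labels, 2)` returns `[0, 2]`.

Checking all |P|·|N| difference vectors for every pair is chosen for clarity. It is not the efficient enumeration the abstract refers to.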