New Gene Subset Selection Approaches Based on Linear Separating Genes and Gene-Pairs

  • Amirali Jafarian
  • Alioune Ngom
  • Luis Rueda
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7036)


The concept of linear separability of gene expression data sets with respect to two classes has recently been studied in the literature. The problem is to efficiently find all pairs of genes that induce a linear separation of the data. It has been suggested that an underlying molecular mechanism links the two genes of a separating pair to the phenotype under study, such as a specific cancer. In this paper, we study the Containment Angle (CA), defined on the unit circle for a linearly separating gene-pair (LS-pair), as an alternative to the paired t-test ranking function for gene selection. Using the CA, we also show empirically that a given classifier's error is related to the degree of linear separability of a given data set. Finally, we propose gene subset selection methods that are based on the CA ranking function for LS-pairs and on a ranking function for linearly separating genes (LS-genes), and that select only among LS-genes and LS-pairs. On many data sets, our methods give better results in terms of subset sizes and classification accuracy when compared to a well-performing method.
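The abstract does not spell out how LS-genes and LS-pairs are detected or how the CA is computed. The Python sketch below illustrates one plausible reading and is not the authors' implementation. It relies on a standard fact: two finite point sets A and B in the plane are strictly linearly separable exactly when every difference vector b − a lies in a common open half-plane, that is, when the directions of those vectors fit inside an arc of the unit circle narrower than π. The sketch assumes the CA is the width of that smallest containing arc; the paper's exact definition, and its ranking function for LS-genes, may differ. The function names `is_ls_gene`, `containment_angle`, and `is_ls_pair` are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (expression of gene 1, expression of gene 2) for one sample


def is_ls_gene(class_a: Sequence[float], class_b: Sequence[float]) -> bool:
    """A single gene separates the two classes iff some threshold puts all
    class-A values strictly on one side and all class-B values on the other."""
    return max(class_a) < min(class_b) or max(class_b) < min(class_a)


def containment_angle(class_a: Sequence[Point], class_b: Sequence[Point]) -> float:
    """Hypothetical CA: width (radians) of the smallest arc of the unit circle
    containing the directions of all difference vectors b - a.
    Returns 2*pi if some difference vector is zero (shared point)."""
    angles = []
    for ax, ay in class_a:
        for bx, by in class_b:
            dx, dy = bx - ax, by - ay
            if dx == 0 and dy == 0:
                return 2 * math.pi  # identical samples in both classes: never separable
            angles.append(math.atan2(dy, dx) % (2 * math.pi))  # direction in [0, 2*pi)
    angles.sort()
    # The smallest containing arc is the full circle minus the largest empty gap
    # between consecutive directions, including the wrap-around gap.
    gaps = [angles[i + 1] - angles[i] for i in range(len(angles) - 1)]
    gaps.append(angles[0] + 2 * math.pi - angles[-1])
    return 2 * math.pi - max(gaps)


def is_ls_pair(class_a: Sequence[Point], class_b: Sequence[Point]) -> bool:
    """Strictly separable iff all differences lie in an open half-plane,
    i.e. iff their containment angle is strictly below pi."""
    return containment_angle(class_a, class_b) < math.pi


if __name__ == "__main__":
    # Separable by the line y = 1.
    print(is_ls_pair([(0, 0), (1, 0)], [(0, 2), (1, 3)]))
    # XOR layout: no line separates the classes.
    print(is_ls_pair([(0, 0), (1, 1)], [(0, 1), (1, 0)]))
```

Under this reading, the directions w for which w · (b − a) > 0 holds for every pair form an open arc of width π − CA, so a smaller CA leaves a wider range of separating lines. That is one way a CA-based score could express the "degree of linear separability" mentioned above. Checking every pair of samples costs O(|A|·|B| log(|A|·|B|)) time per gene pair.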


Keywords: Linearly Separating Features; Gene Expression Microarray; Gene Selection; Feature Ranking; Filtering; Subset Selection



Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Amirali Jafarian (1)
  • Alioune Ngom (1)
  • Luis Rueda (1)
  1. School of Computer Science, University of Windsor, Windsor, Canada
