A Novel Feature Selection Technique for SAGE Data Classification

  • K. R. Seeja
Part of the Communications in Computer and Information Science book series (CCIS, volume 375)


Computational diagnosis of cancer from gene expression data is a binary classification problem. Serial Analysis of Gene Expression (SAGE) is a sequencing technique for measuring the expression levels of genes. Each SAGE library contains the expression levels of thousands of genes (features). Using all of these features for classification is impractical, and general-purpose feature selection algorithms are not efficient on this data. This paper proposes a data mining technique, closed frequent itemset mining, for feature selection. The selected genes are then used to train and test two well-known classifiers: Extreme Learning Machine (ELM) and Support Vector Machine (SVM). The performance evaluation of the ELM and SVM classifiers shows that the proposed feature selection method works well with both.
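To make the idea of closed frequent itemset mining concrete, the following is a minimal brute-force sketch and not the paper's implementation. It assumes each SAGE library has already been binarized into the set of genes considered expressed; the gene names, the `min_support` value, and the rule of taking the union of all closed itemsets as the selected features are illustrative assumptions, since the abstract does not specify these details.

```python
from itertools import combinations


def closed_frequent_itemsets(transactions, min_support):
    """Return {itemset: support count} for all closed frequent itemsets.

    Brute-force enumeration, suitable only for tiny illustrative inputs.
    An itemset is frequent if at least `min_support` transactions contain it,
    and closed if no proper superset has the same support.
    """
    items = sorted(set().union(*transactions))
    frequent = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            support = sum(1 for t in transactions if itemset <= t)
            if support >= min_support:
                frequent[itemset] = support
                found_any = True
        if not found_any:
            # No frequent itemset of this size means none of any larger size.
            break
    return {
        itemset: support
        for itemset, support in frequent.items()
        if not any(
            itemset < other and support == other_support
            for other, other_support in frequent.items()
        )
    }


# Hypothetical binarized libraries: each set lists the genes treated as expressed.
libraries = [
    {"g1", "g2", "g3", "g5"},
    {"g1", "g2"},
    {"g1", "g2", "g4"},
    {"g3", "g4"},
]

closed = closed_frequent_itemsets(libraries, min_support=2)

# Assumed selection rule: keep every gene that occurs in some closed itemset.
selected_genes = sorted(set().union(*closed))
```

In this toy input, `g1` and `g2` always occur together, so the closed itemset `{g1, g2}` stands in for both singletons, and the rare gene `g5` is dropped. The selected genes would then define the feature columns passed to a classifier such as an SVM or ELM.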


Closed frequent itemset mining; Feature Selection; Serial Analysis of Gene Expression; Extreme Learning Machine; Support Vector Machine; Classification





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • K. R. Seeja (1)
  1. Department of Computer Science, Jamia Hamdard University, New Delhi, India
