Simultaneous Gene Selection and Cancer Classification Using a Hybrid Intelligent Water Drop Approach

  • Manish Kumar
  • Shameek Ghosh
  • Jayaraman Valadi
  • Patrick Siarry
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8251)


Computational analysis of gene expression data is extremely difficult because the number of genes is very large while the number of samples (patients) is small. It is therefore important to provide a learning algorithm with a subset of the most informative genes in order to construct robust prediction models. In this study, we propose a hybrid Intelligent Water Drop (IWD) - Support Vector Machine (SVM) algorithm, with weighted gene ranking as a heuristic, for simultaneous gene subset selection and cancer prediction. Our results, evaluated on three cancer datasets, demonstrate that the genes selected by the IWD technique yield classification accuracies comparable to those of previously reported algorithms.
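To make the idea concrete, the following is a minimal sketch of how an IWD-style search over gene subsets could be organised. It is not the authors' implementation. It makes several assumptions: soil is stored per gene rather than per edge, the constants are illustrative, the global soil update is simplified, and the classifier is abstracted as a hypothetical `fitness` callable (in the setting described above this would be the accuracy of an SVM trained on the selected genes). The function name `iwd_gene_subset` and all parameter names are likewise hypothetical.

```python
import random


def iwd_gene_subset(rank_score, fitness, subset_size,
                    n_drops=10, n_iters=20, seed=0):
    """Simplified IWD-style search for a gene subset (illustrative only).

    rank_score[g]: non-negative ranking weight, higher = more informative.
    fitness(subset): score to maximise, e.g. classifier accuracy.
    """
    n_genes = len(rank_score)
    if not 0 < subset_size <= n_genes:
        raise ValueError("subset_size must be between 1 and the number of genes")
    rng = random.Random(seed)

    # Illustrative constants (assumptions, not values from the paper).
    a_v, b_v, c_v = 1.0, 0.01, 1.0   # velocity update
    a_s, b_s, c_s = 1.0, 0.01, 1.0   # soil update
    rho_local, rho_global = 0.1, 0.1
    init_soil, init_vel, eps = 1000.0, 100.0, 0.01

    soil = [init_soil] * n_genes
    # Heuristic undesirability: small for highly ranked genes.
    hud = [1.0 / (eps + s) for s in rank_score]

    best_subset, best_fit = [], float("-inf")
    for _ in range(n_iters):
        iter_subset, iter_fit, iter_soil = [], float("-inf"), 0.0
        for _ in range(n_drops):
            vel, carried, chosen = init_vel, 0.0, []
            while len(chosen) < subset_size:
                cand = [g for g in range(n_genes) if g not in chosen]
                # Shift soils so the selection weights stay positive.
                low = min(soil[g] for g in cand)
                shift = -low if low < 0 else 0.0
                weights = [1.0 / (eps + soil[g] + shift) for g in cand]
                g = rng.choices(cand, weights=weights, k=1)[0]
                # Less soil -> larger velocity gain -> shorter travel time.
                vel += a_v / (b_v + c_v * soil[g] ** 2)
                travel_time = hud[g] / vel
                delta = a_s / (b_s + c_s * travel_time ** 2)
                # Local update: the drop removes soil from the visited gene.
                soil[g] = (1 - rho_local) * soil[g] - rho_local * delta
                carried += delta
                chosen.append(g)
            fit = fitness(chosen)
            if fit > iter_fit:
                iter_subset, iter_fit, iter_soil = chosen, fit, carried
        # Simplified global update: reinforce the iteration-best subset.
        for g in iter_subset:
            soil[g] -= rho_global * iter_soil / subset_size
        if iter_fit > best_fit:
            best_subset, best_fit = sorted(iter_subset), iter_fit
    return best_subset, best_fit


if __name__ == "__main__":
    # Toy problem: genes 2, 7 and 11 are "informative" by construction.
    informative = {2, 7, 11}
    scores = [1.0 if g in informative else 0.1 for g in range(20)]
    toy_fitness = lambda subset: len(informative & set(subset))
    print(iwd_gene_subset(scores, toy_fitness, subset_size=3))
```

Genes with a high ranking weight have low heuristic undesirability, so a drop passing over them removes more soil, which in turn makes them more likely to be chosen by later drops. Passing the evaluator in as a callable keeps the search independent of the classifier, so a cross-validated SVM accuracy could be substituted for the toy fitness without changing the loop.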


Keywords: Gene Selection; Cancer Classification; Intelligent Water Drop based Optimization; Weighted Ranking



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Manish Kumar (1)
  • Shameek Ghosh (2)
  • Jayaraman Valadi (2, 3)
  • Patrick Siarry (4)
  1. Bioinformatics Center, University of Pune, Pune, India
  2. Centre for Development of Advanced Computing, Pune, India
  3. Shiv Nadar University, India
  4. Université Paris-Est Créteil, Val-de-Marne, LiSSi (EA 3956), France
