A Novel Discretization Method for Microarray-Based Cancer Classification

  • Ding Li
  • Rui Li
  • Hong-Qiang Wang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7389)


In this paper, we propose a gene expression diversity-based method for gene expression discretization. By counting the numbers of samples of different classes in open expression intervals, the method calculates the class distribution diversity and then the expression diversity of each gene. Based on the gene expression diversity, three discretization criteria are established for discretizing gene expression levels. We evaluate the proposed method on the publicly available leukemia dataset and compare it with several previous methods.
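
The counting step described in the abstract can be illustrated with a small sketch. The abstract does not define the diversity measure or the three discretization criteria, so the code below uses Shannon entropy purely as a stand-in. The function names (`class_counts`, `entropy`, `split_diversity`) and the toy data are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

def class_counts(values, labels, low, high):
    """Count samples of each class whose expression lies in the open interval (low, high)."""
    return Counter(lab for v, lab in zip(values, labels) if low < v < high)

def entropy(counts):
    """Shannon entropy in bits of a class-count distribution (stand-in diversity measure)."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum((c / total) * math.log2(total / c) for c in counts.values() if c > 0)

def split_diversity(values, labels, threshold):
    """Sample-weighted entropy of the two open intervals on either side of a threshold."""
    left = class_counts(values, labels, -math.inf, threshold)
    right = class_counts(values, labels, threshold, math.inf)
    n_left, n_right = sum(left.values()), sum(right.values())
    if n_left + n_right == 0:
        return 0.0
    return (n_left * entropy(left) + n_right * entropy(right)) / (n_left + n_right)

# Toy expression levels for one gene across six samples (invented for illustration).
values = [0.2, 0.5, 0.9, 1.4, 1.8, 2.3]
labels = ["ALL", "ALL", "ALL", "AML", "AML", "AML"]

pure_split = split_diversity(values, labels, 1.0)   # each side holds a single class
mixed_split = split_diversity(values, labels, 0.7)  # right side mixes both classes
```

Under this stand-in measure, a threshold that separates the classes cleanly yields a lower value than one that leaves an interval with mixed classes, which is the kind of signal a supervised discretization criterion could use to choose cut points.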


Keywords: gene expression; gene expression diversity; gene regulation; discretization





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Ding Li (1, 2)
  • Rui Li (1, 2)
  • Hong-Qiang Wang (2)

  1. Department of Automation, University of Science and Technology of China, Hefei, P.R. China
  2. Intelligent Computation Lab, Hefei Institute of Intelligent Machines, Chinese Academy of Science, Hefei, P.R. China
