Colon cancer data analysis by chameleon algorithm

  • Juanying Xie
  • Yuchen Wang
  • Zhaozhong Wu


Detecting the key differential genes of colon cancer is important for distinguishing colon cancer patients from healthy people. This paper proposes a gene selection algorithm for colon cancer that exploits the dynamic modeling of the chameleon algorithm and its ability to discover clusters of arbitrary shape. The algorithm comprises three steps. First, genes with higher Fisher function values are selected as candidate genes. Second, the chameleon algorithm, using Euclidean distance, groups the candidate genes into clusters. Third, the most important gene in each cluster, as measured by each gene's information index to classification, is selected to form the gene subset. The chameleon algorithm is then applied to cluster colon cancer patients and normal people using only the genes in that subset. With the selected genes, the chameleon algorithm reaches a clustering accuracy of up to 85.48%. The clustering analysis of the colon cancer data and the comparisons with related studies demonstrate that the proposed algorithm is effective in detecting the differential genes of colon cancer.
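
The first step, ranking genes by a Fisher function and keeping the top ones as candidates, can be illustrated with a small sketch. The abstract does not state the exact form of the Fisher function, so the code assumes a common two-class variant, (mean difference)² divided by the sum of the within-class variances. The function names `fisher_score` and `top_genes`, the parameter `k`, and the toy data are hypothetical and not taken from the paper.

```python
from statistics import mean, pvariance


def fisher_score(values, labels):
    """Assumed two-class Fisher criterion for one gene:
    (mu0 - mu1)^2 / (var0 + var1), using population variances."""
    group0 = [v for v, y in zip(values, labels) if y == 0]
    group1 = [v for v, y in zip(values, labels) if y == 1]
    between = (mean(group0) - mean(group1)) ** 2
    within = pvariance(group0) + pvariance(group1)
    if within == 0:
        # Gene is constant inside both classes: perfectly separating
        # if the class means differ, uninformative otherwise.
        return float("inf") if between > 0 else 0.0
    return between / within


def top_genes(expression, labels, k):
    """Return the indices of the k genes with the highest Fisher scores.
    expression[g] holds the values of gene g across all samples."""
    scores = [fisher_score(row, labels) for row in expression]
    ranked = sorted(range(len(expression)), key=lambda g: scores[g], reverse=True)
    return ranked[:k]


# Toy data, made up for illustration: 3 genes x 6 samples,
# label 0 = normal tissue, label 1 = tumour tissue.
labels = [0, 0, 0, 1, 1, 1]
expression = [
    [1.0, 1.2, 0.8, 5.0, 5.2, 4.8],  # strongly differential
    [2.0, 3.0, 4.0, 2.5, 3.5, 4.5],  # weakly differential
    [1.0, 5.0, 3.0, 1.0, 5.0, 3.0],  # not differential
]
candidates = top_genes(expression, labels, k=2)
print(candidates)
```

In the pipeline the abstract describes, the candidate genes from this step would then be clustered, and one representative gene would be chosen from each cluster. Those later steps are not sketched here because the abstract does not define the chameleon parameters or the information index.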


Gene subset selection; Chameleon algorithm; Colon cancer; Fisher function; Information index to classification; Clustering



This work is supported in part by the National Natural Science Foundation of China under Grant No. 61673251, by the National Key Research and Development Program of China under Grant No. 2016YFC0901900, by the Fundamental Research Funds for the Central Universities under Grant Nos. GK201701006 and GK201806013, and by the Innovation Funds of Graduate Programs at Shaanxi Normal University under Grant Nos. 2015CXS028 and 2016CSY009.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. School of Computer Science, Shaanxi Normal University, Xi’an, People’s Republic of China
