Classification of Colorectal Cancer Using Clustering and Feature Selection Approaches

  • Conference paper
  • In: 11th International Conference on Practical Applications of Computational Biology & Bioinformatics (PACBB 2017)

Abstract

Accurate cancer classification and assessment of treatment response are important in clinical cancer research because cancer is a family of gene-based diseases. Microarray technology has been widely developed to measure changes in gene expression levels under normal and experimental conditions. Gene expression data are typically high dimensional and characterized by small sample sizes. Feature selection is therefore needed to find the smallest set of informative genes and to improve both classification accuracy and the biological interpretability of the results. Because some feature selection methods neglect the interactions among genes, clustering is used to group similar genes together. In addition, the quality of the selected data can determine the effectiveness of the classifiers. This research proposes clustering and feature selection approaches to classify gene expression data of colorectal cancer. Of these, a feature selection approach based on centroid clustering provided higher classification accuracy than the other approaches.
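
The abstract does not say which clustering algorithm, selection rule, or classifier was used, so the following is only a minimal sketch of the general idea: cluster genes by their expression profiles, keep the gene nearest each cluster centroid, and classify samples on the reduced gene set. The k-means variant with deterministic seeding, the nearest-class-mean classifier, the function names, and the toy expression matrix are all illustrative assumptions, not the authors' method or data.

```python
from math import dist  # Euclidean distance, Python 3.8+


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def init_centroids(points, k):
    """Deterministic maximin seeding (an assumption, chosen for reproducibility):
    start from the first point, then repeatedly add the point that is
    farthest from its nearest already-chosen centroid."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids


def kmeans(points, k, iters=20):
    """Plain Lloyd iterations; returns (centroids, labels)."""
    centroids = init_centroids(points, k)
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = mean_vector(members)
    return centroids, labels


def select_genes(expr, k):
    """expr[g][s] is the expression of gene g in sample s.
    Cluster the genes, then keep the one gene closest to each centroid."""
    centroids, labels = kmeans(expr, k)
    chosen = []
    for c in range(k):
        members = [g for g, lab in enumerate(labels) if lab == c]
        if members:
            chosen.append(min(members, key=lambda g: dist(expr[g], centroids[c])))
    return sorted(chosen)


def nearest_centroid_predict(train_x, train_y, x):
    """Assign x to the class whose mean training profile is closest."""
    means = {c: mean_vector([v for v, y in zip(train_x, train_y) if y == c])
             for c in set(train_y)}
    return min(sorted(means), key=lambda c: dist(x, means[c]))


# Hypothetical toy data: 6 genes (rows) measured in 4 samples (columns).
EXPR = [
    [5.0, 5.0, 1.0, 1.0],
    [5.3, 4.7, 0.8, 1.2],
    [4.7, 5.3, 1.2, 0.8],
    [1.0, 1.0, 5.0, 5.0],
    [0.8, 1.2, 5.3, 4.7],
    [1.2, 0.8, 4.7, 5.3],
]
SAMPLE_LABELS = ["tumor", "tumor", "normal", "normal"]

selected = select_genes(EXPR, k=2)
# Re-express each training sample using only the selected genes.
train_x = [[EXPR[g][s] for g in selected] for s in range(len(SAMPLE_LABELS))]
new_profile = [4.6, 5.1, 4.9, 1.3, 0.9, 1.2]  # all 6 genes of an unseen sample
prediction = nearest_centroid_predict(
    train_x, SAMPLE_LABELS, [new_profile[g] for g in selected]
)
print(selected, prediction)
```

Keeping one representative per cluster is one way to reduce redundancy among correlated genes; a fuller pipeline would also choose the number of clusters and estimate accuracy by cross-validation, neither of which this sketch attempts.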

Acknowledgements

We would like to thank Universiti Teknologi Malaysia for funding this research through GUP Research Grants (grant numbers: Q.J130000.2528.12H12 and Q.J130000.2528.11H05). This research was also funded by the Malaysian Ministry of Higher Education under a fundamental research grant (grant number: 1559).

Author information

Corresponding author

Correspondence to Mohd Saberi Mohamad.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Nies, H.W. et al. (2017). Classification of Colorectal Cancer Using Clustering and Feature Selection Approaches. In: Fdez-Riverola, F., Mohamad, M., Rocha, M., De Paz, J., Pinto, T. (eds) 11th International Conference on Practical Applications of Computational Biology & Bioinformatics. PACBB 2017. Advances in Intelligent Systems and Computing, vol 616. Springer, Cham. https://doi.org/10.1007/978-3-319-60816-7_8

  • DOI: https://doi.org/10.1007/978-3-319-60816-7_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-60815-0

  • Online ISBN: 978-3-319-60816-7

  • eBook Packages: Engineering, Engineering (R0)
