Gene selection and disease prediction from gene expression data using a two-stage hetero-associative memory

  • Laura Cleofas-Sánchez
  • J. Salvador Sánchez
  • Vicente García
Regular Paper

Abstract

Gene expression microarray data sets typically comprise a vast number of genes but very few samples, which poses a critical challenge for disease prediction and diagnosis. This paper develops a two-stage algorithm that integrates feature selection and prediction by extending a type of hetero-associative neural network. In the first stage, the algorithm generates the associative memory; in the second stage, it picks the most relevant genes. To illustrate the applicability and efficiency of the proposed method, we use four different gene expression microarray databases and compare its classification performance against that of other well-known classifiers built on the whole (original) gene space. The experimental results show that the two-stage hetero-associative memory is competitive with standard classification models in terms of overall accuracy, sensitivity and specificity. In addition, it produces a significant decrease in computational effort and an increase in the biological interpretability of microarrays, because uninformative (irrelevant and/or redundant) genes are discarded.
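The two-stage idea can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not specify the memory model or the selection criterion, so the sketch assumes a simple correlation-matrix memory for stage 1 and a hypothetical ranking rule for stage 2 (the absolute difference between the two class rows of the memory). The function names, the toy data and the parameter `k` are all illustrative assumptions.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def build_memory(samples: Sequence[Sequence[float]],
                 labels: Sequence[int],
                 n_classes: int = 2) -> Tuple[Matrix, List[float]]:
    """Stage 1 (assumed form): correlation-matrix memory.

    Row c accumulates the mean-centred expression profiles of class c.
    """
    n_genes = len(samples[0])
    means = [sum(s[g] for s in samples) / len(samples) for g in range(n_genes)]
    memory = [[0.0] * n_genes for _ in range(n_classes)]
    for x, label in zip(samples, labels):
        for g in range(n_genes):
            memory[label][g] += x[g] - means[g]
    return memory, means


def select_genes(memory: Matrix, k: int) -> List[int]:
    """Stage 2 (hypothetical criterion): keep the k genes whose memory
    weights differ most between class 0 and class 1."""
    scores = [abs(memory[1][g] - memory[0][g]) for g in range(len(memory[0]))]
    ranked = sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)
    return sorted(ranked[:k])


def recall(memory: Matrix, means: Sequence[float],
           genes: Sequence[int], x: Sequence[float]) -> int:
    """Predict the class whose memory row responds most strongly,
    using only the selected genes."""
    responses = [sum(row[g] * (x[g] - means[g]) for g in genes) for row in memory]
    return max(range(len(responses)), key=lambda c: responses[c])


def evaluate(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Return (accuracy, sensitivity, specificity); class 1 is 'positive'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return accuracy, sensitivity, specificity


# Toy data (made up): 6 samples x 4 genes; genes 0 and 2 carry the signal.
X = [[1.0, 5.0, 9.0, 3.0], [1.2, 4.0, 8.5, 3.1], [0.8, 6.0, 9.5, 2.9],
     [3.0, 5.0, 2.0, 3.0], [3.2, 6.0, 1.5, 3.1], [2.8, 4.0, 2.5, 2.9]]
y = [0, 0, 0, 1, 1, 1]

memory, means = build_memory(X, y)
genes = select_genes(memory, k=2)
preds = [recall(memory, means, genes, x) for x in X]
print(genes, evaluate(y, preds))
```

On this toy set the sketch keeps the two informative genes and discards the two uninformative ones, which mirrors the interpretability argument made in the abstract. Evaluating on the training samples, as done here for brevity, overstates performance; a real study would use held-out data or cross-validation.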

Keywords

  • Associative memory
  • Gene selection
  • Disease prediction
  • Gene expression microarray

Notes

Acknowledgements

This study was partially supported by the Valencian Council of Education, Research, Culture and Sport [PROMETEOII/2014/062], the Mexican PRODEP [DSA/103.5/15/7004], and the Spanish Ministry of Economy, Industry and Competitiveness under Grant [TIN2013-46522-P].

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. National Institute of Genomic Medicine, Ciudad de México, Mexico
  2. Department of Computer Languages and Systems, Institute of New Imaging Technologies, Universitat Jaume I, Castelló de la Plana, Spain
  3. Multidisciplinary University Division, Universidad Autónoma de Ciudad Juárez, Ciudad Juárez, Mexico
