Manifold Based Data Refinement for Biological Analysis

  • Dao Nam Anh
Part of the Studies in Computational Intelligence book series (SCI, volume 899)


This work presents a study of a new manifold-based method for dimension reduction in digital biological analysis. Extracting features from experiments for multiclass classification tasks using machine learning is challenging because the data come from different source populations and span various biological subdomains. When training on data with a large number of features and samples, classification errors can arise if an efficient feature detection method is not used. The aim of this paper is to clarify why some subsets of training samples and features are more suitable than others. We apply Bayesian reasoning within a multivariate analysis of the learning process to validate features and then reduce the number of features used in both training and testing. During training, the number of samples is also reduced through a suitability assessment. The method has been designed for rapid and scalable learning by combining feature selection with the filtering of training samples. Furthermore, the article reports experiments that apply the method with an SVM classification model, together with a performance evaluation for digital biological analysis.
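The abstract does not spell out the algorithm, so the following is only a minimal sketch under stated assumptions, not the author's manifold method. It swaps in Gaussian naive Bayes, a simple Bayesian model, to illustrate the two reduction steps the abstract names: scoring each feature by how much posterior probability a single-feature Bayesian model assigns to the true class, and discarding training samples whose own-class posterior falls below a threshold. The function names (`fit_class_stats`, `posterior`, `select_features`, `filter_samples`), the scoring rule, the 0.5 threshold and the toy data are all hypothetical. The reduced data would then be handed to a classifier such as an SVM, which is left out here to keep the sketch dependency-free.

```python
import math
from statistics import mean, pvariance

# Hypothetical sketch: Gaussian naive Bayes stands in for the paper's Bayesian
# assessment; it is NOT the manifold method described in the chapter.

def gaussian_loglik(x, mu, var):
    """Log density of a normal distribution N(mu, var) evaluated at x."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)

def fit_class_stats(X, y, features, eps=1e-6):
    """Per-class log priors, means and variances over the given feature indices."""
    stats = {}
    n = len(y)
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        stats[c] = {
            "log_prior": math.log(len(rows) / n),
            "mu": {j: mean(r[j] for r in rows) for j in features},
            # eps keeps the variance strictly positive for constant features
            "var": {j: pvariance([r[j] for r in rows]) + eps for j in features},
        }
    return stats

def posterior(x, stats, features):
    """Class posteriors P(c | x) under naive Bayes, normalised via log-sum-exp."""
    logs = {
        c: s["log_prior"] + sum(gaussian_loglik(x[j], s["mu"][j], s["var"][j]) for j in features)
        for c, s in stats.items()
    }
    m = max(logs.values())
    z = sum(math.exp(v - m) for v in logs.values())
    return {c: math.exp(v - m) / z for c, v in logs.items()}

def select_features(X, y, k):
    """Hypothetical scoring rule: mean true-class posterior of a one-feature model."""
    scores = []
    for j in range(len(X[0])):
        stats = fit_class_stats(X, y, [j])
        score = mean(posterior(x, stats, [j])[label] for x, label in zip(X, y))
        scores.append((score, j))
    scores.sort(reverse=True)
    return sorted(j for _, j in scores[:k])

def filter_samples(X, y, features, threshold=0.5):
    """Keep only training samples whose own-class posterior exceeds the threshold."""
    stats = fit_class_stats(X, y, features)
    keep = [i for i, (x, label) in enumerate(zip(X, y))
            if posterior(x, stats, features)[label] > threshold]
    return [X[i] for i in keep], [y[i] for i in keep]

# Toy data (invented for illustration): features 0 and 2 separate the classes,
# feature 1 is noise, and the last sample is labelled 1 but sits among class 0.
X = [
    [0.0, 5.0, 1.0], [0.2, 1.0, 1.1], [0.1, 3.0, 0.9], [0.3, 4.0, 1.2],
    [2.0, 2.0, 3.0], [2.2, 5.0, 3.1], [1.9, 1.0, 2.9], [2.1, 3.0, 3.2],
    [0.15, 4.0, 1.0],
]
y = [0, 0, 0, 0, 1, 1, 1, 1, 1]

selected = select_features(X, y, k=2)
X_reduced = [[row[j] for j in selected] for row in X]
X_kept, y_kept = filter_samples(X_reduced, y, list(range(len(selected))))
print(selected, len(X_kept))
```

On the toy data, the noisy feature 1 is dropped and the last sample, which carries label 1 but lies among the class-0 samples, is filtered out, leaving eight training samples on two features. Using the same Bayesian model for both steps keeps the sketch short; a real pipeline would validate the threshold and the number of retained features on held-out data.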


Copyright information

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021

Authors and Affiliations

  1. Faculty of Information Technology, Electric Power University, Hanoi, Vietnam
