
Benchmarking Manifold Learning Methods on a Large Collection of Datasets

  • Conference paper
Genetic Programming (EuroGP 2020)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12101)


Abstract

Manifold learning, a non-linear approach to dimensionality reduction, assumes that the dimensionality of many datasets is artificially high and that a reduced number of dimensions is sufficient to preserve the information in the data. In this paper, a large-scale comparison of manifold learning techniques is performed for the task of classification. We show the current standing of genetic programming (GP) for this task by comparing the classification results of two GP-based manifold learning methods: GP-MaL and ManiGP, an experimental manifold learning technique proposed in this paper. We show that GP-based methods can more effectively learn a manifold across a set of 155 different problems and deliver more separable embeddings than many established methods.
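The kind of evaluation the abstract describes — embed each dataset into a low-dimensional space, then measure how well classes separate in the embedding — can be sketched as follows. This is an illustrative baseline only, using a linear PCA embedding, a nearest-centroid classifier, and toy data; it is not the authors' GP-based pipeline, and all function names here are our own.

```python
import numpy as np

def pca_embed(X, n_components=2):
    # Center the data and project onto the top principal directions.
    Xc = X - X.mean(axis=0)
    # SVD of the centered data matrix; rows of Vt are the principal axes.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def balanced_accuracy(y_true, y_pred):
    # Mean of per-class recalls (cf. Brodersen et al., 2010).
    classes = np.unique(y_true)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in classes]
    return float(np.mean(recalls))

def nearest_centroid_predict(Z, y, Z_test):
    # Label each embedded test point by its nearest class centroid,
    # a crude proxy for "how separable is this embedding?".
    classes = np.unique(y)
    centroids = np.stack([Z[y == c].mean(axis=0) for c in classes])
    d = np.linalg.norm(Z_test[:, None, :] - centroids[None, :, :], axis=2)
    return classes[d.argmin(axis=1)]

# Toy data: two well-separated Gaussian blobs in 10 dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 10)), rng.normal(4, 1, (50, 10))])
y = np.array([0] * 50 + [1] * 50)

Z = pca_embed(X, n_components=2)           # 2-D embedding
y_hat = nearest_centroid_predict(Z, y, Z)  # resubstitution, for brevity
print(balanced_accuracy(y, y_hat))         # → 1.0 for these separated blobs
```

A benchmark such as the one in the paper repeats this loop over many datasets (155 here) and many embedding methods, replacing the linear PCA step with non-linear learners such as t-SNE or GP-evolved mappings.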


Notes

  1. https://lvdmaaten.github.io/tsne/.

  2. Empirical tests have, surprisingly, shown this technique to outperform tournament selection.

  3. https://github.com/athril/manigp/.



Acknowledgements

This research was supported in part by PL-Grid Infrastructure and by National Institutes of Health (NIH) grant LM012601. The authors would like to thank Dr. Andrew Lensen from Victoria University of Wellington for his help in running GP-MaL.

Author information

Correspondence to Patryk Orzechowski.



Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Orzechowski, P., Magiera, F., Moore, J.H. (2020). Benchmarking Manifold Learning Methods on a Large Collection of Datasets. In: Hu, T., Lourenço, N., Medvet, E., Divina, F. (eds.) Genetic Programming. EuroGP 2020. Lecture Notes in Computer Science, vol. 12101. Springer, Cham. https://doi.org/10.1007/978-3-030-44094-7_9


  • DOI: https://doi.org/10.1007/978-3-030-44094-7_9


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-44093-0

  • Online ISBN: 978-3-030-44094-7

  • eBook Packages: Computer Science (R0)
