Reversible Data Visualization to Support Machine Learning

  • Boris Kovalerchuk
  • Vladimir Grishin
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10904)

Abstract

An important challenge for Machine Learning (ML) methods, such as the Support Vector Machine (SVM), is selecting the structure of an ML model for given data. This paper shows that purely analytical ML methods have limited ability to address this challenge. The limitation stems from the fundamental nature of ML methods: they rely on the available training data, which can result in an overgeneralized or overfitted model. In the proposed visual analytics approach, domain experts are put into the “driving seat” of ML model development to control model overgeneralization and overfitting. Domain experts work interactively with multidimensional data and ML classification models presented in lossless reversible visualizations. The paper shows that this approach enhances the ML classification models and decreases the use of external, irrelevant-to-the-domain assumptions in those models.

Keywords

Multidimensional data, Visualization, Machine learning, Classification, Reversible lossless visualization

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Computer Science, Central Washington University, Ellensburg, USA
  2. ViewTrend Int., Palm Bay, USA
