Reversible Data Visualization to Support Machine Learning

  • Boris Kovalerchuk
  • Vladimir Grishin
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10904)


An important challenge for Machine Learning (ML) methods, such as the Support Vector Machine (SVM), is selecting the structure of the ML model for given data. This paper shows that the ability of purely analytical ML methods to address this challenge is limited. The limitation stems from the fundamental nature of ML methods: they rely on the available training data, which can result in overgeneralized or overfitted models. In the proposed visual analytics approach, domain experts are put into the “driving seat” of ML model development to control model overgeneralization and overfitting. In this approach, domain experts work interactively with multidimensional data and ML classification models, both presented in lossless reversible visualizations. This paper shows that the approach enhances ML classification models and decreases the use of external, domain-irrelevant assumptions in them.


Multidimensional data, Visualization, Machine learning, Classification, Reversible lossless visualization



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Computer Science, Central Washington University, Ellensburg, USA
  2. ViewTrend Int., Palm Bay, USA
