Determining Data Relevance Using Semantic Types and Graphical Interpretation Cues

  • Eduardo Haruo Kamioka
  • André Freitas
  • Frederico Caroli
  • Siegfried Handschuh
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9897)


The increasing volume of generated data and the shortage of professionals trained to extract value from it raise the question of how to automate data analysis processes. This work investigates how to increase automation in the data interpretation process by proposing a heuristic relevance classification model, which can be used to express which views over the data are potentially meaningful and relevant. The model takes as input features a combination of semantic types derived from the data attributes and visual human interpretation cues. The evaluation shows the impact of these features on the prediction of data relevance: the best classification model achieves an F1 score of 0.906.



Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Eduardo Haruo Kamioka (1, corresponding author)
  • André Freitas (1)
  • Frederico Caroli (1)
  • Siegfried Handschuh (1)

  1. Universität Passau, Passau, Germany