Elements About Exploratory, Knowledge-Based, Hybrid, and Explainable Knowledge Discovery

  • Miguel Couceiro
  • Amedeo Napoli
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11511)

Abstract

Knowledge Discovery in Databases (KDD), and especially pattern mining, can be interpreted along several dimensions, namely data, knowledge, problem-solving, and interactivity. These dimensions are interconnected and have a direct impact on the quality, applicability, and efficiency of KDD. Accordingly, we discuss some objectives of KDD based on these dimensions, namely exploration, knowledge orientation, hybridization, and explanation. The data space and the pattern space can be explored in several ways, depending on specific evaluation functions and heuristics, possibly related to domain knowledge. Furthermore, numerical data are complex, and supervised numerical machine learning methods are usually the best candidates for mining such data efficiently. However, the operation and output of numerical methods are often hard to understand, while symbolic methods are usually more intelligible. This calls for hybridization, that is, combining numerical and symbolic mining methods to improve the applicability and interpretability of KDD. Moreover, KDD should be completed by suitable explanations of the operating models and of possible subsequent decisions, which is far from being the case at the moment. To illustrate these dimensions and objectives, we analyze a concrete case on the mining of biological data, where we characterize these dimensions and their connections. We also discuss the dimensions and objectives in the framework of Formal Concept Analysis, and we draw some perspectives for future research.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Université de Lorraine, CNRS, Inria, LORIA, Nancy, France
