Ontology-Based Meta-Mining of Knowledge Discovery Workflows

Chapter in: Meta-Learning in Computational Intelligence

Part of the book series: Studies in Computational Intelligence (SCI, volume 358)

Abstract

This chapter describes a principled approach to meta-learning that has three distinctive features. First, whereas most previous work on meta-learning focused exclusively on the learning task, our approach applies meta-learning to the full knowledge discovery process and is thus more aptly referred to as meta-mining. Second, traditional meta-learning regards learning algorithms as black boxes and essentially correlates properties of their input (data) with the performance of their output (learned model). We propose to tear open the black box and analyse algorithms in terms of their core components, their underlying assumptions, the cost functions and optimization strategies they use, and the models and decision boundaries they generate. Third, to ground meta-mining on a declarative representation of the data mining (DM) process and its components, we built a DM ontology and knowledge base using the Web Ontology Language (OWL).

The Data Mining Optimization Ontology (DMOP, pronounced dee-mope) provides a unified conceptual framework for analysing DM tasks, algorithms, models, datasets, workflows and performance metrics, as well as their relationships. The DM knowledge base uses concepts from DMOP to describe existing data mining algorithms and their implementations in major DM software packages. Meta-data collected from data mining experiments are also described in terms of concepts from the ontology and linked to algorithm and operator descriptions in the knowledge base; they are then stored in data mining experiment databases to serve as training and evaluation data for the meta-miner.
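As a rough illustration of how experiment meta-data can be linked to ontology concepts, the following minimal Python sketch aggregates performance at every level of a small is-a hierarchy. The concept names, operator names and accuracy values are all hypothetical and made up for this example; they are not taken from DMOP or from the chapter's experiments, and a plain dictionary stands in for the OWL ontology.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical is-a taxonomy (child -> parent); NOT actual DMOP content.
IS_A = {
    "DecisionTreeInduction": "ClassificationAlgorithm",
    "SupportVectorClassifier": "ClassificationAlgorithm",
    "ClassificationAlgorithm": "DMAlgorithm",
}

# Hypothetical knowledge base: software operator -> algorithm concept it implements.
IMPLEMENTS = {
    "toolA.TreeLearner": "DecisionTreeInduction",
    "toolB.TreeLearner": "DecisionTreeInduction",
    "toolA.SVM": "SupportVectorClassifier",
}

# Made-up experiment meta-data: (operator, dataset, accuracy).
EXPERIMENTS = [
    ("toolA.TreeLearner", "d1", 0.80),
    ("toolB.TreeLearner", "d1", 0.90),
    ("toolA.SVM", "d1", 0.70),
]


def ancestors(concept):
    """Return the concept followed by all of its ancestors in IS_A."""
    chain = [concept]
    while chain[-1] in IS_A:
        chain.append(IS_A[chain[-1]])
    return chain


def mean_accuracy_by_concept(experiments):
    """Average accuracy per concept, counting each run for the concept
    its operator implements and for every ancestor of that concept."""
    scores = defaultdict(list)
    for operator, _dataset, accuracy in experiments:
        for concept in ancestors(IMPLEMENTS[operator]):
            scores[concept].append(accuracy)
    return {concept: mean(values) for concept, values in scores.items()}


if __name__ == "__main__":
    for concept, accuracy in mean_accuracy_by_concept(EXPERIMENTS).items():
        print(concept, round(accuracy, 3))
```

Because each run is credited to its operator's concept and to all ancestors, the same meta-data can be analysed per implementation, per algorithm family, or across all algorithms. This only hints at the kind of generalization an ontology makes possible.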

These three features together lay the groundwork for what we call deep or semantic meta-mining, i.e., DM process or workflow mining that is driven simultaneously by meta-data and by the collective expertise of data miners embodied in the data mining ontology and knowledge base. In Section 1, we review the state of the art in the fields of meta-learning and data mining ontologies; at the same time, we motivate the need for ontology-based meta-mining and distinguish our approach from related work in these two areas. Section 2 gives a detailed description of DMOP, while Section 3 introduces a novel method for ontology-based discovery of generalized patterns from data mining workflows. Section 4 reports on proof-of-concept experiments conducted to gauge the efficacy of DMOP-based workflow mining, and Section 5 concludes.

© 2011 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Hilario, M., Nguyen, P., Do, H., Woznica, A., Kalousis, A. (2011). Ontology-Based Meta-Mining of Knowledge Discovery Workflows. In: Jankowski, N., Duch, W., Grąbczewski, K. (eds) Meta-Learning in Computational Intelligence. Studies in Computational Intelligence, vol 358. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20980-2_9

  • DOI: https://doi.org/10.1007/978-3-642-20980-2_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-20979-6

  • Online ISBN: 978-3-642-20980-2

  • eBook Packages: Engineering, Engineering (R0)
