Abstract
Machine learning research often has a large experimental component. While the experimental methodology employed in machine learning has improved considerably over the years, the repeatability of experiments and the generalizability of results remain a concern. In this paper we propose a methodology based on the use of experiment databases. Experiment databases facilitate large-scale experimentation, guarantee repeatability of experiments, improve reusability of experiments, help make explicit the conditions under which certain results are valid, and support quick hypothesis testing as well as hypothesis generation. We show that they have the potential to significantly increase the ease with which new results in machine learning can be obtained and correctly interpreted.
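The core idea of the abstract can be made concrete with a minimal sketch (not from the paper itself; the schema and values are illustrative assumptions): each experimental run is stored with its full conditions in a relational database, after which "quick hypothesis testing" amounts to writing a query over past results rather than rerunning experiments.

```python
import sqlite3

# Illustrative sketch of an experiment database: every run is logged
# together with the conditions under which it was obtained.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE experiment (
        algorithm TEXT,   -- learner used
        dataset   TEXT,   -- dataset name
        params    TEXT,   -- parameter settings (serialized)
        accuracy  REAL    -- observed result
    )
""")

# Hypothetical runs, for illustration only.
runs = [
    ("C4.5", "iris",    "default", 0.94),
    ("C4.5", "soybean", "default", 0.91),
    ("kNN",  "iris",    "k=3",     0.96),
    ("kNN",  "soybean", "k=3",     0.89),
]
conn.executemany("INSERT INTO experiment VALUES (?, ?, ?, ?)", runs)

# Hypothesis testing as a query: mean accuracy per algorithm,
# computed from stored runs without re-executing any learner.
for algo, mean_acc in conn.execute(
    "SELECT algorithm, AVG(accuracy) FROM experiment GROUP BY algorithm"
):
    print(algo, round(mean_acc, 3))
```

Because every run carries its parameter settings and dataset, the same table can later answer questions the original experimenter never anticipated, which is the reusability property the abstract highlights.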
Keywords
- Dataset Size
- Experiment Database
- Machine Learning Research
- Default Parameter Setting
- Optimal Experiment Design
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
© 2007 Springer-Verlag Berlin Heidelberg
Cite this paper
Blockeel, H., Vanschoren, J. (2007). Experiment Databases: Towards an Improved Experimental Methodology in Machine Learning. In: Kok, J.N., Koronacki, J., Lopez de Mantaras, R., Matwin, S., Mladenič, D., Skowron, A. (eds) Knowledge Discovery in Databases: PKDD 2007. Lecture Notes in Computer Science, vol 4702. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74976-9_5
DOI: https://doi.org/10.1007/978-3-540-74976-9_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74975-2
Online ISBN: 978-3-540-74976-9