Abstract

Besides running machine learning algorithms based on inductive queries, much can be learned by immediately querying the combined results of many prior studies. Indeed, all around the globe, thousands of machine learning experiments are being executed on a daily basis, generating a constant stream of empirical information on machine learning techniques. While the information contained in these experiments might have many uses beyond their original intent, results are typically described very concisely in papers and discarded afterwards. If we properly store and organize these results in central databases, they can be immediately reused for further analysis, thus boosting future research. In this chapter, we propose the use of experiment databases: databases designed to collect all the necessary details of these experiments, and to intelligently organize them in online repositories to enable fast and thorough analysis of a myriad of collected results. They constitute an additional, queryable source of empirical meta-data based on principled descriptions of algorithm executions, without reimplementing the algorithms in an inductive database. As such, they engender a very dynamic, collaborative approach to experimentation, in which experiments can be freely shared, linked together, and immediately reused by researchers all over the world. They can be set up for personal use, to share results within a lab, or to create open, community-wide repositories. Here, we provide a high-level overview of their design, and use an existing experiment database to answer various interesting research questions about machine learning algorithms and to verify a number of recent studies.
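To make the idea of querying stored experiments concrete, the following is a minimal sketch and not the chapter's actual database design. It assumes a single hypothetical table named experiment, with columns for algorithm, parameter settings, dataset, evaluation metric, and measured value. The rows are invented placeholder values, not real results, and the helper names build_demo_db and mean_score_per_algorithm are likewise illustrative.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical one-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE experiment (
            id         INTEGER PRIMARY KEY,
            algorithm  TEXT NOT NULL,
            parameters TEXT NOT NULL,
            dataset    TEXT NOT NULL,
            metric     TEXT NOT NULL,
            value      REAL NOT NULL
        )
        """
    )
    # Placeholder rows: invented numbers used only to exercise the query.
    rows = [
        ("tree", "depth=5", "dataset_a", "accuracy", 0.75),
        ("tree", "depth=5", "dataset_b", "accuracy", 1.0),
        ("knn", "k=3", "dataset_a", "accuracy", 0.5),
        ("knn", "k=3", "dataset_b", "accuracy", 0.75),
    ]
    conn.executemany(
        "INSERT INTO experiment (algorithm, parameters, dataset, metric, value) "
        "VALUES (?, ?, ?, ?, ?)",
        rows,
    )
    return conn


def mean_score_per_algorithm(conn, metric):
    """Average a stored metric per algorithm across all recorded datasets."""
    cursor = conn.execute(
        "SELECT algorithm, AVG(value) FROM experiment "
        "WHERE metric = ? GROUP BY algorithm ORDER BY algorithm",
        (metric,),
    )
    return dict(cursor.fetchall())


if __name__ == "__main__":
    demo = build_demo_db()
    print(mean_score_per_algorithm(demo, "accuracy"))
```

Once results are stored this way, a question such as "how do algorithms compare on average?" becomes a single query over existing results rather than a new round of experiments.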

Author information

Correspondence to Joaquin Vanschoren .

Copyright information

© 2010 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Vanschoren, J., Blockeel, H. (2010). Experiment Databases. In: Džeroski, S., Goethals, B., Panov, P. (eds) Inductive Databases and Constraint-Based Data Mining. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-7738-0_14

  • DOI: https://doi.org/10.1007/978-1-4419-7738-0_14

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4419-7737-3

  • Online ISBN: 978-1-4419-7738-0

  • eBook Packages: Computer Science, Computer Science (R0)
