Machine Learning, 73:3

Structured machine learning: the next ten years

  • Thomas G. Dietterich
  • Pedro Domingos
  • Lise Getoor
  • Stephen Muggleton
  • Prasad Tadepalli
Article

Abstract

The field of inductive logic programming (ILP) has made steady progress since the first ILP workshop in 1991, based on a balance of developments in theory, implementations, and applications. More recently, there has been an increased emphasis on Probabilistic ILP and the related fields of Statistical Relational Learning (SRL) and Structured Prediction. The goal of this paper is to consider these emerging trends and chart the strategic directions and open problems for the broader area of structured machine learning over the next 10 years.

Keywords

Inductive logic programming; Relational learning; Statistical relational learning; Structured machine learning

Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  • Thomas G. Dietterich (1)
  • Pedro Domingos (2)
  • Lise Getoor (3)
  • Stephen Muggleton (4)
  • Prasad Tadepalli (1)

  1. Oregon State University, Corvallis, USA
  2. University of Washington, Seattle, USA
  3. University of Maryland, College Park, USA
  4. Imperial College, London, UK
