Machine Learning, 77:271

Structured prediction with reinforcement learning



We formalize structured prediction (SP) as a reinforcement learning (RL) task. We first define a Structured Prediction Markov Decision Process (SP-MDP), an instantiation of Markov Decision Processes for structured prediction, and show that learning an optimal policy for this SP-MDP is equivalent to minimizing the empirical loss. This link between the supervised learning formulation of structured prediction and RL allows us to use approximate RL methods to learn the policy. The proposed model makes only weak assumptions about the nature of the structured prediction problem and about the supervision process: it assumes nothing about the decomposition of loss functions, about data encoding, or about the availability of optimal policies for training. It can therefore handle a wide range of structured prediction problems. It also scales well and can be used to solve complex, large-scale real-world problems. We describe two series of experiments. The first analyzes RL on classical sequence prediction benchmarks and compares our approach with state-of-the-art SP algorithms. The second introduces a tree transformation problem, a complex instance of the general labeled tree mapping problem, on which most previous models fail. We show that RL exploration is effective and leads to successful results on this challenging task. This confirms that RL can be used for large and complex structured prediction problems.
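The equivalence stated in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the task is sequence labeling, a state is the input plus the labels chosen so far, an action appends one label, and the only non-zero reward is the negative Hamming loss given at the final state. The names `State`, `step`, `rollout`, and `hamming_loss` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class State:
    """Hypothetical SP-MDP state: the input and the labels chosen so far."""
    x: Tuple[str, ...]
    y_partial: Tuple[str, ...]

    def is_final(self) -> bool:
        # The episode ends once every input token has a label.
        return len(self.y_partial) == len(self.x)


def hamming_loss(y_pred: Sequence[str], y_true: Sequence[str]) -> int:
    """Number of positions where the predicted label is wrong."""
    return sum(p != t for p, t in zip(y_pred, y_true))


def step(state: State, action: str, y_true: Sequence[str]) -> Tuple[State, float]:
    """Append one label; reward is -loss at the final state, 0 otherwise (assumed design)."""
    next_state = State(state.x, state.y_partial + (action,))
    if next_state.is_final():
        return next_state, -float(hamming_loss(next_state.y_partial, y_true))
    return next_state, 0.0


def rollout(policy: Callable[[State], str],
            x: Sequence[str],
            y_true: Sequence[str]) -> Tuple[Tuple[str, ...], float]:
    """Run the policy from the empty output and return (prediction, total reward)."""
    state = State(tuple(x), ())
    total = 0.0
    while not state.is_final():
        state, reward = step(state, policy(state), y_true)
        total += reward
    return state.y_partial, total


# Usage: a trivial policy that always predicts the label "N".
x = ["the", "cat", "sat"]
y_true = ("D", "N", "V")
prediction, total_reward = rollout(lambda s: "N", x, y_true)
print(prediction, total_reward)
```

Under these assumptions the total reward of an episode is exactly the negative loss of the predicted output, so a policy that maximizes the average return over the training examples also minimizes the empirical loss. The abstract notes that the approach does not require the loss to decompose, which is why this sketch gives the reward only at the end of the episode.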


Keywords: Structured prediction; Reinforcement learning; Sequence labeling; Tree transformation; HTML to XML



Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  • Francis Maes (1)
  • Ludovic Denoyer (1)
  • Patrick Gallinari (1)

  1. LIP6, University Pierre et Marie Curie (Paris 6), Paris, France
