Exploiting Structural Consistencies with Stacked Conditional Random Fields

  • Peter Kluegl
  • Martin Toepfer
  • Florian Lemmerich
  • Andreas Hotho
  • Frank Puppe
Conference paper
Part of the Springer Proceedings in Mathematics & Statistics book series (PROMS, volume 30)


Conditional Random Fields (CRFs) are popular methods for labeling unstructured or textual data. Like many machine learning approaches, these undirected graphical models assume the instances to be independently distributed. However, in real-world applications data is grouped in a natural way, e.g., by its creation context. The instances in each group often share additional structural consistencies. This paper proposes a domain-independent method for exploiting these consistencies by combining two CRFs in a stacked learning framework. We apply rule learning collectively on the predictions of an initial CRF for one context to acquire descriptions of its specific properties. Then, we utilize these descriptions as dynamic, high-quality features in an additional (stacked) CRF. The presented approach is evaluated with a real-world dataset for the segmentation of references and achieves a significant reduction of the labeling error.
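The data flow between the two stages can be illustrated with a minimal sketch. It is not the implementation used in the paper: the stage-one predictions are written by hand instead of coming from a trained CRF, a simple frequency count stands in for the rule learner, and no second CRF is trained. The names `learn_context_rules`, `stacked_features`, and `min_support`, the label set, and the toy references are all hypothetical.

```python
from collections import Counter, defaultdict


def learn_context_rules(context, min_support=0.6):
    """Describe one context (e.g., one bibliography).

    `context` is a list of (tokens, predicted_labels) pairs, where the labels
    are assumed to be the output of an initial labeler. For each label, find
    the token that most often directly precedes the start of a segment with
    that label. This frequency count is a stand-in for real rule learning.
    """
    counts = defaultdict(Counter)
    for tokens, labels in context:
        for i in range(1, len(tokens)):
            if labels[i] != labels[i - 1]:  # predicted segment boundary
                counts[labels[i]][tokens[i - 1]] += 1
    rules = {}
    for label, separators in counts.items():
        separator, hits = separators.most_common(1)[0]
        # Keep only descriptions that hold for most instances of the context.
        if hits / sum(separators.values()) >= min_support:
            rules[label] = separator
    return rules


def stacked_features(tokens, rules):
    """Turn context descriptions into per-token features for a second stage."""
    features = []
    for i in range(len(tokens)):
        previous = tokens[i - 1] if i > 0 else None
        features.append(
            {f"ctx_starts_{label}": previous == separator
             for label, separator in rules.items()}
        )
    return features


# Toy context: three references that share one style (authors end with ":").
# The third prediction is deliberately wrong: the title starts one token late.
context = [
    (["Cohen", ",", "W", ":", "Fast", "rule", "induction", ".", "1995"],
     ["AUTHOR"] * 4 + ["TITLE"] * 4 + ["DATE"]),
    (["Wolpert", ",", "D", ":", "Stacked", "generalization", ".", "1992"],
     ["AUTHOR"] * 4 + ["TITLE"] * 3 + ["DATE"]),
    (["Kou", ",", "Z", ":", "Stacked", "graphical", "models", ".", "2007"],
     ["AUTHOR"] * 5 + ["TITLE"] * 3 + ["DATE"]),
]

rules = learn_context_rules(context)
feats = stacked_features(context[2][0], rules)
```

In this toy context the majority pattern outvotes the single mislabeled reference, so the feature derived from the context marks the correct title start in the third reference. A stacked model could use such a feature to correct the initial prediction.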


Keywords: Collective information extraction; CRF; Stacked graphical models; Structural consistencies; Rule learning



Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • Peter Kluegl (1, 2), corresponding author
  • Martin Toepfer (1)
  • Florian Lemmerich (1)
  • Andreas Hotho (1)
  • Frank Puppe (1)

  1. Department of Computer Science VI, University of Wuerzburg, Wuerzburg, Germany
  2. Comprehensive Heart Failure Center, University of Wuerzburg, Wuerzburg, Germany
