Information Extraction



In its most basic form, text is a sequence of tokens that is not annotated with the properties of those tokens. The goal of information extraction is to discover specific types of useful properties of these tokens and the relationships among them.
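One common way to represent such token-level properties is the BIO tagging scheme used by sequence-labeling models for named entity recognition, in which each token is marked as beginning (B), inside (I), or outside (O) an entity mention. The sketch below is purely illustrative: the example sentence, tag sequence, and `extract_entities` helper are assumptions for exposition, not part of any particular system described in this chapter.

```python
def extract_entities(tokens, tags):
    """Group BIO-tagged tokens into (entity_text, entity_type) spans."""
    entities, current, etype = [], [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity begins; flush any entity in progress.
            if current:
                entities.append((" ".join(current), etype))
            current, etype = [tok], tag[2:]
        elif tag.startswith("I-") and current:
            # Continuation of the current entity mention.
            current.append(tok)
        else:
            # Outside any entity; flush the entity in progress, if any.
            if current:
                entities.append((" ".join(current), etype))
            current, etype = [], None
    if current:
        entities.append((" ".join(current), etype))
    return entities

# Hypothetical sentence with per-token entity annotations.
tokens = ["Tim", "Cook", "is", "the", "CEO", "of", "Apple", "."]
tags   = ["B-PER", "I-PER", "O", "O", "O", "O", "B-ORG", "O"]
print(extract_entities(tokens, tags))  # [('Tim Cook', 'PER'), ('Apple', 'ORG')]
```

In practice the tag sequence itself would be predicted by a trained model (e.g., an HMM, maximum entropy Markov model, or conditional random field); this decoding step only converts the predicted tags back into entity spans.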


Keywords: Open Information Extraction · Maximum Entropy Markov Models · Parse Tree · Named Entity Recognition · Conditional Random Fields



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

1. IBM T. J. Watson Research Center, Yorktown Heights, USA
