
Machine Learning for Higher-Level Linguistic Tasks

Chapter in: Handbook of Linguistic Annotation

Abstract

Annotation is one of the main vehicles for supplying knowledge to machine learning systems built to automate text processing tasks. In this chapter, we discuss how linguistic annotation is used in machine learning for different natural language processing (NLP) tasks. Specifically, we focus on how different layers of annotation are leveraged in tasks that aim to discover higher-level linguistic information. We present how machine learning fits into the annotation process in the MATTER cycle, discuss some common machine learning algorithms used in NLP, explain the fundamentals of feature selection, and explore methods for leveraging limited quantities of annotated data. We close with a case study of the 2012 i2b2 NLP shared task which targeted temporal information extraction, a higher-level task that requires a synthesis of information from multiple linguistic levels.


References

  1. Baum, L.E., Petrie, T.: Statistical inference for probabilistic functions of finite state Markov chains. Ann. Math. Stat. 37(6), 1554–1563 (1966). doi:10.1214/aoms/1177699147

  2. Berger, A.L., Pietra, S.A.D., Pietra, V.J.D.: A maximum entropy approach to natural language processing. Comput. Linguist. 22(1), 39–71 (1996)

  3. Biber, D., Conrad, S., Reppen, R.: Corpus Linguistics: Investigating Language Structure and Use. Cambridge University Press, Cambridge (1998)

  4. Bird, S., Klein, E., Loper, E.: Natural Language Processing with Python – Analyzing Text with the Natural Language Toolkit. O'Reilly (2009)

  5. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)

  6. Blum, A., Mitchell, T.M.: Combining labeled and unlabeled data with co-training. In: Proceedings of the 11th Annual Conference on Computational Learning Theory, pp. 92–100 (1998)

  7. Bordes, A., Glorot, X., Weston, J., Bengio, Y.: Joint learning of words and meaning representations for open-text semantic parsing. In: Proceedings of the 15th International Conference on Artificial Intelligence and Statistics (2012)

  8. Chang, Y.-C., Dai, H.-J., Wu, J.C.-Y., Chen, J.-M., Tsai, R.T.-H., Hsu, W.-L.: TEMPTING system: a hybrid method of rule and machine learning for temporal relation extraction in patient discharge summaries. J. Biomed. Inform. 46(Supplement), S54–S62 (2013)

  9. Chen, S.F., Goodman, J.: An empirical study of smoothing techniques for language modeling. In: Proceedings of the 34th Annual Meeting of the Association for Computational Linguistics (ACL ’96), Stroudsburg, PA, USA, pp. 310–318 (1996). doi:10.3115/981863.981904

  10. Cherry, C., Zhu, X., Martin, J., de Bruijn, B.: A la Recherche du Temps Perdu: extracting temporal relations from medical text in the 2012 i2b2 NLP challenge. J. Am. Med. Inform. Assoc. 20, 843–848 (2013). doi:10.1136/amiajnl-2013-001624

  11. Chiticariu, L., Li, Y., Reiss, F.R.: Rule-based information extraction is dead! Long live rule-based information extraction systems! In: EMNLP 2013, pp. 827–832 (2013)

  12. Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 273–297 (1995). doi:10.1007/BF00994018

  13. D'Souza, J., Ng, V.: Classifying temporal relations in clinical data: a hybrid, knowledge-rich approach. J. Biomed. Inform. 46(Supplement), S29–S39 (2013)

  14. Deerwester, S.C., Dumais, S.T., Landauer, T.K., Furnas, G.W., Harshman, R.A.: Indexing by latent semantic analysis. J. Am. Soc. Inf. Sci. 41(6), 391–407 (1990)

  15. Domingos, P.: A few useful things to know about machine learning. Commun. ACM 55(10) (2012). doi:10.1145/2347736.2347755

  16. Domingos, P., Pazzani, M.: On the optimality of the simple Bayesian classifier under zero-one loss. Mach. Learn. 29, 103–130 (1997)

  17. Dunning, T.: Accurate methods for the statistics of surprise and coincidence. Comput. Linguist. 19(1), 61–74 (1993)

  18. Ferraro, J.P., Daumé III, H., DuVall, S.L., Chapman, W.W., Harkema, H., Haug, P.J.: Improving performance of natural language processing part-of-speech tagging on clinical narratives through domain adaptation. J. Am. Med. Inform. Assoc. 20(5), 931–939 (2013). doi:10.1136/amiajnl-2012-001453

  19. Finkel, J.R., Manning, C.D.: Hierarchical joint learning: improving joint parsing and named entity recognition with non-jointly labeled data. In: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pp. 720–728 (2010)

  20. Grouin, C., Grabar, N., Hamon, T., Rosset, S., Tannier, X., Zweigenbaum, P.: Eventual situations for timeline extraction from clinical reports. J. Am. Med. Inform. Assoc. 20, 820–827 (2013). doi:10.1136/amiajnl-2013-001627

  21. Jindal, P., Roth, D.: Extraction of events and temporal expressions from clinical narratives. J. Biomed. Inform. 46(Supplement), S13–S19 (2013). doi:10.1016/j.jbi.2013.08.010

  22. Jurafsky, D., Martin, J.H.: Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, 2nd edn. Prentice-Hall (2009)

  23. Klein, D., Manning, C.D.: Conditional structure versus conditional estimation in NLP models. In: Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing, vol. 10. Association for Computational Linguistics (2002)

  24. Kovacevic, A., Dehghan, A., Filannino, M., Keane, J.A., Nenadic, G.: Combining rules and machine learning for extraction of temporal expressions and events from clinical narratives. J. Am. Med. Inform. Assoc. 20, 859–866 (2013). doi:10.1136/amiajnl-2013-001625

  25. Lafferty, J.D., McCallum, A., Pereira, F.C.N.: Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: Proceedings of the 18th International Conference on Machine Learning (2001)

  26. Lin, Y.-K., Chen, H., Brown, R.A.: MedTime: a temporal information extraction system for clinical narratives. J. Biomed. Inform. 46(Supplement), S20–S28 (2013)

  27. Manning, C.D., Raghavan, P., Schütze, H.: Introduction to Information Retrieval. Cambridge University Press, Cambridge (2008)

  28. McCallum, A., Freitag, D., Pereira, F.: Maximum entropy Markov models for information extraction and segmentation. In: Proceedings of the Seventeenth International Conference on Machine Learning (2000)

  29. Ng, A., Jordan, M.I.: On discriminative vs. generative classifiers: a comparison of logistic regression and naive Bayes. In: NIPS (2001)

  30. Nikfarjam, A., Emadzadeh, E., Gonzalez, G.: Towards generating a patient's timeline: extracting temporal relationships from clinical notes. J. Biomed. Inform. 46(Supplement), S40–S47 (2013)

  31. Pustejovsky, J., Rumshisky, A.: SemEval-2010 Task 7: argument selection and coercion. In: NAACL 2009 Workshop on Semantic Evaluations: Recent Achievements and Future Directions (SEW-2009), Boulder, Colorado, USA (2009)

  32. Pustejovsky, J., Stubbs, A.: Natural Language Annotation for Machine Learning. O'Reilly Media (2012)

  33. Roberts, K., Rink, B., Harabagiu, S.M.: A flexible framework for recognizing events, temporal expressions, and temporal relations in clinical text. J. Am. Med. Inform. Assoc. 20, 867–875 (2013). doi:10.1136/amiajnl-2013-001619

  34. Russell, S., Norvig, P.: Artificial Intelligence: A Modern Approach, 2nd edn. Prentice Hall (2003). ISBN 978-0137903955

  35. Settles, B.: Active Learning Literature Survey. Computer Sciences Technical Report, University of Wisconsin–Madison (2009)

  36. Settles, B.: Active Learning. Synthesis Lectures on Artificial Intelligence and Machine Learning, vol. 6(1). Morgan & Claypool (2012). doi:10.2200/S00429ED1V01Y201207AIM018

  37. Singh, S., Riedel, S., Martin, B., Zheng, J., McCallum, A.: Joint inference of entities, relations, and coreference. In: Third International Workshop on Automated Knowledge Base Construction (AKBC) (2013)

  38. Sohn, S., Wagholikar, K.B., Li, D., Jonnalagadda, S.R., Tao, C., Elayavilli, R.K., Liu, H.: Comprehensive temporal information detection from clinical text: medical events, time, and TLINK identification. J. Am. Med. Inform. Assoc. 20(5), 836–842 (2013). doi:10.1136/amiajnl-2013-001622

  39. Sun, W., Rumshisky, A., Uzuner, O.: Annotating temporal information in clinical narratives. J. Biomed. Inform. 46(Supplement), S5–S12 (2013)

  40. Sun, W., Rumshisky, A., Uzuner, O.: Evaluating temporal relations in clinical text: 2012 i2b2 Challenge. J. Am. Med. Inform. Assoc. 20(5), 806–813 (2013). doi:10.1136/amiajnl-2013-001628

  41. Sutton, C., McCallum, A.: An introduction to conditional random fields for relational learning. In: Getoor, L., Taskar, B. (eds.) Introduction to Statistical Relational Learning. MIT Press (2006)

  42. Lafferty, J., McCallum, A., Pereira, F.: Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: Proceedings of the 18th International Conference on Machine Learning, pp. 282–289. Morgan Kaufmann (2001)

  43. Pudil, P., Novovičová, J., Kittler, J.: Floating search methods in feature selection. Pattern Recognit. Lett. 15(11), 1119–1125 (1994)

  44. Tang, B., Cao, H., Wu, Y., Jiang, M., Xu, H.: Clinical entity recognition using structural support vector machines with rich features. In: ACM Sixth International Workshop on Data and Text Mining in Biomedical Informatics, Maui, HI, USA, pp. 13–20 (2012)

  45. Tang, B., Wu, Y., Jiang, M., Chen, Y., Denny, J.C., Xu, H.: A hybrid system for temporal information extraction from clinical text. J. Am. Med. Inform. Assoc. (2013). doi:10.1136/amiajnl-2013-001635

  46. Xu, Y., Hong, K., Tsujii, J., Chang, E.I.-C.: Feature engineering combined with machine learning and rule-based methods for structured information extraction from narrative clinical discharge summaries. J. Am. Med. Inform. Assoc. 19, 824–832 (2012). doi:10.1136/amiajnl-2011-000776

  47. Xu, Y., Wang, Y., Liu, T., Tsujii, J., Chang, E.I.-C.: An end-to-end system to identify temporal relation in discharge summaries: 2012 i2b2 challenge. J. Am. Med. Inform. Assoc. 20, 849–858 (2013). doi:10.1136/amiajnl-2012-001607


Author information

Corresponding author: Anna Rumshisky.

Appendix: Machine Learning Resources and Toolkits


For more information on the inner workings of ML algorithms, we highly recommend the following books:

  • Daniel Jurafsky and James H. Martin. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. 2nd edition. Prentice-Hall, 2009.

  • Christopher Manning and Hinrich Schütze. Foundations of Statistical Natural Language Processing. The MIT Press, 1999.

  • Kevin P. Murphy. Machine Learning: A Probabilistic Perspective. The MIT Press, 2013.

A variety of toolkits are available for building ML systems. These toolkits provide implementations of different ML algorithms, allowing NLP researchers to focus on designing the feature sets that maximize the accuracy of the resulting system.
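To make this division of labor concrete, the following is a minimal, self-contained sketch (not taken from any toolkit) of the workflow such toolkits support: the researcher supplies a feature extractor and labeled examples, while the classifier itself is an off-the-shelf algorithm. Here the algorithm is a hand-rolled multinomial naive Bayes over bag-of-words features; the clinical-style snippets and the EVENT/TIMEX labels are hypothetical, loosely echoing the temporal extraction task discussed in the chapter. In practice one would use a toolkit's implementation rather than writing the classifier by hand.

```python
import math
from collections import Counter, defaultdict


def bow_features(text):
    """Bag-of-words feature extraction: lowercased whitespace tokens.
    In a real system this is where the researcher's effort goes."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def train(self, labeled_examples):
        self.label_counts = Counter(label for _, label in labeled_examples)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in labeled_examples:
            for w in bow_features(text):
                self.word_counts[label][w] += 1
                self.vocab.add(w)

    def predict(self, text):
        def log_score(label):
            total = sum(self.word_counts[label].values())
            # Log prior for the label.
            score = math.log(self.label_counts[label]
                             / sum(self.label_counts.values()))
            for w in bow_features(text):
                # Add-one smoothing keeps unseen words from zeroing the score.
                score += math.log((self.word_counts[label][w] + 1)
                                  / (total + len(self.vocab)))
            return score
        return max(self.label_counts, key=log_score)


clf = NaiveBayes()
clf.train([
    ("the patient was admitted yesterday", "EVENT"),
    ("the patient was discharged today", "EVENT"),
    ("three days after admission", "TIMEX"),
    ("on the morning of june fifth", "TIMEX"),
])
print(clf.predict("the patient was admitted"))  # → EVENT
```

Swapping in a different algorithm (maximum entropy, SVM, CRF) leaves the feature extractor untouched, which is exactly the separation the toolkits provide.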

Many machine-learning toolkits for NLP are free and open source. Among the most commonly used is the Natural Language Toolkit (NLTK), which has an accompanying book, Natural Language Processing with Python, by Steven Bird, Ewan Klein, and Edward Loper [4].

In addition to providing implementations of machine learning algorithms that users can train for their own specific tasks, many of these toolkits ship already-trained systems for common NLP tasks such as part-of-speech tagging, named entity recognition, and dependency parsing. This is extremely useful in practice: the output of these pretrained components can be used directly as input features for higher-level tasks, without training them from scratch.
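As a purely schematic illustration of this idea (the five-word lexicon and its tags below are hypothetical, standing in for a real statistical model shipped with a toolkit), a pretrained tagger can be viewed as a learned mapping from tokens to labels that downstream components simply consume:

```python
# Toy stand-in for a pretrained POS-tagging model shipped with a toolkit.
# The lexicon and tag inventory are illustrative, not from any real toolkit.
PRETRAINED_TAGS = {
    "the": "DT", "patient": "NN", "was": "VBD",
    "admitted": "VBN", "yesterday": "NN",
}


def pos_tag(tokens, model=PRETRAINED_TAGS):
    """Tag each token using the "pretrained" lexicon.
    Unknown tokens fall back to NN, a common most-frequent-tag baseline."""
    return [(tok, model.get(tok, "NN")) for tok in tokens]


tags = pos_tag("the patient was admitted".split())
print(tags)  # → [('the', 'DT'), ('patient', 'NN'), ('was', 'VBD'), ('admitted', 'VBN')]
```

A higher-level task, e.g. event extraction, would then treat the `(token, tag)` pairs as one layer of input features rather than re-learning part of speech itself.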


Copyright information

© 2017 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Rumshisky, A., Stubbs, A. (2017). Machine Learning for Higher-Level Linguistic Tasks. In: Ide, N., Pustejovsky, J. (eds) Handbook of Linguistic Annotation. Springer, Dordrecht. https://doi.org/10.1007/978-94-024-0881-2_13

  • DOI: https://doi.org/10.1007/978-94-024-0881-2_13

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-94-024-0879-9

  • Online ISBN: 978-94-024-0881-2
