Text and Speech Basics

  • Uday Kamath
  • John Liu
  • James Whitaker


This chapter introduces the major topics in text and speech analytics and machine learning approaches. Neural network approaches are deferred to later chapters.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Uday Kamath (1)
  • John Liu (2)
  • James Whitaker (1)

  1. Digital Reasoning Systems Inc., McLean, USA
  2. Intelluron Corporation, Nashville, USA
