Sentence Segmentation and Disfluency Detection in Narrative Transcripts from Neuropsychological Tests

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11122)

Abstract

Natural Language Processing (NLP) tools that aim to diagnose language-impairing dementias generally extract several textual metrics from narrative transcripts. However, the transcripts lack sentence boundaries, which prevents the direct application of NLP methods that rely on these marks to work properly, such as taggers and parsers. We present one method to segment the transcripts into sentences and another to detect the disfluencies present in them, both serving as a preprocessing step for subsequent NLP tools. Our methods use recurrent convolutional neural networks with prosodic features, morphosyntactic features, and word embeddings. We evaluated both tasks intrinsically by analyzing the most important features, comparing the proposed methods to simpler ones, and identifying the main hits and misses. In addition, we created a final method that combines all tasks and evaluated it extrinsically using 9 syntactic metrics of Coh-Metrix-Dementia. In the intrinsic evaluations, our method achieved (i) state-of-the-art results for sentence segmentation on impaired speech, and (ii) results for disfluency detection similar to those reported in related work for English. In the extrinsic evaluation, only 3 metrics showed a statistically significant difference between manual MCI transcripts and those generated by our method, suggesting that our method is capable of preprocessing transcripts for further analysis by NLP tools.
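To make the preprocessing idea concrete, the following is a minimal sketch, not the authors' implementation, of how per-word predictions could be turned into clean sentences for downstream taggers and parsers. It assumes that some model has already labeled each word with a disfluency flag and a sentence-boundary flag; the function name `apply_predictions`, the label layout, and the example words are all hypothetical.

```python
from typing import List, Tuple

# Each item: (word, is_disfluent, ends_sentence). This label layout is an
# assumption for illustration; the abstract does not specify an output format.
LabeledWord = Tuple[str, bool, bool]


def apply_predictions(words: List[LabeledWord]) -> List[str]:
    """Drop words flagged as disfluent and split at predicted boundaries."""
    sentences: List[str] = []
    current: List[str] = []
    for word, is_disfluent, ends_sentence in words:
        if not is_disfluent:
            current.append(word)
        # A boundary closes the sentence even if the boundary word itself
        # was removed as a disfluency.
        if ends_sentence and current:
            sentences.append(" ".join(current))
            current = []
    if current:  # flush trailing words that have no predicted boundary
        sentences.append(" ".join(current))
    return sentences


# Invented example input, not data from the paper.
transcript = [
    ("the", False, False),
    ("the", True, False),   # hypothetical repetition flagged as disfluent
    ("girl", False, False),
    ("fell", False, True),  # hypothetical predicted sentence boundary
    ("uh", True, False),    # hypothetical filled pause
    ("she", False, False),
    ("cried", False, False),
]
print(apply_predictions(transcript))
```

Keeping the two label types separate means either prediction can be inspected or evaluated on its own before they are combined into the final text.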

Notes

  1. http://nilc.icmc.usp.br/coh-metrix-dementia/

Acknowledgments

We thank CNPq for a scholarship granted to the first author.

Corresponding author

Correspondence to Marcos Vinícius Treviso.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Treviso, M.V., Aluísio, S.M. (2018). Sentence Segmentation and Disfluency Detection in Narrative Transcripts from Neuropsychological Tests. In: Villavicencio, A., et al. Computational Processing of the Portuguese Language. PROPOR 2018. Lecture Notes in Computer Science (LNAI), vol 11122. Springer, Cham. https://doi.org/10.1007/978-3-319-99722-3_41

  • DOI: https://doi.org/10.1007/978-3-319-99722-3_41

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-99721-6

  • Online ISBN: 978-3-319-99722-3

  • eBook Packages: Computer Science, Computer Science (R0)
