Linguistic Resources for Meeting Speech Recognition

  • Meghan Lammie Glenn
  • Stephanie Strassel
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3869)

Abstract

This paper describes efforts by the University of Pennsylvania’s Linguistic Data Consortium (LDC) to create and distribute shared linguistic resources – including data, annotations, tools and infrastructure – to support the Rich Transcription 2005 Spring Meeting Recognition Evaluation (RT-05S). In addition to distributing large volumes of training data, LDC produced reference transcripts for the RT-05S conference room evaluation corpus, which represents a variety of subjects, scenarios and recording conditions. Careful verbatim reference transcripts, including rich markup, were created for the full two hours of data. One hour was also selected for a contrastive study using a quick transcription methodology. We review the two methodologies and discuss qualitative differences in the resulting transcripts. Finally, we describe infrastructure development, including transcription tools, to support our efforts.

Keywords

Audio Signal; Segment Boundary; Broadcast News; Linguistic Resource; Human Language Technology
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Meghan Lammie Glenn (1)
  • Stephanie Strassel (1)
  1. Linguistic Data Consortium, University of Pennsylvania, Philadelphia, USA
