Linguistic Resources for Meeting Speech Recognition
This paper describes efforts by the University of Pennsylvania’s Linguistic Data Consortium to create and distribute shared linguistic resources – including data, annotations, tools and infrastructure – to support the Rich Transcription 2005 Spring Meeting Recognition Evaluation. In addition to distributing large volumes of training data, LDC produced reference transcripts for the RT-05S conference room evaluation corpus, which represents a variety of subjects, scenarios and recording conditions. Careful verbatim reference transcripts including rich markup were created for the full two hours of data. One hour was also selected for a contrastive study using a quick transcription methodology. We review the two methodologies and discuss qualitative differences in the resulting transcripts. Finally, we describe infrastructure development, including transcription tools, to support our efforts.
Keywords: Audio Signal, Segment Boundary, Broadcast News, Linguistic Resource, Human Language Technology