Abstract
This paper describes efforts by the University of Pennsylvania’s Linguistic Data Consortium (LDC) to create and distribute shared linguistic resources, including data, annotations, tools and infrastructure, to support the Rich Transcription 2005 Spring Meeting Recognition Evaluation (RT-05S). In addition to distributing large volumes of training data, LDC produced reference transcripts for the RT-05S conference room evaluation corpus, which represents a variety of subjects, scenarios and recording conditions. Careful verbatim reference transcripts with rich markup were created for the full two hours of data. One hour was also selected for a contrastive study using a quick transcription methodology. We review the two methodologies and discuss qualitative differences in the resulting transcripts. Finally, we describe the infrastructure, including transcription tools, developed to support these efforts.
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Glenn, M.L., Strassel, S. (2006). Linguistic Resources for Meeting Speech Recognition. In: Renals, S., Bengio, S. (eds) Machine Learning for Multimodal Interaction. MLMI 2005. Lecture Notes in Computer Science, vol 3869. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11677482_33
DOI: https://doi.org/10.1007/11677482_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-32549-9
Online ISBN: 978-3-540-32550-5
eBook Packages: Computer Science, Computer Science (R0)