Abstract
We propose a joint segmentation and classification approach to dialogue act (DA) recognition in natural multi-party meetings (ICSI Meeting Corpus). Five broad DA categories are recognised automatically using a generative infrastructure based on Dynamic Bayesian Networks. Prosodic features and a switching graphical model are used to estimate DA boundaries, in conjunction with a factored language model that relates words to DA categories. The system is easily generalisable and extensible, promotes a rational approach to the joint DA segmentation and recognition task, and is capable of good recognition performance.
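The idea of deciding DA boundaries and DA labels in a single pass can be illustrated with a much simpler model than the one the abstract describes. The sketch below is not the authors' system: it swaps the switching Dynamic Bayesian Network for a plain Viterbi search, the factored language model for a hypothetical per-DA unigram table (`WORD_PROB`), and the prosodic features for a single pause-duration threshold (`boundary_prob`). All labels, probabilities, and function names are assumptions made for illustration only.

```python
import math

# Hypothetical toy parameters: none of these numbers come from the paper.
LABELS = ["statement", "backchannel"]
WORD_PROB = {  # P(word | DA), a unigram stand-in for the factored language model
    "statement": {"we": 0.4, "agree": 0.4, "yeah": 0.1, "okay": 0.1},
    "backchannel": {"we": 0.05, "agree": 0.05, "yeah": 0.5, "okay": 0.4},
}
UNSEEN = 1e-3  # floor probability for out-of-vocabulary words


def boundary_prob(pause):
    """Crude prosodic cue: a long pause before a word suggests a DA boundary."""
    return 0.9 if pause > 0.3 else 0.1


def emit(label, word):
    """Log-probability of a word under the given DA label."""
    return math.log(WORD_PROB[label].get(word, UNSEEN))


def decode(words, pauses):
    """Viterbi over (label, starts_new_DA) states; returns [(label, [words])]."""
    if not words:
        return []
    prior = math.log(1.0 / len(LABELS))  # uniform DA prior and DA transitions
    states = [(lab, start) for lab in LABELS for start in (True, False)]
    # The first word always opens a DA, so only start states are reachable.
    score = {s: (prior + emit(s[0], words[0]) if s[1] else -math.inf)
             for s in states}
    back = []
    for word, pause in zip(words[1:], pauses[1:]):
        p = boundary_prob(pause)
        new_score, pointers = {}, {}
        for lab, start in states:
            if start:  # boundary: any previous state may precede a new DA
                prev = max(states, key=score.get)
                step = math.log(p) + prior
            else:  # continuation: the label must stay the same
                prev = max(((lab, True), (lab, False)), key=score.get)
                step = math.log(1.0 - p)
            new_score[(lab, start)] = score[prev] + step + emit(lab, word)
            pointers[(lab, start)] = prev
        score = new_score
        back.append(pointers)
    # Backtrace the best state sequence.
    state = max(states, key=score.get)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    # Group words into labelled segments, opening one at each start state.
    segments = []
    for (lab, start), word in zip(path, words):
        if start:
            segments.append((lab, [word]))
        else:
            segments[-1][1].append(word)
    return segments
```

With these toy numbers, `decode(["yeah", "we", "agree"], [0.0, 0.5, 0.05])` returns `[("backchannel", ["yeah"]), ("statement", ["we", "agree"])]`: the long pause before "we" makes a boundary likely, and the word probabilities then favour a statement for the second segment. Because boundaries and labels are scored on the same path, a weak prosodic cue can be overridden by strong lexical evidence and vice versa, which is the motivation for treating the two tasks jointly.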
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Dielmann, A., Renals, S. (2006). Multistream Recognition of Dialogue Acts in Meetings. In: Renals, S., Bengio, S., Fiscus, J.G. (eds) Machine Learning for Multimodal Interaction. MLMI 2006. Lecture Notes in Computer Science, vol 4299. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11965152_16
DOI: https://doi.org/10.1007/11965152_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69267-6
Online ISBN: 978-3-540-69268-3
eBook Packages: Computer Science (R0)