A novel focus encoding scheme for addressee detection in multiparty interaction using machine learning algorithms

Malik, Usman; Barange, Mukesh; Saunier, Julien; Pauchet, Alexandre

doi:10.1007/s12193-020-00361-9


Original Paper
Published: 17 January 2021

Volume 15, pages 175–188 (2021)

Journal on Multimodal User Interfaces

Usman Malik (ORCID: orcid.org/0000-0002-6138-6635)¹, Mukesh Barange¹, Julien Saunier¹ & Alexandre Pauchet¹


Abstract

Addressee detection is a fundamental task for seamless dialogue management and turn-taking in human-agent interaction. Although the addressee is implicit in dyadic interaction, detecting it becomes challenging when more than two participants are involved. This article proposes multiple addressee detection models based on smart feature selection and focus encoding schemes. The models are trained using different machine learning and deep learning algorithms, and they improve on existing baseline accuracies for addressee prediction on two datasets (AMI and MULTISIMO). In addition, the article explores the impact of different focus encoding schemes in several addressee detection cases. Finally, it discusses a strategy for implementing the addressee detection model in real time.
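The abstract names focus encoding schemes without defining them on this page. Purely as an illustration, and not as the authors' scheme, the sketch below shows two generic ways a per-utterance gaze (focus of attention) annotation could be turned into fixed-length features for a classifier: an absolute one-hot block per participant, and a compact count of how much attention each participant receives. The participant ids "A" to "D", the target set ("object", "none") and both function names are hypothetical assumptions.

```python
from typing import Dict, List

# Hypothetical setup, assumed for illustration only: a four-person meeting
# with participants "A"-"D", where each person's focus of attention during an
# utterance is another participant, a shared "object", or "none".
PARTICIPANTS: List[str] = ["A", "B", "C", "D"]
TARGETS: List[str] = PARTICIPANTS + ["object", "none"]


def encode_focus_one_hot(focus: Dict[str, str]) -> List[int]:
    """Concatenate one one-hot block per participant (absolute encoding).

    `focus` maps a participant id to the target they look at; participants
    absent from the mapping are encoded as looking at "none".
    """
    vector: List[int] = []
    for person in PARTICIPANTS:
        target = focus.get(person, "none")
        if target not in TARGETS:
            raise ValueError(f"unknown focus target: {target!r}")
        vector.extend(1 if target == t else 0 for t in TARGETS)
    return vector


def encode_attention_received(focus: Dict[str, str]) -> List[int]:
    """Count, for each participant, how many other participants look at them."""
    return [
        sum(1 for who, target in focus.items()
            if target == person and who != person and who in PARTICIPANTS)
        for person in PARTICIPANTS
    ]


if __name__ == "__main__":
    # A and C look at B, B looks at A, D looks at a shared object.
    frame = {"A": "B", "B": "A", "C": "B", "D": "object"}
    print(len(encode_focus_one_hot(frame)))   # 24 = 4 participants x 6 targets
    print(encode_attention_received(frame))   # [1, 2, 0, 0]
```

The absolute encoding preserves who looks at whom but grows with participants times targets; the aggregated count stays small but discards who is doing the looking.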


Notes

The annotation is available at: https://doi.org/10.6084/m9.figshare.13297775.
http://agent.roboslang.org.


Author information

Authors and Affiliations

Normandie University, INSA Rouen, LITIS, Rouen, France
Usman Malik, Mukesh Barange, Julien Saunier & Alexandre Pauchet


Corresponding author

Correspondence to Usman Malik.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was supported by the DAISI project, cofunded by the European Union with the European Regional Development Fund (ERDF), by the French Agence Nationale de la Recherche and by the Regional Council of Normandie.

Appendix: Classifiers and parameters for experimentation

Classifier	AMI parameters	MULTISIMO parameters
XGB	learning_rate=0.1, n_estimators=140, max_depth=5, min_child_weight=1, gamma=0, subsample=0.8, colsample_bytree=0.8, objective='multi:softmax', nthread=4, scale_pos_weight=1	learning_rate=0.1, n_estimators=130, max_depth=3, min_child_weight=1, gamma=0, subsample=0.6, colsample_bytree=0.5, objective='multi:softmax', nthread=4, scale_pos_weight=1
ET	bootstrap=True, criterion='gini', max_features='sqrt', n_estimators=1000	bootstrap=True, criterion='entropy', max_features='sqrt', n_estimators=200
ADB	base_estimator=DecisionTree, max_features=30, n_estimators=800	base_estimator=DecisionTree, max_features=30, n_estimators=800
MLP	activation='tanh', alpha=0.05, hidden_layer_sizes=(100,), learning_rate='adaptive', solver='adam'	activation='tanh', alpha=0.0001, hidden_layer_sizes=(50, 100, 50), learning_rate='constant', solver='sgd', max_iter=100
RF	bootstrap=False, criterion='gini', max_features='auto', n_estimators=200	bootstrap=True, criterion='gini', max_features='sqrt', n_estimators=100
LR	penalty='l2', C=100	penalty='l2', C=0.1
SVM	C=100, gamma=0.01	C=10, gamma=0.01
NB	No parameters	No parameters
KNN	n_neighbors=8	n_neighbors=9
LSTM	hidden layer neurons=(100, 50), dropout=0.5, hidden_activation=relu, final_activation=softmax, loss=categorical_crossentropy, optimizer=adam, batch_size=4, epochs=100, callbacks=early stopping, patience=20	hidden layer neurons=(50, 25), dropout=0.2, hidden_activation=relu, final_activation=softmax, loss=categorical_crossentropy, optimizer=adam, batch_size=1, epochs=100, callbacks=early stopping, patience=20
Bi-LSTM	hidden layer neurons=(100, 50), dropout=0.5, hidden_activation=relu, final_activation=softmax, loss=categorical_crossentropy, optimizer=adam, batch_size=4, epochs=100, callbacks=early stopping, patience=20	hidden layer neurons=(50, 25), dropout=0.2, hidden_activation=relu, final_activation=softmax, loss=categorical_crossentropy, optimizer=adam, batch_size=1, epochs=100, callbacks=early stopping, patience=20
1D-CNN	hidden layer neurons=(100, 50), kernel_size=(3, 3), dropout=0.5, hidden_activation=relu, final_activation=softmax, loss=categorical_crossentropy, optimizer=adam, batch_size=4, epochs=100, callbacks=early stopping, patience=20	hidden layer neurons=(50, 25), kernel_size=(3, 3), dropout=0.2, hidden_activation=relu, final_activation=softmax, loss=categorical_crossentropy, optimizer=adam, batch_size=1, epochs=100, callbacks=early stopping, patience=20
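The deep models in the table above are trained for at most 100 epochs with early stopping and a patience of 20. The table does not name the framework; its parameter names (categorical_crossentropy, adam, softmax) follow Keras conventions, so Keras' EarlyStopping callback is a plausible but unconfirmed choice. The framework-free sketch below only illustrates how such a patience rule bounds training; the function name `early_stopping_epoch`, the `min_delta` parameter and the example loss curve are hypothetical.

```python
from typing import Sequence


def early_stopping_epoch(val_losses: Sequence[float], patience: int = 20,
                         min_delta: float = 0.0) -> int:
    """Return how many epochs run before a patience-based early stop.

    Training halts once the validation loss has failed to improve on the best
    value seen so far (by more than `min_delta`) for `patience` consecutive
    epochs. If that never happens, every epoch in `val_losses` is run.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            # New best validation loss: reset the patience counter.
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses)


if __name__ == "__main__":
    # Hypothetical curve: improves for 10 epochs, then plateaus (epochs=100).
    losses = [1.0 - 0.05 * i for i in range(10)] + [0.6] * 90
    print(early_stopping_epoch(losses, patience=20))  # 30
```

With epochs=100 and patience=20, a model whose validation loss stops improving after epoch 10 halts at epoch 30 rather than running all 100 epochs.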


About this article

Cite this article

Malik, U., Barange, M., Saunier, J. et al. A novel focus encoding scheme for addressee detection in multiparty interaction using machine learning algorithms. J Multimodal User Interfaces 15, 175–188 (2021). https://doi.org/10.1007/s12193-020-00361-9


Received: 29 October 2019
Accepted: 05 December 2020
Published: 17 January 2021
Issue Date: June 2021
DOI: https://doi.org/10.1007/s12193-020-00361-9
