A novel density-based clustering method using word embedding features for dialogue intention recognition

Jang, Jungsun; Lee, Yeonsoo; Lee, Seolhwa; Shin, Dongwon; Kim, Dongjun; Rim, Haechang

doi:10.1007/s10586-016-0649-7

Published: 22 September 2016

Volume 19, pages 2315–2326, (2016)

Cluster Computing

Abstract

In dialogue systems, understanding user utterances is crucial for providing appropriate responses. Various classification models have been proposed for natural language understanding tasks related to user intention analysis, such as dialogue act and emotion recognition. However, models that use unmodified lexical features suffer from data sparseness, and constructing enough training data to overcome this problem is labor-intensive, time-consuming, and expensive. To address this issue, word embedding models that can learn lexical synonyms from vast raw corpora have recently been proposed. However, embedding features have not yet been analyzed thoroughly enough to validate the efficiency of such models. Specifically, using the cosine similarity score as a feature in the embedding space neglects the skewed nature of the word frequency distribution, which can affect gains in model performance. This paper describes a novel density-based clustering method that efficiently integrates word embedding vectors into dialogue intention recognition. Experimental results show that the proposed model helps overcome the data sparseness problem seen in previous classification models and can help improve classification performance.
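
The abstract does not spell out the proposed algorithm, so the following is only a minimal sketch of the general idea it names: grouping word embedding vectors with a density-based method and using cluster membership, rather than raw words, as classifier features. It uses standard DBSCAN with cosine distance as a stand-in, not the authors' method. The toy three-dimensional vectors, the `eps` and `min_pts` values, and the helper names (`cosine_distance`, `dbscan`, `utterance_features`) are all assumptions made for illustration.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def dbscan(vectors, eps, min_pts):
    """Standard DBSCAN; returns one label per vector, -1 marks noise."""
    n = len(vectors)
    labels = [None] * n

    def neighbors(i):
        # Indices within eps of point i (the point itself is included).
        return [j for j in range(n)
                if cosine_distance(vectors[i], vectors[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1          # provisionally noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise becomes a border point
            if labels[j] is not None:
                continue             # already assigned, do not expand
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:
                queue.extend(j_nbrs)  # j is a core point: keep growing
    return labels

# Toy embeddings (hypothetical values, not taken from any trained model).
embeddings = {
    "hello":   [0.90, 0.10, 0.00],
    "hi":      [0.80, 0.20, 0.00],
    "hey":     [0.85, 0.15, 0.05],
    "sad":     [0.00, 0.90, 0.30],
    "unhappy": [0.10, 0.80, 0.40],
    "gloomy":  [0.05, 0.85, 0.35],
    "quark":   [0.00, 0.10, 0.90],
}

words = list(embeddings)
labels = dbscan([embeddings[w] for w in words], eps=0.05, min_pts=2)
word_to_cluster = dict(zip(words, labels))

def utterance_features(tokens):
    """Replace each known, non-noise token with its cluster ID feature."""
    return sorted({f"cluster={word_to_cluster[t]}"
                   for t in tokens
                   if word_to_cluster.get(t, -1) != -1})

print(word_to_cluster)
print(utterance_features(["hi", "gloomy", "quark", "unknown"]))
```

In this sketch, a rare word that shares a cluster with a frequent word yields the same feature, which is one way such clustering could ease data sparseness. Isolated words are left as noise and contribute no cluster feature.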

Author information

Authors and Affiliations

AI Center, NCSOFT, Seongnam-si, Gyeonggi-do, Korea
Jungsun Jang & Yeonsoo Lee
Department of Computer Science and Engineering, Korea University, Seongbuk-gu, Seoul, Korea
Seolhwa Lee, Dongwon Shin, Dongjun Kim & Haechang Rim

Corresponding author

Correspondence to Haechang Rim.

About this article

Cite this article

Jang, J., Lee, Y., Lee, S. et al. A novel density-based clustering method using word embedding features for dialogue intention recognition. Cluster Comput 19, 2315–2326 (2016). https://doi.org/10.1007/s10586-016-0649-7

Received: 25 March 2016
Revised: 04 August 2016
Accepted: 12 September 2016
Published: 22 September 2016
Issue Date: December 2016
DOI: https://doi.org/10.1007/s10586-016-0649-7
