A novel density-based clustering method using word embedding features for dialogue intention recognition

Cluster Computing

Abstract

In dialogue systems, understanding user utterances is crucial for providing appropriate responses. Various classification models have been proposed for natural language understanding tasks related to user intention analysis, such as dialogue act or emotion recognition. However, models that use unmodified lexical features suffer from data sparseness, and constructing enough training data to overcome this problem is labor-intensive, time-consuming, and expensive. To address this issue, word embedding models that can learn lexical synonyms from vast raw corpora have recently been proposed. However, embedding features have not yet been analyzed thoroughly enough to validate the effectiveness of such models. Specifically, using the cosine similarity score as a feature in the embedding space neglects the skewed nature of the word frequency distribution, which can hamper improvements in model performance. This paper describes a novel density-based clustering method that efficiently integrates word embedding vectors into dialogue intention recognition. Experimental results show that the proposed model helps overcome the data sparseness problem seen in previous classification models and can help improve classification performance.
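The abstract does not spell out the clustering procedure, so the following is only a minimal sketch of the general idea: group word embedding vectors by density and use the resulting cluster ID, rather than the raw word, as a classification feature. It uses plain DBSCAN with cosine distance as a stand-in and should not be read as the authors' algorithm. The 2-D toy vectors, the word list, and the `eps` and `min_pts` values are all made-up assumptions for illustration.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def dbscan(vectors, eps, min_pts):
    """Standard DBSCAN; returns one label per vector (-1 means noise)."""
    n = len(vectors)
    labels = [None] * n  # None = not yet visited

    def neighbors(i):
        # The neighborhood includes the point itself.
        return [j for j in range(n)
                if cosine_distance(vectors[i], vectors[j]) <= eps]

    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbors(j)
            if len(nb) >= min_pts:  # j is a core point: keep expanding
                queue.extend(nb)
        cluster += 1
    return labels

# Hypothetical toy embeddings (real ones would come from a trained model).
words = ["hello", "hi", "hey", "bye", "goodbye", "farewell", "table"]
vectors = [[1.0, 0.1], [0.9, 0.2], [1.0, 0.0],
           [0.1, 1.0], [0.2, 0.9], [0.0, 1.0],
           [0.7, 0.7]]

labels = dbscan(vectors, eps=0.05, min_pts=2)
word_to_cluster = dict(zip(words, labels))

def cluster_features(tokens):
    """Replace each known, non-noise token by its cluster ID feature."""
    return [f"CLUSTER_{word_to_cluster[t]}" for t in tokens
            if word_to_cluster.get(t, -1) != -1]

print(word_to_cluster)
print(cluster_features(["hi", "table", "farewell"]))
```

In this toy setting, words that rarely occur in labeled training data can still share a feature with frequent words from the same dense region, which is one way such clustering could ease data sparseness.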

Author information

Corresponding author

Correspondence to Haechang Rim.

About this article

Cite this article

Jang, J., Lee, Y., Lee, S. et al. A novel density-based clustering method using word embedding features for dialogue intention recognition. Cluster Comput 19, 2315–2326 (2016). https://doi.org/10.1007/s10586-016-0649-7

  • DOI: https://doi.org/10.1007/s10586-016-0649-7
