A Novel Similarity Measure to Induce Semantic Classes and Its Application for Language Model Adaptation in a Dialogue System

Li, Ya-Li; Xu, Wei-Qun; Yan, Yong-Hong

doi:10.1007/s11390-012-1233-0

Regular Paper
Published: 05 March 2012

Volume 27, pages 443–450, (2012)
Journal of Computer Science and Technology

Ya-Li Li¹,
Wei-Qun Xu¹ &
Yong-Hong Yan¹

Abstract

In this paper, we propose a novel similarity measure based on co-occurrence probabilities for inducing semantic classes. Clustering with the new similarity measure outperforms the widely used distance based on Kullback-Leibler divergence in precision, recall, and F1. In our experiments, we induced semantic classes from an unannotated in-domain corpus and then used the induced classes and structures to generate a large in-domain corpus, which was then used for language model adaptation. The character recognition rate improved from 85.2% to 91%. We thus provide a new way to address the lack of in-domain data for a dialogue system: first induction, then generation.
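
For orientation, the baseline the abstract compares against, a distance based on Kullback-Leibler divergence, can be sketched as follows. This is only an illustrative sketch: the abstract does not define the proposed measure, so nothing here reproduces it. The toy sentences, the use of right-hand neighbours as context, the add-one smoothing, and the function names `context_distribution` and `symmetric_kl_distance` are all assumptions made for the example.

```python
import math
from collections import Counter

def context_distribution(word, sentences, vocab, smoothing=1.0):
    """Smoothed distribution over the words that immediately follow `word`.

    Hypothetical helper: right-neighbour context and add-one smoothing are
    illustrative choices, not taken from the paper.
    """
    counts = Counter()
    for sent in sentences:
        for left, right in zip(sent, sent[1:]):
            if left == word:
                counts[right] += 1
    total = sum(counts.values()) + smoothing * len(vocab)
    return {v: (counts[v] + smoothing) / total for v in vocab}

def kl(p, q):
    """Kullback-Leibler divergence D(p || q) over a shared vocabulary."""
    return sum(p[v] * math.log(p[v] / q[v]) for v in p)

def symmetric_kl_distance(p, q):
    """Symmetrised KL distance: D(p || q) + D(q || p)."""
    return kl(p, q) + kl(q, p)

# Toy in-domain corpus (invented for illustration).
sentences = [
    ["fly", "to", "beijing", "tomorrow"],
    ["fly", "to", "shanghai", "tomorrow"],
    ["book", "a", "ticket", "to", "beijing", "today"],
    ["book", "a", "ticket", "to", "shanghai", "today"],
]
vocab = sorted({w for s in sentences for w in s})

p_beijing = context_distribution("beijing", sentences, vocab)
p_shanghai = context_distribution("shanghai", sentences, vocab)
p_ticket = context_distribution("ticket", sentences, vocab)

# Words used in the same contexts should be closer than unrelated words.
d_same = symmetric_kl_distance(p_beijing, p_shanghai)
d_diff = symmetric_kl_distance(p_beijing, p_ticket)
```

In this toy corpus, "beijing" and "shanghai" share the same right-hand contexts, so their distance is smaller than that between "beijing" and "ticket"; a clustering step would merge the closest pairs into a candidate semantic class.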

Author information

Authors and Affiliations

Key Laboratory of Speech Acoustics and Content Understanding, Chinese Academy of Sciences, Beijing, 100190, China
Ya-Li Li, Wei-Qun Xu & Yong-Hong Yan

Corresponding author

Correspondence to Ya-Li Li.

Additional information

This work is partially supported by the National Natural Science Foundation of China under Grant Nos. 10925419, 90920302, 10874203, 60875014, 61072124, 11074275, 11161140319.

About this article

Cite this article

Li, YL., Xu, WQ. & Yan, YH. A Novel Similarity Measure to Induce Semantic Classes and Its Application for Language Model Adaptation in a Dialogue System. J. Comput. Sci. Technol. 27, 443–450 (2012). https://doi.org/10.1007/s11390-012-1233-0

Received: 10 December 2010
Revised: 14 September 2011
Published: 05 March 2012
Issue Date: March 2012
DOI: https://doi.org/10.1007/s11390-012-1233-0
