A Novel Similarity Measure to Induce Semantic Classes and Its Application for Language Model Adaptation in a Dialogue System

  • Regular Paper
Journal of Computer Science and Technology

Abstract

In this paper, we propose a novel similarity measure, based on co-occurrence probabilities, for inducing semantic classes. Clustering with the new measure outperforms clustering with the widely used distance based on Kullback-Leibler divergence in precision, recall, and F1. In our experiments, we induced semantic classes from an unannotated in-domain corpus and then used the induced classes and structures to generate a large in-domain corpus, which was in turn used for language model adaptation. The character recognition rate improved from 85.2% to 91%. For a dialogue system, this induce-then-generate procedure offers a new way to address the shortage of in-domain data.
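For orientation, the baseline mentioned above, a distance based on Kullback-Leibler divergence between co-occurrence distributions, can be sketched roughly as follows. This is only an illustration under stated assumptions: the toy sentences, the right-neighbour-only context, the add-alpha smoothing, and the function names `context_distribution`, `kl`, and `symmetric_kl` are all hypothetical choices. The measure proposed in the paper is not defined in the abstract and is not reproduced here.

```python
import math
from collections import Counter


def context_distribution(word, sentences, vocab, alpha=1.0):
    """Add-alpha smoothed distribution over the words that directly follow `word`.

    Assumption: only right neighbours are used; left contexts are ignored
    to keep the sketch short.
    """
    counts = Counter()
    for sent in sentences:
        for left, right in zip(sent, sent[1:]):
            if left == word:
                counts[right] += 1
    total = sum(counts.values()) + alpha * len(vocab)
    return {v: (counts[v] + alpha) / total for v in vocab}


def kl(p, q):
    """Kullback-Leibler divergence D(p || q) for distributions over the same keys."""
    return sum(p[v] * math.log(p[v] / q[v]) for v in p)


def symmetric_kl(p, q):
    """Symmetrised KL distance: D(p || q) + D(q || p)."""
    return kl(p, q) + kl(q, p)


# Hypothetical toy corpus, not data from the paper.
sentences = [
    ["fly", "to", "boston", "tomorrow"],
    ["fly", "to", "denver", "tomorrow"],
    ["book", "a", "flight", "tomorrow"],
]
vocab = sorted({w for s in sentences for w in s})

p_boston = context_distribution("boston", sentences, vocab)
p_denver = context_distribution("denver", sentences, vocab)
p_fly = context_distribution("fly", sentences, vocab)

d_same = symmetric_kl(p_boston, p_denver)  # words sharing the same contexts
d_diff = symmetric_kl(p_boston, p_fly)     # words with different contexts
print(d_same, d_diff)
```

In this toy corpus, "boston" and "denver" appear in identical contexts, so their distance is smaller than the distance between "boston" and "fly". A clustering step would merge the closest pairs into candidate semantic classes.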

Author information

Corresponding author

Correspondence to Ya-Li Li.

Additional information

This work is partially supported by the National Natural Science Foundation of China under Grant Nos. 10925419, 90920302, 10874203, 60875014, 61072124, 11074275, 11161140319.

About this article

Cite this article

Li, YL., Xu, WQ. & Yan, YH. A Novel Similarity Measure to Induce Semantic Classes and Its Application for Language Model Adaptation in a Dialogue System. J. Comput. Sci. Technol. 27, 443–450 (2012). https://doi.org/10.1007/s11390-012-1233-0

  • DOI: https://doi.org/10.1007/s11390-012-1233-0
