Leveraging topical and positional cues for language modeling in speech recognition

Chiu, Hsuan-Sheng; Chen, Kuan-Yu; Chen, Berlin

doi:10.1007/s11042-013-1456-2

Published: 19 April 2013

Volume 72, pages 1465–1481, (2014)

Multimedia Tools and Applications

Abstract

This paper investigates language modeling with topical and positional information for large-vocabulary continuous speech recognition. We first compare several topic models, including document topic models and word topic models, both theoretically and empirically. In addition, for some spoken documents, such as broadcast news stories, documents of the same style tend to share similar composition and word usage. Such documents can therefore be partitioned according to their literary structure into segments with a consistent rhetorical or topical style, such as introductory remarks, elucidations of methodology or affairs, conclusions, and reporters' references or footnotes. We accordingly present two position-dependent language models for speech recognition that integrate word positional information into the existing n-gram and topic models. Experiments on broadcast news transcription seem to indicate that these position-dependent models obtain results comparable to those of the existing n-gram and topic models.

Acknowledgments

This work was sponsored in part by the “Aim for the Top University Plan” of National Taiwan Normal University and the Ministry of Education, Taiwan, and by the National Science Council, Taiwan, under Grants NSC 101-2221-E-003-024-MY3, NSC 101-2511-S-003-057-MY3, NSC 101-2511-S-003-047-MY3, NSC 99-2221-E-003-017-MY3, and NSC 98-2221-E-003-011-MY3.

Author information

Authors and Affiliations

Department of Computer Science & Information Engineering, National Taiwan Normal University, Taipei, Taiwan
Hsuan-Sheng Chiu, Kuan-Yu Chen & Berlin Chen

Corresponding author

Correspondence to Berlin Chen.

About this article

Cite this article

Chiu, HS., Chen, KY. & Chen, B. Leveraging topical and positional cues for language modeling in speech recognition. Multimed Tools Appl 72, 1465–1481 (2014). https://doi.org/10.1007/s11042-013-1456-2

Issue Date: September 2014
