Abstract
Over the past few years, online multimedia educational repositories have increased in number and popularity. The main aim of the transLectures project is to develop cost-effective solutions for producing accurate transcriptions and translations for large video lecture repositories, such as VideoLectures.NET or the Universitat Politècnica de València’s repository, poliMedia. In this paper, we present the transLectures-UPV toolkit (TLK), which has been specifically designed to meet the requirements of the transLectures project, but can also be used as a conventional ASR toolkit. The main features of the current release include HMM training and decoding with speaker adaptation techniques (fCMLLR). TLK has been tested on the VideoLectures.NET and poliMedia repositories, yielding very competitive results. TLK has been released under the permissive open source Apache License v2.0 and can be directly downloaded from the transLectures website.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
Final report on massive adaptation (M36). To be delivered on October 2014 (2014)
First report on massive adaptation (M12), https://www.translectures.eu/wp-content/uploads/2013/05/transLectures-D3.1.1-18Nov2012.pdf
Opencast Matterhorn, http://opencast.org/matterhorn/
sclite - Score speech recognition system output, http://www1.icsi.berkeley.edu/Speech/docs/sctk-1.2/sclite.htm
Second report on massive adaptation (M24), https://www.translectures.eu//wp-content/uploads/2014/01/transLectures-D3.1.2-15Nov2013.pdf
TLK: The transLectures-UPV Toolkit, https://www.translectures.eu/tlk/
Baum, L.E., Petrie, T., Soules, G., Weiss, N.: A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains. The Annals of Mathematical Statistics 41(1), 164–171 (1970)
Dahl, G.E., Yu, D., Deng, L., Acero, A.: Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition. IEEE Transactions on Audio, Speech, and Language Processing 20(1), 30–42 (2012)
Digalakis, V., Rtischev, D., Neumeyer, L., Sa, E.: Speaker Adaptation Using Constrained Estimation of Gaussian Mixtures. IEEE Transactions on Speech and Audio Processing 3, 357–366 (1995)
Huang, J.T., Li, J., Yu, D., Deng, L., Gong, Y.: Cross-language knowledge transfer using multilingual deep neural network with shared hidden layers. In: Proc. of ICASSP (2013)
Munteanu, C., Baecker, R., Penn, G., Toms, E., James, D.: The Effect of Speech Recognition Accuracy Rates on the Usefulness and Usability of Webcast Archives. In: Proc. of CHI, pp. 493–502 (2006)
Ney, H., Ortmanns, S.: Progress in dynamic programming search for LVCSR. Proceedings of the IEEE 88(8), 1224–1240 (2000)
Ortmanns, S., Ney, H., Eiden, A.: Language-model look-ahead for large vocabulary speech recognition. In: Proc. of ICSLP, vol. 4, pp. 2095–2098 (1996)
Ortmanns, S., Ney, H., Aubert, X.: A word graph algorithm for large vocabulary continuous speech recognition. Computer Speech and Language 11(1), 43–72 (1997)
Povey, D., et al.: The Kaldi Speech Recognition Toolkit. In: Proc. of ASRU (2011)
Rumelhart, D., Hintont, G., Williams, R.: Learning representations by back-propagating errors. Nature 323(6088), 533–536 (1986)
Rybach, D., et al.: The RWTH Aachen University Open Source Speech Recognition System. In: Proc. Interspeech, pp. 2111–2114 (2009)
Seide, F., Li, G., Chen, X., Yu, D.: Feature engineering in Context-Dependent Deep Neural Networks for conversational speech transcription. In: Proc. of ASRU, pp. 24–29 (2011)
Viterbi, A.: Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Transactions on Information Theory 13(2), 260–269 (1967)
Young, S., et al.: The HTK Book. Cambridge University Engineering Department (1995)
Young, S.J., Odell, J.J., Woodland, P.C.: Tree-based state tying for high accuracy acoustic modelling. In: Proc. of HLT, pp. 307–312 (1994)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
del-Agua, M.A. et al. (2014). The Translectures-UPV Toolkit. In: Navarro Mesa, J.L., et al. Advances in Speech and Language Technologies for Iberian Languages. Lecture Notes in Computer Science(), vol 8854. Springer, Cham. https://doi.org/10.1007/978-3-319-13623-3_28
Download citation
DOI: https://doi.org/10.1007/978-3-319-13623-3_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-13622-6
Online ISBN: 978-3-319-13623-3
eBook Packages: Computer ScienceComputer Science (R0)