Abstract
This article addresses error detection in the automatic transcription of broadcast news, treated as a post-processing stage. Based on the observation that many errors appear in bursts, we investigated the use of Markov Chains (MC) for their temporal modelling capabilities. Experiments were conducted on a large American English broadcast news corpus from NIST. The features used are common in error detection, and all are decoder-based. MC classification performance was compared with a discriminative maximum entropy model (Maxent), currently used in our in-house decoder to estimate confidence measures, and with Gaussian Mixture Models (GMM). The MC classifier obtained the best results, detecting 16.2% of the errors with the lowest classification error rate (CER) of 16.7%. Compared with the GMM classifier, the MC classifier lowered the number of false detections by 23.5% relative. The Maxent system achieved the same CER but detected only 7.2% of the errors.
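To make the burst observation concrete, the following is a minimal sketch, not the classifier evaluated in the paper. It assumes a first-order Markov chain over two hypothetical word-level states, "C" (correct) and "E" (error), and estimates transition probabilities from a made-up label sequence. The function name, the state labels, and the toy data are all illustrative assumptions.

```python
from collections import Counter


def estimate_transitions(labels, smoothing=1.0):
    """Estimate first-order transition probabilities P(next | current)
    over two states: 'C' (correct word) and 'E' (error).

    `smoothing` is an additive (Laplace) constant; it is an assumption of
    this sketch, used to avoid zero counts on short sequences.
    """
    states = ("C", "E")
    # Count adjacent label pairs (current, next) along the sequence.
    counts = Counter(zip(labels, labels[1:]))
    probs = {}
    for prev in states:
        total = sum(counts[(prev, nxt)] for nxt in states)
        total += smoothing * len(states)
        for nxt in states:
            probs[(prev, nxt)] = (counts[(prev, nxt)] + smoothing) / total
    return probs


# Toy label sequence, invented for illustration: errors come in bursts.
labels = list("CCCCEEECCCCCCEECCCCC")
probs = estimate_transitions(labels, smoothing=0.0)

# When errors are bursty, an error is more likely right after another error
# than right after a correct word.
print(f"P(E | E) = {probs[('E', 'E')]:.3f}")
print(f"P(E | C) = {probs[('C', 'E')]:.3f}")
```

On this toy sequence the probability of an error following an error is much higher than that of an error following a correct word, which is the kind of temporal dependency a Markov chain can exploit and a per-word classifier cannot.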
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Pellegrini, T., Trancoso, I. (2011). Error Detection in Broadcast News ASR Using Markov Chains. In: Vetulani, Z. (eds) Human Language Technology. Challenges for Computer Science and Linguistics. LTC 2009. Lecture Notes in Computer Science, vol 6562. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20095-3_6
DOI: https://doi.org/10.1007/978-3-642-20095-3_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20094-6
Online ISBN: 978-3-642-20095-3
eBook Packages: Computer Science (R0)