Abstract
An important aspect of Ambient Intelligence is a convenient user interface that supports several user-friendly input modalities. Speech is one of the most natural modalities for man-machine interaction. Numerous applications in the context of Ambient Intelligence, whether they rely on a single input modality or combine several, involve some pattern classification task. Experience shows that building successful and reliable real-life applications requires advanced classification algorithms that provide maximal accuracy for the underlying task. In this chapter, we investigate whether a generic machine learning technique, the boosting algorithm, can successfully be applied to increase accuracy in a ‘large-scale’ classification problem, namely large vocabulary automatic speech recognition. Specifically, we outline an approach for implementing the AdaBoost.M2 algorithm to train the acoustic models of a state-of-the-art automatic speech recognizer. Detailed evaluations on a large vocabulary name recognition task show that this ‘utterance approach’ improves on the best test error rates obtained with standard training paradigms. In particular, we obtain additive performance gains when combining boosting with discriminative training, one of the most powerful training algorithms in speech recognition. Our findings motivate further applications of boosting in other classification tasks relevant to Ambient Intelligence.
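To make the AdaBoost.M2 step named in the abstract concrete, the following is a minimal sketch of one boosting round in its generic textbook form: compute the pseudo-loss over all (example, wrong label) pairs, then reweight those pairs so that confusable ones gain weight. It is not the chapter's utterance-level implementation. The function names, the list-of-lists layout, and the assumption that the classifier returns a confidence in [0, 1] for every label are all illustrative choices, not taken from the source.

```python
def init_mislabel_distribution(labels, num_classes):
    """Uniform weight on every (example, wrong label) pair (hypothetical helper)."""
    n = len(labels)
    weight = 1.0 / (n * (num_classes - 1))
    return [
        [0.0 if y == labels[i] else weight for y in range(num_classes)]
        for i in range(n)
    ]


def adaboost_m2_round(dist, scores, labels):
    """One generic AdaBoost.M2 round (illustrative sketch).

    dist[i][y]   -- weight of the pair (example i, wrong label y); 0 for y == labels[i]
    scores[i][y] -- assumed classifier confidence in [0, 1] for label y on example i
    Returns (pseudo_loss, beta, updated_dist).
    """
    # Pseudo-loss: penalises low confidence in the true label and
    # high confidence in a wrong label, weighted by dist.
    pseudo_loss = 0.0
    for i, y_true in enumerate(labels):
        for y, d in enumerate(dist[i]):
            if y != y_true:
                pseudo_loss += 0.5 * d * (1.0 - scores[i][y_true] + scores[i][y])

    if not 0.0 < pseudo_loss < 1.0:
        raise ValueError("pseudo-loss must lie strictly between 0 and 1")
    beta = pseudo_loss / (1.0 - pseudo_loss)

    # Reweight: with beta < 1, well-separated pairs get a larger exponent
    # and therefore shrink more than hard-to-discriminate pairs.
    updated = [
        [
            0.0 if y == labels[i]
            else d * beta ** (0.5 * (1.0 + scores[i][labels[i]] - scores[i][y]))
            for y, d in enumerate(row)
        ]
        for i, row in enumerate(dist)
    ]
    total = sum(sum(row) for row in updated)
    updated = [[d / total for d in row] for row in updated]
    return pseudo_loss, beta, updated
```

In a full boosting loop, each round's classifier would be trained with the current weights, and the final decision would combine the per-round scores with weights log(1/beta). How the weights enter acoustic model training is specific to the chapter and is not sketched here.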
Copyright information
© 2004 Springer Science+Business Media Dordrecht
About this chapter
Cite this chapter
Meyer, C., Beyerlein, P. (2004). Machine Learning for Ambient Intelligence: Boosting in Automatic Speech Recognition. In: Verhaegh, W.F.J., Aarts, E., Korst, J. (eds) Algorithms in Ambient Intelligence. Philips Research, vol 2. Springer, Dordrecht. https://doi.org/10.1007/978-94-017-0703-9_9
DOI: https://doi.org/10.1007/978-94-017-0703-9_9
Publisher Name: Springer, Dordrecht
Print ISBN: 978-90-481-6490-5
Online ISBN: 978-94-017-0703-9
eBook Packages: Springer Book Archive