
Soft Computing, Volume 23, Issue 1, pp 115–128

Calibrating AdaBoost for phoneme classification

  • Gábor Gosztolya
  • Róbert Busa-Fekete

Methodologies and Application

Abstract

Phoneme classification is a sub-task of automatic speech recognition (ASR) that is essential for achieving good speech recognition accuracy. Unlike most classification tasks, however, it requires not only finding the correct class but also providing good posterior scores. Partly for this reason, Gaussian Mixture Models were traditionally used for this task, and Artificial Neural Networks (ANNs) have been used more recently, while other common machine learning methods such as Support Vector Machines and AdaBoost.MH are applied only rarely. In a previous study, we showed that AdaBoost.MH can match the performance of ANNs in terms of classification accuracy, but it lags behind them when its output is utilized in the speech recognition process. This is partly due to the imprecise posterior scores that AdaBoost.MH produces, which is a well-known weakness of this method. To improve the quality of the posterior scores produced, it is common to perform some kind of posterior calibration. In this study, we test several posterior calibration techniques in order to improve the overall performance of AdaBoost.MH. We found that posterior calibration is a good way to improve ASR accuracy, especially when the speech recognition process is integrated into the calibration workflow.
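The abstract does not list the specific calibration techniques tested, but one classic approach to posterior calibration for margin-based classifiers such as AdaBoost is Platt's sigmoid scaling: fit a sigmoid that maps raw margin scores to probabilities by minimizing cross-entropy on held-out data. The sketch below is a minimal, illustrative implementation for the binary case; the function names (`platt_scale`, `calibrated_posterior`) and the plain gradient-descent fit are our own assumptions, not the authors' method.

```python
import math

def platt_scale(scores, labels, lr=0.1, epochs=5000):
    """Fit the sigmoid p(y=1 | f) = 1 / (1 + exp(a*f + b)) to raw
    margin scores f and binary labels y by minimizing cross-entropy
    with plain gradient descent (an illustrative stand-in for the
    Levenberg-Marquardt fit used in Platt's original paper)."""
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for f, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(a * f + b))
            # gradient of the cross-entropy loss w.r.t. a and b;
            # note d/da of (a*f + b) is f, and p = sigmoid(-(a*f + b))
            grad_a += (p - y) * (-f)
            grad_b += (p - y) * (-1.0)
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b

def calibrated_posterior(f, a, b):
    """Map a raw margin score f to a calibrated posterior probability."""
    return 1.0 / (1.0 + math.exp(a * f + b))

# Toy usage: positive-class examples get higher raw margins, so after
# fitting, high scores map to posteriors near 1 and low scores near 0.
a, b = platt_scale([2.0, 1.5, 1.0, -1.0, -1.5, -2.0], [1, 1, 1, 0, 0, 0])
```

For a multi-class task such as phoneme classification, per-class sigmoids of this kind would then need to be normalized (or coupled pairwise) to yield a proper posterior distribution over phonemes.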

Keywords

Speech recognition · Phoneme classification · Phoneme probability estimation · Posterior calibration · AdaBoost.MH

Notes

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest.


Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. MTA-SZTE Research Group on Artificial Intelligence, Hungarian Academy of Sciences, Szeged, Hungary
  2. Yahoo Research, New York, USA
