Abstract
In the present work we investigate the performance of a number of traditional and recent speech enhancement algorithms under the adverse, non-stationary noise conditions that are characteristic of a motorcycle on the move. The algorithms are ranked by the improvement they bring to the speech recognition rate relative to the baseline result, i.e. recognition without speech enhancement. Experiments on the MoveOn motorcycle speech and noise database suggest that a ranking of the algorithms based on human perception of speech quality is not equivalent to a ranking based on speech recognition performance. The multi-band spectral subtraction method was observed to yield the highest speech recognition performance.
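The ranking criterion described in the abstract can be illustrated with a minimal sketch. It assumes each algorithm is summarized by a single recognition rate in percent; the function name, the method labels, and all numbers are hypothetical placeholders and are not results from the paper.

```python
def rank_by_improvement(baseline_rate, enhanced_rates):
    """Rank enhancement methods by their gain over the no-enhancement baseline.

    baseline_rate  -- recognition rate (percent) without speech enhancement
    enhanced_rates -- dict mapping method name to its recognition rate (percent)
    Returns a list of (method, gain) pairs, largest gain first.
    """
    gains = {name: rate - baseline_rate for name, rate in enhanced_rates.items()}
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


# Placeholder values for illustration only; they are NOT taken from the paper.
example = rank_by_improvement(
    50.0,
    {"method_a": 55.0, "method_b": 48.0, "method_c": 60.0},
)
for name, gain in example:
    print(f"{name}: {gain:+.1f} points relative to baseline")
```

A negative gain in this sketch would correspond to an enhancement method that lowers the recognition rate below the baseline, which is why the comparison is made against the unenhanced result rather than between methods alone.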
Copyright information
© 2009 IFIP International Federation for Information Processing
About this paper
Cite this paper
Mporas, I., Ganchev, T., Kocsis, O., Fakotakis, N. (2009). Performance Evaluation of a Speech Interface for Motorcycle Environment. In: Iliadis, Maglogiannis, Tsoumakas, Vlahavas, Bramer (eds) Artificial Intelligence Applications and Innovations III. AIAI 2009. IFIP International Federation for Information Processing, vol 296. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-0221-4_31
DOI: https://doi.org/10.1007/978-1-4419-0221-4_31
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4419-0220-7
Online ISBN: 978-1-4419-0221-4
eBook Packages: Computer Science (R0)