Abstract
An inversion of the speech polarity may have a dramatic detrimental effect on the performance of various speech processing techniques. An automatic method for determining the speech polarity (which depends on the recording setup) is thus required as a preliminary step to ensure that such techniques behave correctly. This paper proposes a new approach to polarity detection relying on oscillating statistical moments. These moments oscillate at the local fundamental frequency and exhibit a phase shift that depends on the speech polarity. This dependency stems from the introduction of non-linearity or higher-order statistics in the moment calculation. The resulting method is shown on 10 speech corpora to provide a substantial improvement over state-of-the-art techniques.
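The polarity dependence described above can be illustrated with a toy example. The sketch below is not the paper's algorithm: the helper names (`sliding_moment`, `phase_at`), the Hann window of one pitch period, and the synthetic two-harmonic signal are all assumptions chosen for illustration. It only shows that a windowed odd-order moment changes sign when the signal is inverted, so its component at the fundamental frequency shifts in phase by π, while an even-order moment is unaffected.

```python
import numpy as np

def sliding_moment(x, order, win_len):
    # Local average of x**order under a normalized Hann window (assumed choice).
    w = np.hanning(win_len)
    w = w / w.sum()
    return np.convolve(x ** order, w, mode="same")

def phase_at(m, f0, fs):
    # Phase of the f0 component of m, by projection onto a complex exponential.
    t = np.arange(len(m)) / fs
    return np.angle(np.sum(m * np.exp(-2j * np.pi * f0 * t)))

def wrapped_diff(a, b):
    # Phase difference a - b wrapped to [-pi, pi].
    return np.angle(np.exp(1j * (a - b)))

# Toy periodic signal with an asymmetric waveform (hypothetical stand-in for speech).
fs, f0 = 16000, 100.0
t = np.arange(int(0.2 * fs)) / fs
x = np.cos(2 * np.pi * f0 * t) + 0.5 * np.cos(2 * np.pi * 2 * f0 * t)
win_len = int(fs / f0)  # about one pitch period

# Moments of the original signal and of its polarity-inverted version.
m3_pos, m3_neg = sliding_moment(x, 3, win_len), sliding_moment(-x, 3, win_len)
m4_pos, m4_neg = sliding_moment(x, 4, win_len), sliding_moment(-x, 4, win_len)

# Phase shift at f0 caused by the inversion, for the odd and even moment.
shift_odd = wrapped_diff(phase_at(m3_neg, f0, fs), phase_at(m3_pos, f0, fs))
shift_even = wrapped_diff(phase_at(m4_neg, f0, fs), phase_at(m4_pos, f0, fs))
print(abs(shift_odd), abs(shift_even))
```

In this toy setting the odd-order shift has magnitude π and the even-order shift is zero, which is the kind of polarity-dependent phase relationship a detector could exploit. How the paper combines its moments into a decision is not specified in the abstract and is not reproduced here.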
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Drugman, T., Dutoit, T. (2011). Oscillating Statistical Moments for Speech Polarity Detection. In: Travieso-González, C.M., Alonso-Hernández, J.B. (eds) Advances in Nonlinear Speech Processing. NOLISP 2011. Lecture Notes in Computer Science, vol 7015. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25020-0_7
DOI: https://doi.org/10.1007/978-3-642-25020-0_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25019-4
Online ISBN: 978-3-642-25020-0
eBook Packages: Computer Science (R0)