
Binaural Speech Separation Using Recurrent Timing Neural Networks for Joint F0-Localisation Estimation

  • Conference paper
Machine Learning for Multimodal Interaction (MLMI 2007)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4892)

Included in the following conference series: Machine Learning for Multimodal Interaction (MLMI)

Abstract

A speech separation system is described in which sources are represented in a joint interaural time difference-fundamental frequency (ITD-F0) cue space. Traditionally, recurrent timing neural networks (RTNNs) have been used only to extract periodicity information; in this study, this type of network is extended in two ways. Firstly, a coincidence detector layer is introduced, each node of which is tuned to a particular ITD; secondly, the RTNN is extended to two dimensions so that periodicity analysis can be performed at each best ITD. Thus, one axis of the RTNN represents F0 and the other represents ITD, allowing sources to be segregated on the basis of their separation in ITD-F0 space. Source segregation is performed within individual frequency channels, without recourse to the across-channel estimates of F0 or ITD commonly used in auditory scene analysis approaches. The system is evaluated on spatialised speech signals using energy-based metrics and automatic speech recognition.
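The abstract describes two mechanisms: a Jeffress-style coincidence detector layer with one node per candidate ITD, and a two-dimensional recurrent timing network in which each node carries a delay loop tuned to a candidate F0 period. The sketch below shows, under stated assumptions, how such a joint ITD-F0 activation map might be computed for a single frequency channel. The function name, the product-based coincidence rule, the leaky-integration constant alpha and all parameter values are illustrative assumptions rather than the authors' exact model.

```python
import numpy as np

def itd_f0_rtnn(left, right, fs, max_itd=20, min_f0=80.0, max_f0=400.0, alpha=0.95):
    """Illustrative joint ITD-F0 analysis of one frequency channel.

    left, right : half-wave rectified filter outputs for the two ears
    fs          : sample rate in Hz
    max_itd     : largest ITD considered, in samples
    Returns candidate ITDs (samples), candidate F0s (Hz) and an
    (n_itd, n_f0) activation map; peaks suggest the ITD and F0 of sources.
    """
    n = len(left)
    itds = np.arange(-max_itd, max_itd + 1)                       # candidate ITDs in samples
    periods = np.arange(int(fs / max_f0), int(fs / min_f0) + 1)   # candidate F0 periods in samples
    activation = np.zeros((len(itds), len(periods)))

    for i, d in enumerate(itds):
        # Coincidence detector layer (Jeffress-style): delay one ear relative
        # to the other and multiply, so the node tuned to the true ITD responds most.
        if d >= 0:
            coinc = left[d:] * right[:n - d]
        else:
            coinc = left[:n + d] * right[-d:]

        # Recurrent timing layer: each node keeps a delay loop of length p samples.
        # Input that recurs with period p coincides with the loop's stored activity,
        # so the accumulated response grows for the matching period.
        for j, p in enumerate(periods):
            loop = np.zeros(p)
            acc = 0.0
            for t, x in enumerate(coinc):
                k = t % p
                acc += x * loop[k]                 # coincidence with value stored one period earlier
                loop[k] = alpha * loop[k] + x      # leaky write-back into the delay loop
            activation[i, j] = acc

    return itds, fs / periods, activation
```

In a complete system this analysis would be applied to every channel of an auditory filterbank and peaks in each channel's ITD-F0 map used to assign that channel's energy to a source; the grouping and resynthesis stages of the paper are not reproduced in this sketch.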




Author information

Authors: S. N. Wrigley and G. J. Brown

Editor information

Andrei Popescu-Belis, Steve Renals, Hervé Bourlard


Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wrigley, S.N., Brown, G.J. (2008). Binaural Speech Separation Using Recurrent Timing Neural Networks for Joint F0-Localisation Estimation. In: Popescu-Belis, A., Renals, S., Bourlard, H. (eds) Machine Learning for Multimodal Interaction. MLMI 2007. Lecture Notes in Computer Science, vol 4892. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78155-4_24

Download citation

  • DOI: https://doi.org/10.1007/978-3-540-78155-4_24

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-78154-7

  • Online ISBN: 978-3-540-78155-4

  • eBook Packages: Computer Science, Computer Science (R0)
