Spoken Language Processing: Time to Look Outside?

  • Conference paper
Statistical Language and Speech Processing (SLSP 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8791)

Abstract

Over the past thirty years, the field of spoken language processing has made impressive progress from simple laboratory demonstrations to mainstream consumer products. However, commercial applications such as Siri highlight the fact that there is still some way to go in creating Autonomous Social Agents that are truly capable of conversing effectively with their human counterparts in real-world situations. This paper suggests that it may be time for the spoken language processing community to take an interest in the potentially important developments that are occurring in related fields such as cognitive neuroscience, intelligent systems and developmental robotics. It then gives an insight into how such ideas might be integrated into a novel Mutual Beliefs Desires Intentions Actions and Consequences (MBDIAC) framework that places a focus on generative models of communicative behaviour which are recruited for interpreting the behaviour of others.

Acknowledgments

The author would like to thank colleagues in the Sheffield Speech and Hearing research group and the Bristol Robotics Laboratory for discussions relating to the content of this paper. This work was partially supported by the European Commission [grant numbers EU-FP6-507422, EU-FP6-034434, EU-FP7-231868, FP7-ICT-2013-10-611971] and the UK Engineering and Physical Sciences Research Council [grant number EP/I013512/1].

Author information

Corresponding author

Correspondence to Roger K. Moore.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Moore, R.K. (2014). Spoken Language Processing: Time to Look Outside? In: Besacier, L., Dediu, AH., Martín-Vide, C. (eds) Statistical Language and Speech Processing. SLSP 2014. Lecture Notes in Computer Science, vol 8791. Springer, Cham. https://doi.org/10.1007/978-3-319-11397-5_2

  • DOI: https://doi.org/10.1007/978-3-319-11397-5_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11396-8

  • Online ISBN: 978-3-319-11397-5

  • eBook Packages: Computer Science, Computer Science (R0)