Abstract
Over the past thirty years, the field of spoken language processing has made impressive progress from simple laboratory demonstrations to mainstream consumer products. However, commercial applications such as Siri highlight the fact that there is still some way to go in creating Autonomous Social Agents that are truly capable of conversing effectively with their human counterparts in real-world situations. This paper suggests that it may be time for the spoken language processing community to take an interest in the potentially important developments that are occurring in related fields such as cognitive neuroscience, intelligent systems and developmental robotics. It then gives an insight into how such ideas might be integrated into a novel Mutual Beliefs Desires Intentions Actions and Consequences (MBDIAC) framework that places a focus on generative models of communicative behaviour which are recruited for interpreting the behaviour of others.
Acknowledgments
The author would like to thank colleagues in the Sheffield Speech and Hearing research group and the Bristol Robotics Laboratory for discussions relating to the content of this paper. This work was partially supported by the European Commission [grant numbers EU-FP6-507422, EU-FP6-034434, EU-FP7-231868, FP7-ICT-2013-10-611971] and the UK Engineering and Physical Sciences Research Council [grant number EP/I013512/1].
© 2014 Springer International Publishing Switzerland
Cite this paper
Moore, R.K. (2014). Spoken Language Processing: Time to Look Outside? In: Besacier, L., Dediu, A.-H., Martín-Vide, C. (eds) Statistical Language and Speech Processing. SLSP 2014. Lecture Notes in Computer Science, vol 8791. Springer, Cham. https://doi.org/10.1007/978-3-319-11397-5_2
DOI: https://doi.org/10.1007/978-3-319-11397-5_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11396-8
Online ISBN: 978-3-319-11397-5
eBook Packages: Computer Science (R0)