Spoken Language Processing: Time to Look Outside?

Moore, Roger K.

doi:10.1007/978-3-319-11397-5_2

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8791)

Included in the following conference series:

International Conference on Statistical Language and Speech Processing

Abstract

Over the past thirty years, the field of spoken language processing has made impressive progress from simple laboratory demonstrations to mainstream consumer products. However, commercial applications such as Siri highlight the fact that there is still some way to go in creating Autonomous Social Agents that are truly capable of conversing effectively with their human counterparts in real-world situations. This paper suggests that it may be time for the spoken language processing community to take an interest in the potentially important developments that are occurring in related fields such as cognitive neuroscience, intelligent systems and developmental robotics. It then gives an insight into how such ideas might be integrated into a novel Mutual Beliefs Desires Intentions Actions and Consequences (MBDIAC) framework that places a focus on generative models of communicative behaviour which are recruited for interpreting the behaviour of others.

Acknowledgments

The author would like to thank colleagues in the Sheffield Speech and Hearing research group and the Bristol Robotics Laboratory for discussions relating to the content of this paper. This work was partially supported by the European Commission [grant numbers EU-FP6-507422, EU-FP6-034434, EU-FP7-231868, FP7-ICT-2013-10-611971] and the UK Engineering and Physical Sciences Research Council [grant number EP/I013512/1].

Author information

Authors and Affiliations

Speech and Hearing Research Group, University of Sheffield, Regent Court, 211 Portobello, Sheffield, S1 4DP, UK
Roger K. Moore

Corresponding author

Correspondence to Roger K. Moore.

Editor information

Editors and Affiliations

University Joseph Fourier, Grenoble, France
Laurent Besacier
Rovira i Virgili University, Tarragona, Spain
Adrian-Horia Dediu
Rovira i Virgili University, Tarragona, Spain
Carlos Martín-Vide

About this paper

Cite this paper

Moore, R.K. (2014). Spoken Language Processing: Time to Look Outside? In: Besacier, L., Dediu, A.-H., Martín-Vide, C. (eds) Statistical Language and Speech Processing. SLSP 2014. Lecture Notes in Computer Science, vol 8791. Springer, Cham. https://doi.org/10.1007/978-3-319-11397-5_2

DOI: https://doi.org/10.1007/978-3-319-11397-5_2
Published: 03 September 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11396-8
Online ISBN: 978-3-319-11397-5
eBook Packages: Computer Science; Computer Science (R0)
