
Compensating for Limitations in Speech-Based Natural Language Processing with Multimodal Interfaces in UAV Operation

  • Conference paper
  • First Online:
Advances in Human Factors in Robots and Unmanned Systems (AHFE 2017)

Part of the book series: Advances in Intelligent Systems and Computing ((AISC,volume 595))


Abstract

Natural language interfaces are becoming increasingly ubiquitous. By allowing for more natural communication, reducing the complexity of interacting with machines, and enabling non-expert users, these interfaces have found homes in numerous common products. However, natural language interfaces still have considerable room for growth before they fully reflect human speech patterns. Intuitive speech communication is often accompanied by gestural information that most speech interfaces currently lack. Excluding gestural data reduces a machine’s ability to interpret deictic information and to understand some semantic intent. To allow truly intuitive communication between humans and machines, a natural language interface must understand not only speech but also gesture. This paper outlines the limitations and restrictions of some of the most popular speech-only natural language processing algorithms and systems in use today, with a focus on extra-linguistic aspects of communication, including gestural information. Current research trends designed to compensate for these gaps by incorporating extra-linguistic information are then presented, and the success of each trend is evaluated, along with the promise of continued research. Additionally, a model multimodal interface that incorporates language and gesture data is presented to demonstrate the effectiveness of such an interface. The gestural portion of this interface compensates for some of the limitations of speech-only natural language interfaces; combining the two modalities thereby reduces the limitations of natural language interfaces and increases their success. This paper discusses how the two interfaces work together and specifies how the speech interface’s limitations are addressed through the inclusion of a gestural system.
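To make the deictic-resolution problem concrete, the following is a minimal illustrative sketch, not the authors' implementation: it shows how a speech command containing a deictic word ("fly there") can only be grounded by fusing it with a pointing gesture. All names, types, and coordinates here are hypothetical.

```python
# Hypothetical sketch: fusing a speech command with gesture data so that
# deictic targets ("there", "that") can be resolved to concrete locations.
# Not drawn from the paper's system; all identifiers are illustrative.
from dataclasses import dataclass
from typing import Optional

@dataclass
class SpeechCommand:
    verb: str    # e.g. "fly"
    target: str  # e.g. "there" -- deictic, unresolvable from speech alone

@dataclass
class Gesture:
    kind: str        # e.g. "point"
    location: tuple  # ground coordinates indicated by the gesture

DEICTIC_WORDS = {"there", "here", "that", "this"}

def fuse(speech: SpeechCommand, gesture: Optional[Gesture]) -> dict:
    """Resolve a deictic target in the speech command using gesture data."""
    if speech.target in DEICTIC_WORDS:
        # Speech alone cannot ground the target; a pointing gesture must.
        if gesture is None or gesture.kind != "point":
            raise ValueError(f"deictic target {speech.target!r} needs a pointing gesture")
        return {"action": speech.verb, "goal": gesture.location}
    # Non-deictic targets (e.g. a named waypoint) need no gesture.
    return {"action": speech.verb, "goal": speech.target}

cmd = fuse(SpeechCommand("fly", "there"), Gesture("point", (12.5, -3.0)))
print(cmd)  # {'action': 'fly', 'goal': (12.5, -3.0)}
```

A speech-only system would have to reject or guess at "fly there"; the fused command carries the coordinates the gesture supplied, which is the gap the multimodal interface is designed to close.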



Author information


Correspondence to Erica L. Meszaros.



Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Meszaros, E.L., Chandarana, M., Trujillo, A., Allen, B.D. (2018). Compensating for Limitations in Speech-Based Natural Language Processing with Multimodal Interfaces in UAV Operation. In: Chen, J. (eds) Advances in Human Factors in Robots and Unmanned Systems. AHFE 2017. Advances in Intelligent Systems and Computing, vol 595. Springer, Cham. https://doi.org/10.1007/978-3-319-60384-1_18


  • DOI: https://doi.org/10.1007/978-3-319-60384-1_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-60383-4

  • Online ISBN: 978-3-319-60384-1

  • eBook Packages: Engineering (R0)
