Disambiguation in Unknown Object Detection by Integrating Image and Speech Recognition Confidences

Ozasa, Yuko; Ariki, Yasuo; Nakano, Mikio; Iwahashi, Naoto

doi:10.1007/978-3-642-37331-2_7

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 7724)

Abstract

This paper presents a new method for detecting unknown objects and their unknown names during object manipulation through human-robot dialog. The method performs detection by integrating information from object images and the user's speech. Its originality lies in using logistic regression to discriminate between unknown and known objects. Unknown object detection accuracy was 97% when there were about fifty known objects.
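
The idea of discriminating known from unknown objects with logistic regression over image and speech recognition confidences can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each modality yields a single confidence score in [0, 1], that the training data are synthetic, and that plain batch gradient descent is an acceptable stand-in for whatever fitting procedure the paper uses. All function names and parameters are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(samples, labels, lr=0.5, epochs=2000):
    """Fit weights [bias, w_image, w_speech] by batch gradient descent.

    samples: list of (image_confidence, speech_confidence) pairs (assumed features).
    labels:  1 for a known object, 0 for an unknown object.
    """
    w = [0.0, 0.0, 0.0]
    n = len(samples)
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for (c_img, c_sp), y in zip(samples, labels):
            p = sigmoid(w[0] + w[1] * c_img + w[2] * c_sp)
            err = p - y  # gradient of the log-loss with respect to the logit
            grad[0] += err
            grad[1] += err * c_img
            grad[2] += err * c_sp
        w = [wi - lr * gi / n for wi, gi in zip(w, grad)]
    return w


def p_known(w, c_img, c_sp):
    # Estimated probability that the object is a known one.
    return sigmoid(w[0] + w[1] * c_img + w[2] * c_sp)


def is_unknown(w, c_img, c_sp, threshold=0.5):
    # Flag the object as unknown when the "known" probability is low.
    return p_known(w, c_img, c_sp) < threshold


# Synthetic, illustrative data only: known objects tend to score high in
# both modalities, unknown objects tend to score low.
rng = random.Random(0)
known = [(rng.uniform(0.6, 1.0), rng.uniform(0.6, 1.0)) for _ in range(50)]
unknown = [(rng.uniform(0.0, 0.4), rng.uniform(0.0, 0.4)) for _ in range(50)]
samples = known + unknown
labels = [1] * len(known) + [0] * len(unknown)

weights = train_logistic(samples, labels)
```

With weights fitted this way, `is_unknown(weights, c_img, c_sp)` combines both confidences into one decision, so a weak score in one modality can be offset by a strong score in the other. Thresholding each modality separately cannot make that trade-off.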

Author information

Authors and Affiliations

Graduate School of System Informatics, Kobe University, 1–1, Rokkodaicho, Nada-ku, Kobe, Hyogo, 657–8501, Japan
Yuko Ozasa & Yasuo Ariki
Honda Research Institute Japan Co., Ltd., 8–1 Honcho, Wako-shi, Saitama, 351–0188, Japan
Mikio Nakano
Keihanna Research Laboratories, National Institute of Information and Communications Technology, 3–5 Hikaridai, Seika-cho, Soraku-gun, Kyoto, 619–0289, Japan
Naoto Iwahashi

Editor information

Editors and Affiliations

Department of Electrical and Computer Engineering, Seoul National University, 1 Gwanak-ro, 151-744, Gwanak-gu, Seoul, Korea
Kyoung Mu Lee
Microsoft Research Asia, No. 5, Danling st., Haidian district, 100080, Beijing, P.R. China
Yasuyuki Matsushita
School of Interactive Computing, Georgia Institute of Technology, 801 Atlantic Drive, CCB 315, 30332, Atlanta, GA, USA
James M. Rehg
Institute of Automation, National Laboratory of Pattern Recognition, Chinese Academy of Sciences, Zhong Quan Cun East Road 95, Haidian District, 100 190, Beijing, P.R. China
Zhanyi Hu

About this paper

Cite this paper

Ozasa, Y., Ariki, Y., Nakano, M., Iwahashi, N. (2013). Disambiguation in Unknown Object Detection by Integrating Image and Speech Recognition Confidences. In: Lee, K.M., Matsushita, Y., Rehg, J.M., Hu, Z. (eds) Computer Vision – ACCV 2012. ACCV 2012. Lecture Notes in Computer Science, vol 7724. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37331-2_7

DOI: https://doi.org/10.1007/978-3-642-37331-2_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37330-5
Online ISBN: 978-3-642-37331-2
eBook Packages: Computer Science, Computer Science (R0)
