Object manipulation with a variable-stiffness robotic mechanism using deep neural networks for visual semantics and load estimation


In recent years, computer vision applications in robotics have improved toward human-like visual perception and scene/context understanding. Following this aspiration, in this study we explore whether object manipulation performance can be improved by connecting the visual recognition of objects to their physical attributes, such as weight and center of gravity (CoG). To develop and test this idea, we built an object manipulation platform comprising a robotic arm, a depth camera fixed at the top center of the workspace, encoders embedded in the robotic arm mechanism, and microcontrollers for position and force control. Since both the visual recognition and force estimation algorithms use deep learning principles, the test setup was named Deep-Table. The objects in the manipulation tests are everyday items commonly found on modern office desks. Visual object localization and recognition are performed in two distinct branches by deep convolutional neural network architectures. In the experiments, we present five of the possible cases, which differ in how much information about the object weight and CoG is available. The results confirm that, using our algorithm, the robotic arm can successfully move different types of objects, ranging from several grams (empty bottle) to around 250 g (ceramic cup), without failure or tipping. The results also show that connecting object recognition with load estimation and contact point further improves performance, as characterized by a smoother motion.
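The idea of linking a recognized object class to its physical attributes can be illustrated with a minimal sketch. This is not the authors' implementation: the table `PRIORS`, the class `ObjectPrior`, the function `expected_load_newtons`, the label strings, the 10 g bottle mass, and the zero CoG offsets are all hypothetical. Only the figure of roughly 250 g for the ceramic cup comes from the abstract.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

G = 9.81  # gravitational acceleration in m/s^2


@dataclass(frozen=True)
class ObjectPrior:
    """Physical attributes assumed to be known for a recognized class."""
    mass_kg: float
    # CoG offset from the detected bounding-box centre, in metres (assumed convention)
    cog_offset_m: Tuple[float, float, float]


# Hypothetical lookup table keyed by the label a visual recognizer might output.
# Only the ~250 g ceramic cup mass is taken from the abstract; the rest is illustrative.
PRIORS = {
    "ceramic_cup": ObjectPrior(0.250, (0.0, 0.0, 0.0)),
    "empty_bottle": ObjectPrior(0.010, (0.0, 0.0, 0.0)),
}


def expected_load_newtons(label: str) -> Optional[float]:
    """Return the expected gravity load for a recognized label.

    Returns None when no prior exists, in which case a controller would
    have to rely on load estimation alone.
    """
    prior = PRIORS.get(label)
    if prior is None:
        return None
    return prior.mass_kg * G


if __name__ == "__main__":
    for name in ("ceramic_cup", "empty_bottle", "stapler"):
        print(name, expected_load_newtons(name))
```

In a setup like this, a known label would give the controller a load value before contact, while an unknown label would leave it to depend on the estimated force only. That is one possible reading of the different information-availability cases mentioned above.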





Author information



Corresponding author

Correspondence to Ertugrul Bayraktar.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Bayraktar, E., Yigit, C.B. & Boyraz, P. Object manipulation with a variable-stiffness robotic mechanism using deep neural networks for visual semantics and load estimation. Neural Comput & Applic 32, 9029–9045 (2020). https://doi.org/10.1007/s00521-019-04412-5

Keywords


  • Deep neural networks
  • Object recognition
  • Robotic manipulation
  • Context awareness
  • Force estimation