Object Tracking Through Residual and Dense LSTMs

  • Fabio Garcea
  • Alessandro Cucco
  • Lia Morra
  • Fabrizio Lamberti
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12132)


The visual object tracking task is steadily gaining importance in several fields of application, such as traffic monitoring, robotics, and surveillance, to name a few. Dealing with changes in the appearance of the tracked object is paramount to achieving high tracking accuracy, and is usually addressed by continually learning features. Recently, deep learning-based trackers built on LSTM (Long Short-Term Memory) recurrent neural networks have emerged as a powerful alternative, bypassing the need to retrain the feature extractor online. Inspired by the success of residual and dense networks in image recognition, we propose here to enhance the capabilities of hybrid trackers using residual and/or dense LSTMs. By introducing skip connections, it is possible to increase the depth of the architecture while ensuring fast convergence. Experimental results on the Re³ tracker show that DenseLSTMs outperform residual and regular LSTMs, and offer higher resilience to nuisances such as occlusions and out-of-view objects. Our case study supports the adoption of residual-based RNNs for enhancing the robustness of other trackers.
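
The abstract's central architectural idea is adding skip connections between stacked LSTM layers. The difference between the three wirings can be illustrated with a small sketch. It is not the authors' Re³ implementation: it is a toy, pure-Python LSTM with random weights, and the names `LSTMCell`, `StackedLSTM`, `run` and the `mode` parameter are hypothetical. In "residual" mode, each layer after the first adds its input to its output (an identity skip, in the spirit of ResNet). In "dense" mode, each layer receives the concatenation of the original input and all earlier layer outputs (in the spirit of DenseNet), so its input width grows with depth.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class LSTMCell:
    """Toy single-step LSTM cell with small random weights (hypothetical, pure Python)."""

    def __init__(self, input_size, hidden_size, rng):
        self.input_size = input_size
        self.hidden_size = hidden_size
        n = input_size + hidden_size
        # One weight matrix (hidden_size x n) and one bias vector per gate: i, f, g, o.
        self.W = {name: [[rng.uniform(-0.5, 0.5) for _ in range(n)]
                         for _ in range(hidden_size)] for name in "ifgo"}
        self.b = {name: [0.0] * hidden_size for name in "ifgo"}

    def step(self, x, state):
        h, c = state
        z = list(x) + list(h)  # concatenation [x_t ; h_{t-1}]

        def gate(name, act):
            # act(W_name z + b_name), one row of W per hidden unit
            return [act(sum(w * v for w, v in zip(row, z)) + b)
                    for row, b in zip(self.W[name], self.b[name])]

        i = gate("i", sigmoid)       # input gate
        f = gate("f", sigmoid)       # forget gate
        cand = gate("g", math.tanh)  # candidate cell values
        o = gate("o", sigmoid)       # output gate
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, cand)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new


class StackedLSTM:
    """Stack of LSTM cells wired as 'plain', 'residual' or 'dense' (hypothetical)."""

    def __init__(self, input_size, hidden_size, num_layers, mode="plain", seed=0):
        if mode not in ("plain", "residual", "dense"):
            raise ValueError(f"unknown mode: {mode}")
        rng = random.Random(seed)
        self.mode = mode
        self.cells = []
        for k in range(num_layers):
            if mode == "dense":
                # Layer k reads the input plus the outputs of all k earlier layers.
                in_size = input_size + k * hidden_size
            else:
                in_size = input_size if k == 0 else hidden_size
            self.cells.append(LSTMCell(in_size, hidden_size, rng))

    def initial_state(self):
        return [([0.0] * c.hidden_size, [0.0] * c.hidden_size) for c in self.cells]

    def step(self, x, states):
        features = [list(x)]  # everything produced so far at this time step
        inp = list(x)
        new_states = []
        for k, (cell, state) in enumerate(zip(self.cells, states)):
            if self.mode == "dense":
                inp = [v for feat in features for v in feat]  # concatenate all features
            h, c = cell.step(inp, state)
            new_states.append((h, c))  # the recurrent state keeps the raw h
            if self.mode == "residual" and k > 0:
                out = [hj + xj for hj, xj in zip(h, inp)]  # identity skip: h + input
            else:
                out = h
            features.append(out)
            inp = out
        return inp, new_states


def run(mode, seq, num_layers=3, hidden_size=4, seed=0):
    """Feed a sequence through a fresh stack; return the model and the last output."""
    model = StackedLSTM(len(seq[0]), hidden_size, num_layers, mode, seed)
    states = model.initial_state()
    y = None
    for x in seq:
        y, states = model.step(x, states)
    return model, y


if __name__ == "__main__":
    seq = [[0.5, -0.2, 0.1, 0.3], [0.0, 0.4, -0.3, 0.2], [0.1, 0.1, 0.1, 0.1]]
    for mode in ("plain", "residual", "dense"):
        model, y = run(mode, seq)
        sizes = [c.input_size for c in model.cells]
        print(mode, "layer input sizes:", sizes, "output:", [round(v, 3) for v in y])
```

With a 4-dimensional input and 4 hidden units, the per-layer input sizes are 4, 4, 4 for the plain and residual stacks and 4, 8, 12 for the dense stack. This shows the extra width (and parameters) dense connectivity costs. In this sketch the residual shortcut is applied only from the second layer onward, because the input and hidden sizes need not match. Published residual LSTM variants differ in exactly where the shortcut is placed.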


Keywords: Object tracking · Recurrent neural networks · Residual networks

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Dipartimento di Automatica e Informatica, Politecnico di Torino, Turin, Italy
