Abstract
Estimating the pose of the main actors is among the most crucial components of an intelligent system that assists drone-based cinematography. However, training deep CNNs for this task is not straightforward, mainly because the data are noisy and instabilities occur during the learning process, which significantly slows down the development of such systems. In this work we propose a temporal averaging technique that both stabilizes and speeds up the convergence of stochastic optimization techniques for neural network training. Using two face pose estimation datasets, we experimentally verify that the proposed method can improve both the convergence of training algorithms and the accuracy of pose estimation. The method also reduces the risk of stopping the training process at a point where a bad descent step was taken and the learning rate was not appropriately set, so that the network performs well at any point of the training process.
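The abstract does not spell out the averaging rule, so the following is only a minimal sketch of the general idea of averaging parameters over training iterations, not the authors' method. It assumes a plain running mean of the iterates (in the spirit of Polyak-Ruppert averaging) applied after a warm-up period, and it uses a toy noisy quadratic instead of a CNN. The names IterateAverager, noisy_sgd, warmup_steps and all numeric settings are hypothetical choices made for illustration.

```python
import random


class IterateAverager:
    """Running mean of parameter vectors (hypothetical helper, not from the paper)."""

    def __init__(self, warmup_steps=0):
        self.warmup_steps = warmup_steps  # iterates skipped before averaging starts
        self.steps_seen = 0
        self.count = 0                    # number of iterates included in the mean
        self.mean = None

    def update(self, params):
        self.steps_seen += 1
        if self.steps_seen <= self.warmup_steps:
            return
        self.count += 1
        if self.mean is None:
            self.mean = list(params)
        else:
            # Incremental mean: m <- m + (p - m) / count
            self.mean = [m + (p - m) / self.count
                         for m, p in zip(self.mean, params)]


def noisy_sgd(steps=2000, lr=0.1, noise=1.0, seed=0):
    """SGD on 0.5 * ||w - target||^2 with Gaussian gradient noise (toy assumption)."""
    rng = random.Random(seed)
    w = [5.0, -3.0]
    target = [1.0, 2.0]
    averager = IterateAverager(warmup_steps=steps // 2)
    for _ in range(steps):
        # Exact gradient is (w - target); the added noise mimics mini-batch sampling.
        grad = [(wi - ti) + rng.gauss(0.0, noise) for wi, ti in zip(w, target)]
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
        averager.update(w)
    return w, averager.mean, target


if __name__ == "__main__":
    last, averaged, target = noisy_sgd()
    sq_err = lambda v: sum((a - b) ** 2 for a, b in zip(v, target))
    print("squared error of last iterate:    ", sq_err(last))
    print("squared error of averaged iterate:", sq_err(averaged))
```

In this toy setting the last iterate keeps jittering around the optimum because of the gradient noise, whereas the averaged parameters smooth that jitter out, which is one way to read the claim that the network should be usable whenever training is stopped. How closely this matches the proposed long-term temporal averaging is an assumption.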
Acknowledgments
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 731667 (MULTIDRONE). This publication reflects the authors’ views only. The European Commission is not responsible for any use that may be made of the information it contains.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Passalis, N., Tefas, A. (2017). Improving Face Pose Estimation Using Long-Term Temporal Averaging for Stochastic Optimization. In: Boracchi, G., Iliadis, L., Jayne, C., Likas, A. (eds) Engineering Applications of Neural Networks. EANN 2017. Communications in Computer and Information Science, vol 744. Springer, Cham. https://doi.org/10.1007/978-3-319-65172-9_17
DOI: https://doi.org/10.1007/978-3-319-65172-9_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-65171-2
Online ISBN: 978-3-319-65172-9
eBook Packages: Computer Science, Computer Science (R0)