A Study of Action Recognition Problems: Dataset and Architectures Perspectives

Chawky, Bassel S.; Elons, A. S.; Ali, A.; Shedeed, Howida A.

doi:10.1007/978-3-319-63754-9_19

Part of the book series: Studies in Computational Intelligence (SCI, volume 730)

Abstract

The field of action recognition has grown dramatically in recent years due to its importance in applications such as smart surveillance, human–computer interaction, assistance for elderly citizens, and web-video search and retrieval. Many research efforts have tackled action recognition as an open problem, and different datasets have been built to evaluate architectural variations. This survey explores action recognition datasets to highlight their ability to evaluate different models. In addition, for each dataset, a usage is proposed based on the content and format of the data it includes, the number of classes, and the challenges it covers. On the other hand, different architectures are also explored, showing the contribution of each to handling the challenges of the action recognition problem and the scientific explanation behind their results. In total, 21 datasets are covered, along with 13 architectures that include both shallow and deep models.


Author information

Authors and Affiliations

Faculty of Computer and Information Sciences, Scientific Computing Department, Ain Shams University, Cairo, Egypt
Bassel S. Chawky, A. S. Elons, A. Ali & Howida A. Shedeed

Corresponding author

Correspondence to Bassel S. Chawky.

Editor information

Editors and Affiliations

Faculty of Computers and Information, Information Technology Department, Cairo University, Giza, Egypt
Aboul Ella Hassanien
CUCEI, Departamento de Ciencias Computacionales, Universidad de Guadalajara, Guadalajara, Mexico
Diego Alberto Oliva

About this chapter

Cite this chapter

Chawky, B.S., Elons, A.S., Ali, A., Shedeed, H.A. (2018). A Study of Action Recognition Problems: Dataset and Architectures Perspectives. In: Hassanien, A., Oliva, D. (eds) Advances in Soft Computing and Machine Learning in Image Processing. Studies in Computational Intelligence, vol 730. Springer, Cham. https://doi.org/10.1007/978-3-319-63754-9_19

DOI: https://doi.org/10.1007/978-3-319-63754-9_19
Published: 15 October 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-63753-2
Online ISBN: 978-3-319-63754-9
eBook Packages: Engineering, Engineering (R0)
