Abstract
Egocentric vision has entered the daily practice of application domains such as lifelogging, activity monitoring, robot navigation, and the analysis of social interactions. Much of this research focuses on location detection and activity recognition, with applications in the area of Ambient Assisted Living. This work builds on the idea that indoor locations and daily activities can be characterized by the presence of specific objects. Object labels can be obtained either from laborious human annotation or automatically, using vision-based detectors. We study the use of object detections as input for location and activity classification and analyze the influence of various detection parameters. We compare our detections against manually provided object labels and show that location classification is affected by both detection quality and quantity. Exploiting the temporal structure of the object detections mitigates the effect of noisy ones. Moreover, we find that activity recognition depends on the presence of specific objects and that the lack of explicit associations between certain activities and objects hurts classification performance for those activities. Finally, we discuss the outcomes of each task and our method's potential for real-world applications.
Parts of this chapter are © 2018 IEEE. Reprinted, with permission, from Kapidis et al. [1].
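To make the central idea concrete, below is a minimal sketch, not the authors' implementation (see Note 1 for their code), of how per-frame object detections could drive location classification: each frame is encoded as a multi-hot presence vector over the detector's object classes, and an LSTM aggregates the sequence so that the temporal structure can smooth over noisy detections. The class counts, layer sizes, and synthetic data are all illustrative assumptions.

```python
# Sketch: classify a clip's location from per-frame object detections.
# Assumptions: a YOLO-style detector emitting COCO class IDs per frame,
# a hypothetical set of 10 indoor location labels, 30-frame clips.
import torch
import torch.nn as nn

NUM_OBJECT_CLASSES = 80   # e.g. COCO classes from the object detector
NUM_LOCATIONS = 10        # hypothetical number of indoor location labels
SEQ_LEN = 30              # frames per clip

def detections_to_multihot(frame_detections):
    """Encode one frame's detected class IDs as a multi-hot presence vector."""
    v = torch.zeros(NUM_OBJECT_CLASSES)
    for class_id in frame_detections:
        v[class_id] = 1.0
    return v

class LocationLSTM(nn.Module):
    """LSTM over per-frame object-presence vectors; classifies the clip."""
    def __init__(self, hidden_size=64):
        super().__init__()
        self.lstm = nn.LSTM(NUM_OBJECT_CLASSES, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, NUM_LOCATIONS)

    def forward(self, x):                  # x: (batch, seq_len, num_classes)
        _, (h_n, _) = self.lstm(x)         # h_n: (num_layers, batch, hidden)
        return self.fc(h_n[-1])            # logits: (batch, num_locations)

if __name__ == "__main__":
    # Synthetic clip: each frame "detects" five random object classes.
    frames = [torch.randint(0, NUM_OBJECT_CLASSES, (5,)).tolist()
              for _ in range(SEQ_LEN)]
    clip = torch.stack([detections_to_multihot(f) for f in frames]).unsqueeze(0)
    model = LocationLSTM()
    print(model(clip).shape)  # torch.Size([1, NUM_LOCATIONS])
```

A frame-level baseline would feed each multi-hot vector to a plain classifier; the recurrent variant above illustrates why temporal aggregation helps, since a single spurious detection changes one vector but barely moves the hidden state accumulated over the clip.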
Notes
1. Code and data for our experiments are located at https://github.com/georkap/object-based-location-classification.
References
Kapidis G, Poppe RW, van Dam EA et al (2018) Where Am I? Comparing CNN and LSTM for location classification in egocentric videos. In: 2018 IEEE international conference on pervasive computing and communications workshops (PerCom Workshops), pp 878–883
Ma M, Fan H, Kitani KM (2016) Going deeper into first-person activity recognition. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR), pp 1894–1903
del Molino AG, Tan C, Lim J, Tan A (2017) Summarization of egocentric videos: a comprehensive survey. IEEE Trans Hum-Mach Syst 47:65–76. https://doi.org/10.1109/THMS.2016.2623480
Yonetani R, Kitani KM, Sato Y (2016) Recognizing micro-actions and reactions from paired egocentric videos. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR), pp 2629–2638
Damen D, Leelasawassuk T, Mayol-Cuevas W (2016) You-Do, I-Learn: egocentric unsupervised discovery of objects and their modes of interaction towards video-based guidance. Comput Vis Image Underst 149:98–112. https://doi.org/10.1016/j.cviu.2016.02.016
Kretch KS, Franchak JM, Adolph KE (2014) Crawling and walking infants see the world differently. Child Dev 85:1503–1518. https://doi.org/10.1111/cdev.12206
Nguyen T-H-C, Nebel J-C, Florez-Revuelta F (2016) Recognition of activities of daily living with egocentric vision: a review. Sensors (Basel, Switzerland) 16:72. https://doi.org/10.3390/s16010072
Karaman S, Benois-Pineau J, Mégret R et al (2010) Human daily activities indexing in videos from wearable cameras for monitoring of patients with dementia diseases. In: 2010 20th international conference on pattern recognition, pp 4113–4116
Teriús-Padrón JG, Kapidis G, Fallmann S et al (2018) Towards self-management of chronic diseases in smart homes: physical exercise monitoring for chronic obstructive pulmonary disease patients. In: 2018 IEEE international conference on pervasive computing and communications workshops (PerCom Workshops), pp 776–781
Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. In: 2005 IEEE computer society conference on computer vision and pattern recognition (CVPR’05), vol 1, pp 886–893
Lowe DG (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vis 60:91–110. https://doi.org/10.1023/B:VISI.0000029664.99615.94
Pirsiavash H, Ramanan D (2012) Detecting activities of daily living in first-person camera views. In: 2012 IEEE conference on computer vision and pattern recognition, pp 2847–2854
Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9:1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735
Redmon J, Farhadi A (2017) YOLO9000: better, faster, stronger. In: 2017 IEEE conference on computer vision and pattern recognition (CVPR), pp 6517–6525
Furnari A, Farinella GM, Battiato S (2016) Temporal segmentation of egocentric videos to highlight personal locations of interest. In: Hua G, Jégou H (eds) Computer vision—ECCV 2016 workshops. Springer International Publishing, pp 474–489
Furnari A, Farinella GM, Battiato S (2017) Recognizing personal locations from egocentric videos. IEEE Trans Hum-Mach Syst 47:6–18. https://doi.org/10.1109/THMS.2016.2612002
Nakamura K, Yeung S, Alahi A, Fei-Fei L (2017) Jointly learning energy expenditures and activities using egocentric multimodal signals. In: 2017 IEEE conference on computer vision and pattern recognition (CVPR), pp 6817–6826
Lee YJ, Ghosh J, Grauman K (2012) Discovering important people and objects for egocentric video summarization. In: 2012 IEEE conference on computer vision and pattern recognition, pp 1346–1353
Fathi A, Ren X, Rehg JM (2011) Learning to recognize objects in egocentric activities. In: 2011 IEEE conference on computer vision and pattern recognition (CVPR), pp 3281–3288
Fathi A, Li Y, Rehg JM (2012) Learning to recognize daily actions using gaze. In: Fitzgibbon A, Lazebnik S, Perona P et al (eds) Computer vision—ECCV 2012. Springer, Berlin, pp 314–327
Poleg Y, Arora C, Peleg S (2014) Temporal segmentation of egocentric videos. In: 2014 IEEE conference on computer vision and pattern recognition, pp 2537–2544
Betancourt A, Díaz-Rodríguez N, Barakova E et al (2017) Unsupervised understanding of location and illumination changes in egocentric videos. Pervasive Mob Comput 40:414–429. https://doi.org/10.1016/j.pmcj.2017.03.016
Altwaijry H, Moghimi M, Belongie S (2014) Recognizing locations with Google Glass: a case study. In: IEEE winter conference on applications of computer vision, pp 167–174
Lee N, Kim C, Choi W et al (2017) Development of indoor localization system using a mobile data acquisition platform and BoW image matching. KSCE J Civ Eng 21:418–430. https://doi.org/10.1007/s12205-016-1057-5
Lu G, Yan Y, Sebe N, Kambhamettu C (2017) Indoor localization via multi-view images and videos. Comput Vis Image Underst 161:145–160. https://doi.org/10.1016/j.cviu.2017.05.003
Qian K, Zhao W, Ma Z et al (2018) Wearable-assisted localization and inspection guidance system using egocentric stereo cameras. IEEE Sens J 18:809–821. https://doi.org/10.1109/JSEN.2017.2773487
Dovgalecs V, Mégret R, Berthoumieu Y (2013) Multiple feature fusion based on co-training approach and time regularization for place classification in wearable video. Adv Multimed 2013. https://doi.org/10.1155/2013/175064
Vaca-Castano G, Das S, Sousa JP (2015) Improving egocentric vision of daily activities. In: 2015 IEEE international conference on image processing (ICIP), pp 2562–2566
Vaca-Castano G, Das S, Sousa JP et al (2017) Improved scene identification and object detection on egocentric vision of daily activities. Comput Vis Image Underst 156:92–103. https://doi.org/10.1016/j.cviu.2016.10.016
Greff K, Srivastava RK, Koutník J et al (2017) LSTM: a search space odyssey. IEEE Trans Neural Netw Learn Syst 28:2222–2232. https://doi.org/10.1109/TNNLS.2016.2582924
Karpathy A, Johnson J, Fei-Fei L (2015) Visualizing and understanding recurrent networks. arXiv preprint arXiv:1506.02078
Smith LN (2018) A disciplined approach to neural network hyper-parameters: part 1—learning rate, batch size, momentum, and weight decay. arXiv preprint arXiv:1803.09820
Poppe R (2010) A survey on vision-based human action recognition. Image Vis Comput 28:976–990. https://doi.org/10.1016/j.imavis.2009.11.014
Bambach S, Lee S, Crandall DJ, Yu C (2015) Lending a hand: detecting hands and recognizing activities in complex egocentric interactions. In: 2015 IEEE international conference on computer vision (ICCV), pp 1949–1957
Bertasius G, Park HS, Yu SX, Shi J (2017) First person action-object detection with EgoNet. In: Proceedings of robotics: science and systems
Li Y, Ye Z, Rehg JM (2015) Delving into egocentric actions. In: 2015 IEEE conference on computer vision and pattern recognition (CVPR), pp 287–295
Poleg Y, Ephrat A, Peleg S, Arora C (2016) Compact CNN for indexing egocentric videos. In: 2016 IEEE winter conference on applications of computer vision (WACV), pp 1–9
Fathi A, Farhadi A, Rehg JM (2011) Understanding egocentric activities. In: 2011 international conference on computer vision, pp 407–414
Wray M, Moltisanti D, Mayol-Cuevas W, Damen D (2016) SEMBED: semantic embedding of egocentric action videos. In: Hua G, Jégou H (eds) Computer vision—ECCV 2016 workshops. Springer International Publishing, pp 532–545
Wu J, Osuntogun A, Choudhury T et al (2007) A scalable approach to activity recognition based on object use. In: 2007 IEEE 11th international conference on computer vision, pp 1–8
Su Y-C, Grauman K (2016) Leaving some stones unturned: dynamic feature prioritization for activity detection in streaming video. In: Leibe B, Matas J, Sebe N, Welling M (eds) Computer vision—ECCV 2016. Springer International Publishing, pp 783–800
Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: unified, real-time object detection. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR), pp 779–788
Deng J, Dong W, Socher R et al (2009) ImageNet: a large-scale hierarchical image database. In: 2009 IEEE conference on computer vision and pattern recognition, pp 248–255
Lin T-Y, Maire M, Belongie S et al (2014) Microsoft COCO: common objects in context. In: Fleet D, Pajdla T, Schiele B, Tuytelaars T (eds) Computer vision—ECCV 2014. Springer International Publishing, pp 740–755
Girshick R (2015) Fast R-CNN. In: 2015 IEEE international conference on computer vision (ICCV), pp 1440–1448
Redmon J, Farhadi A (2018) YOLOv3: an incremental improvement. arXiv preprint arXiv:1804.02767
Nguyen T-H-C, Nebel J-C, Florez-Revuelta F (2018) Recognition of activities of daily living from egocentric videos using hands detected by a deep convolutional network. In: Campilho A, Karray F, ter Haar Romeny B (eds) Image analysis and recognition. Springer International Publishing, pp 390–398
Acknowledgements
This project has received funding from the European Union’s Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement No. 676157.
Copyright information
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Kapidis, G., Poppe, R., van Dam, E., Noldus, L.P.J.J., Veltkamp, R.C. (2020). Object Detection-Based Location and Activity Classification from Egocentric Videos: A Systematic Analysis. In: Chen, F., García-Betances, R., Chen, L., Cabrera-Umpiérrez, M., Nugent, C. (eds) Smart Assisted Living. Computer Communications and Networks. Springer, Cham. https://doi.org/10.1007/978-3-030-25590-9_6
DOI: https://doi.org/10.1007/978-3-030-25590-9_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-25589-3
Online ISBN: 978-3-030-25590-9
eBook Packages: Computer Science, Computer Science (R0)