Abstract
Egocentric vision has entered the daily practice of application domains such as lifelogging, activity monitoring, robot navigation, and the analysis of social interactions. Much of this research focuses on location detection and activity recognition, with applications in the area of Ambient Assisted Living. This work builds on the idea that indoor locations and daily activities can be characterized by the presence of specific objects. Object labels can be obtained either from laborious human annotation or automatically, using vision-based detectors. We study the use of object detections as input for location and activity classification and analyze the influence of various detection parameters. We compare our detections against manually provided object labels and show that location classification is affected by both detection quality and quantity. Exploiting the temporal structure of the object detections mitigates the effect of noisy ones. Moreover, we find that activity recognition depends on the presence of specific objects and that the lack of explicit associations between certain activities and objects hurts classification performance for those activities. Finally, we discuss the outcomes of each task and our method's potential for real-world applications.
Parts of this chapter are © 2018 IEEE. Reprinted, with permission, from Kapidis et al. [1].
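To make the central idea concrete, below is a minimal sketch, not the authors' implementation (see Note 1 for their code), of how per-frame object detections could drive location classification: each frame is encoded as a multi-hot presence vector over the detector's object classes, and an LSTM aggregates the sequence so that the temporal structure can smooth over noisy detections. The class counts, layer sizes, and synthetic data are all illustrative assumptions.

```python
# Sketch: classify a clip's location from per-frame object detections.
# Assumptions: a YOLO-style detector emitting COCO class IDs per frame,
# a hypothetical set of 10 indoor location labels, 30-frame clips.
import torch
import torch.nn as nn

NUM_OBJECT_CLASSES = 80   # e.g. COCO classes from the object detector
NUM_LOCATIONS = 10        # hypothetical number of indoor location labels
SEQ_LEN = 30              # frames per clip

def detections_to_multihot(frame_detections):
    """Encode one frame's detected class IDs as a multi-hot presence vector."""
    v = torch.zeros(NUM_OBJECT_CLASSES)
    for class_id in frame_detections:
        v[class_id] = 1.0
    return v

class LocationLSTM(nn.Module):
    """LSTM over per-frame object-presence vectors; classifies the clip."""
    def __init__(self, hidden_size=64):
        super().__init__()
        self.lstm = nn.LSTM(NUM_OBJECT_CLASSES, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, NUM_LOCATIONS)

    def forward(self, x):                  # x: (batch, seq_len, num_classes)
        _, (h_n, _) = self.lstm(x)         # h_n: (num_layers, batch, hidden)
        return self.fc(h_n[-1])            # logits: (batch, num_locations)

if __name__ == "__main__":
    # Synthetic clip: each frame "detects" five random object classes.
    frames = [torch.randint(0, NUM_OBJECT_CLASSES, (5,)).tolist()
              for _ in range(SEQ_LEN)]
    clip = torch.stack([detections_to_multihot(f) for f in frames]).unsqueeze(0)
    model = LocationLSTM()
    print(model(clip).shape)  # torch.Size([1, NUM_LOCATIONS])
```

A frame-level baseline would feed each multi-hot vector to a plain classifier; the recurrent variant above illustrates why temporal aggregation helps, since a single spurious detection changes one vector but barely moves the hidden state accumulated over the clip.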
Notes
1. Code and data for our experiments are located at https://github.com/georkap/object-based-location-classification.
References
Kapidis G, Poppe RW, van Dam EA et al (2018) Where Am I? Comparing CNN and LSTM for location classification in egocentric videos. In: 2018 IEEE international conference on pervasive computing and communications workshops (PerCom Workshops), pp 878–883
Ma M, Fan H, Kitani KM (2016) Going deeper into first-person activity recognition. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR), pp 1894–1903
del Molino AG, Tan C, Lim J, Tan A (2017) Summarization of egocentric videos: a comprehensive survey. IEEE Trans Hum-Mach Syst 47:65–76. https://doi.org/10.1109/THMS.2016.2623480
Yonetani R, Kitani KM, Sato Y (2016) Recognizing micro-actions and reactions from paired egocentric videos. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR), pp 2629–2638
Damen D, Leelasawassuk T, Mayol-Cuevas W (2016) You-Do, I-Learn: egocentric unsupervised discovery of objects and their modes of interaction towards video-based guidance. Comput Vis Image Underst 149:98–112. https://doi.org/10.1016/j.cviu.2016.02.016
Kretch KS, Franchak JM, Adolph KE (2014) Crawling and walking infants see the world differently. Child Dev 85:1503–1518. https://doi.org/10.1111/cdev.12206
Nguyen T-H-C, Nebel J-C, Florez-Revuelta F (2016) Recognition of activities of daily living with egocentric vision: a review. Sensors (Basel, Switzerland) 16:72. https://doi.org/10.3390/s16010072
Karaman S, Benois-Pineau J, Mégret R et al (2010) Human daily activities indexing in videos from wearable cameras for monitoring of patients with dementia diseases. In: 2010 20th international conference on pattern recognition, pp 4113–4116
Teriús-Padrón JG, Kapidis G, Fallmann S et al (2018) Towards self-management of chronic diseases in smart homes: physical exercise monitoring for chronic obstructive pulmonary disease patients. In: 2018 IEEE international conference on pervasive computing and communications workshops (PerCom Workshops), pp 776–781
Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. In: 2005 IEEE computer society conference on computer vision and pattern recognition (CVPR’05), vol 1, pp 886–893
Lowe DG (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vis 60:91–110. https://doi.org/10.1023/B:VISI.0000029664.99615.94
Pirsiavash H, Ramanan D (2012) Detecting activities of daily living in first-person camera views. In: 2012 IEEE conference on computer vision and pattern recognition, pp 2847–2854
Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9:1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735
Redmon J, Farhadi A (2017) YOLO9000: better, faster, stronger. In: 2017 IEEE conference on computer vision and pattern recognition (CVPR), pp 6517–6525
Furnari A, Farinella GM, Battiato S (2016) Temporal segmentation of egocentric videos to highlight personal locations of interest. In: Hua G, Jégou H (eds) Computer vision—ECCV 2016 workshops. Springer International Publishing, pp 474–489
Furnari A, Farinella GM, Battiato S (2017) Recognizing personal locations from egocentric videos. IEEE Trans Hum-Mach Syst 47:6–18. https://doi.org/10.1109/THMS.2016.2612002
Nakamura K, Yeung S, Alahi A, Fei-Fei L (2017) Jointly learning energy expenditures and activities using egocentric multimodal signals. In: 2017 IEEE conference on computer vision and pattern recognition (CVPR), pp 6817–6826
Lee YJ, Ghosh J, Grauman K (2012) Discovering important people and objects for egocentric video summarization. In: 2012 IEEE conference on computer vision and pattern recognition, pp 1346–1353
Fathi A, Ren X, Rehg JM (2011) Learning to recognize objects in egocentric activities. In: 2011 IEEE conference on computer vision and pattern recognition (CVPR), pp 3281–3288
Fathi A, Li Y, Rehg JM (2012) Learning to recognize daily actions using gaze. In: Fitzgibbon A, Lazebnik S, Perona P et al (eds) Computer vision—ECCV 2012. Springer, Berlin, pp 314–327
Poleg Y, Arora C, Peleg S (2014) Temporal segmentation of egocentric videos. In: 2014 IEEE conference on computer vision and pattern recognition, pp 2537–2544
Betancourt A, Díaz-Rodríguez N, Barakova E et al (2017) Unsupervised understanding of location and illumination changes in egocentric videos. Pervasive Mob Comput 40:414–429. https://doi.org/10.1016/j.pmcj.2017.03.016
Altwaijry H, Moghimi M, Belongie S (2014) Recognizing locations with Google Glass: a case study. In: IEEE winter conference on applications of computer vision, pp 167–174
Lee N, Kim C, Choi W et al (2017) Development of indoor localization system using a mobile data acquisition platform and BoW image matching. KSCE J Civ Eng 21:418–430. https://doi.org/10.1007/s12205-016-1057-5
Lu G, Yan Y, Sebe N, Kambhamettu C (2017) Indoor localization via multi-view images and videos. Comput Vis Image Underst 161:145–160. https://doi.org/10.1016/j.cviu.2017.05.003
Qian K, Zhao W, Ma Z et al (2018) Wearable-assisted localization and inspection guidance system using egocentric stereo cameras. IEEE Sens J 18:809–821. https://doi.org/10.1109/JSEN.2017.2773487
Dovgalecs V, Mégret R, Berthoumieu Y (2013) Multiple feature fusion based on co-training approach and time regularization for place classification in wearable video. Adv Multimed 2013. https://doi.org/10.1155/2013/175064
Vaca-Castano G, Das S, Sousa JP (2015) Improving egocentric vision of daily activities. In: 2015 IEEE international conference on image processing (ICIP), pp 2562–2566
Vaca-Castano G, Das S, Sousa JP et al (2017) Improved scene identification and object detection on egocentric vision of daily activities. Comput Vis Image Underst 156:92–103. https://doi.org/10.1016/j.cviu.2016.10.016
Greff K, Srivastava RK, Koutník J et al (2017) LSTM: a search space odyssey. IEEE Trans Neural Netw Learn Syst 28:2222–2232. https://doi.org/10.1109/TNNLS.2016.2582924
Karpathy A, Johnson J, Fei-Fei L (2015) Visualizing and understanding recurrent networks. arXiv preprint arXiv:1506.02078
Smith LN (2018) A disciplined approach to neural network hyper-parameters: part 1—learning rate, batch size, momentum, and weight decay. arXiv preprint arXiv:1803.09820
Poppe R (2010) A survey on vision-based human action recognition. Image Vis Comput 28:976–990. https://doi.org/10.1016/j.imavis.2009.11.014
Bambach S, Lee S, Crandall DJ, Yu C (2015) Lending a hand: detecting hands and recognizing activities in complex egocentric interactions. In: 2015 IEEE international conference on computer vision (ICCV), pp 1949–1957
Bertasius G, Park HS, Yu SX, Shi J (2017) First person action-object detection with EgoNet. In: Proceedings of robotics: science and systems
Li Y, Ye Z, Rehg JM (2015) Delving into egocentric actions. In: 2015 IEEE conference on computer vision and pattern recognition (CVPR), pp 287–295
Poleg Y, Ephrat A, Peleg S, Arora C (2016) Compact CNN for indexing egocentric videos. In: 2016 IEEE winter conference on applications of computer vision (WACV), pp 1–9
Fathi A, Farhadi A, Rehg JM (2011) Understanding egocentric activities. In: 2011 international conference on computer vision, pp 407–414
Wray M, Moltisanti D, Mayol-Cuevas W, Damen D (2016) SEMBED: semantic embedding of egocentric action videos. In: Hua G, Jégou H (eds) Computer vision—ECCV 2016 workshops. Springer International Publishing, pp 532–545
Wu J, Osuntogun A, Choudhury T et al (2007) A scalable approach to activity recognition based on object use. In: 2007 IEEE 11th international conference on computer vision, pp 1–8
Su Y-C, Grauman K (2016) Leaving some stones unturned: dynamic feature prioritization for activity detection in streaming video. In: Leibe B, Matas J, Sebe N, Welling M (eds) Computer vision—ECCV 2016. Springer International Publishing, pp 783–800
Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: unified, real-time object detection. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR), pp 779–788
Deng J, Dong W, Socher R et al (2009) ImageNet: a large-scale hierarchical image database. In: 2009 IEEE conference on computer vision and pattern recognition, pp 248–255
Lin T-Y, Maire M, Belongie S et al (2014) Microsoft COCO: common objects in context. In: Fleet D, Pajdla T, Schiele B, Tuytelaars T (eds) Computer vision—ECCV 2014. Springer International Publishing, pp 740–755
Girshick R (2015) Fast R-CNN. In: 2015 IEEE international conference on computer vision (ICCV), pp 1440–1448
Redmon J, Farhadi A (2018) YOLOv3: an incremental improvement. arXiv preprint arXiv:1804.02767
Nguyen T-H-C, Nebel J-C, Florez-Revuelta F (2018) Recognition of activities of daily living from egocentric videos using hands detected by a deep convolutional network. In: Campilho A, Karray F, ter Haar Romeny B (eds) Image analysis and recognition. Springer International Publishing, pp 390–398
Acknowledgements
This project has received funding from the European Union’s Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement No. 676157.
Copyright information
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Kapidis, G., Poppe, R., van Dam, E., Noldus, L.P.J.J., Veltkamp, R.C. (2020). Object Detection-Based Location and Activity Classification from Egocentric Videos: A Systematic Analysis. In: Chen, F., García-Betances, R., Chen, L., Cabrera-Umpiérrez, M., Nugent, C. (eds) Smart Assisted Living. Computer Communications and Networks. Springer, Cham. https://doi.org/10.1007/978-3-030-25590-9_6
DOI: https://doi.org/10.1007/978-3-030-25590-9_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-25589-3
Online ISBN: 978-3-030-25590-9
eBook Packages: Computer Science, Computer Science (R0)