A hybrid egocentric video summarization method to improve the healthcare for Alzheimer patients

  • Saba Sultan
  • Ali Javed
  • Aun Irtaza
  • Hassan Dawood
  • Hussain Dawood
  • Ali Kashif Bashir
Original Research


Alzheimer patients have difficulty remembering the identity of people and performing daily life activities. This paper presents a hybrid method that generates an egocentric video summary of important people, objects, and medicines to help Alzheimer patients recall forgotten memories. Analysis of lifelogging video data can support memory recall; however, the massive volume of lifelogging data makes it challenging to select the content most relevant to educating an Alzheimer patient. To address this challenge, a static video summarization approach is applied to select the key-frames that are most relevant for recalling the patient's forgotten memories. The proposed system consists of three main modules: face, object, and medicine recognition. Histogram of oriented gradients (HOG) features are used to train a multi-class support vector machine (SVM) for face recognition. SURF descriptors are extracted from the input video frames and used to find corresponding points between the objects in the input video and the reference objects stored in the database. Morphological operators followed by optical character recognition (OCR) are applied to recognize and tag the medicines of Alzheimer patients. The performance of the proposed system is evaluated on 18 real-world homemade videos. Experimental results demonstrate the effectiveness of the proposed system in providing the most relevant content to enhance the memory of Alzheimer patients.
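The abstract does not say how the multi-class SVM for face recognition is assembled from binary SVMs. One common construction is one-vs-one voting, sketched below under that assumption. The (w, bias) pairs are hand-picked toy values standing in for trained linear SVMs, the 2-D feature vectors stand in for real HOG descriptors, and the person labels are hypothetical.

```python
from itertools import combinations


def linear_score(w, bias, x):
    """Signed decision value dot(w, x) + bias of a linear binary classifier."""
    return sum(wi * xi for wi, xi in zip(w, x)) + bias


def ovo_predict(x, classifiers, classes):
    """Predict a class by one-vs-one majority voting.

    classifiers[(a, b)] holds (w, bias) for the binary classifier separating
    class a from class b: a positive score votes for a, otherwise for b.
    Ties are broken in favour of the class listed first in `classes`.
    """
    votes = {c: 0 for c in classes}
    for a, b in combinations(classes, 2):
        w, bias = classifiers[(a, b)]
        winner = a if linear_score(w, bias, x) > 0 else b
        votes[winner] += 1
    # max() returns the first class with the highest vote count
    return max(classes, key=lambda c: votes[c])


if __name__ == "__main__":
    # Hypothetical toy classifiers for three people; 2-D features replace
    # real HOG vectors, and in practice (w, bias) would come from training.
    people = ["person_A", "person_B", "person_C"]
    toy = {
        ("person_A", "person_B"): ((2.0, 0.0), 0.0),
        ("person_A", "person_C"): ((1.0, -1.0), 0.0),
        ("person_B", "person_C"): ((-1.0, -1.0), 0.0),
    }
    print(ovo_predict((0.9, 0.1), toy, people))  # person_A
```

With k identities, one-vs-one needs k(k-1)/2 binary classifiers, each trained only on two people's faces; other decompositions (one-vs-all, error-correcting output codes) would fit the abstract equally well.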
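The abstract states that SURF descriptors are used to find corresponding points between a frame and the reference objects, but not how those correspondences become a recognition decision. A minimal sketch, assuming nearest-neighbour matching with a ratio test and a minimum-match threshold, is shown below. SURF extraction itself is assumed to come from an existing implementation; `ratio`, `min_matches`, the object names, and the toy 4-D descriptors are hypothetical (standard SURF descriptors are 64-D).

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_descriptors(query, reference, ratio=0.7):
    """Return (query_index, reference_index) pairs that pass the ratio test.

    For each query descriptor, find its two nearest reference descriptors and
    keep the match only if the nearest is clearly closer than the second.
    """
    matches = []
    if len(reference) < 2:
        return matches
    for qi, q in enumerate(query):
        dists = sorted((euclidean(q, r), ri) for ri, r in enumerate(reference))
        (d1, best), (d2, _) = dists[0], dists[1]
        if d1 < ratio * d2:
            matches.append((qi, best))
    return matches


def recognize_object(frame_desc, database, min_matches=3, ratio=0.7):
    """Return the reference object with the most matches, or None.

    `database` maps a (hypothetical) object name to its list of descriptors.
    """
    best_name, best_count = None, 0
    for name, ref_desc in database.items():
        count = len(match_descriptors(frame_desc, ref_desc, ratio))
        if count > best_count:
            best_name, best_count = name, count
    return best_name if best_count >= min_matches else None


if __name__ == "__main__":
    database = {
        "cup": [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]],
        "keys": [[5, 5, 5, 5], [6, 5, 5, 5], [5, 6, 5, 5]],
    }
    frame = [[1.05, 0, 0, 0], [0, 0.95, 0, 0], [0, 0, 1.02, 0]]
    print(recognize_object(frame, database))  # cup
```

The ratio test discards ambiguous correspondences (a descriptor that is almost equally close to two reference points), which matters for cluttered egocentric frames.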
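The abstract applies morphological operators before OCR but does not name them. One plausible role, sketched here under that assumption, is a closing (dilation then erosion) with a wide, flat structuring element that merges the separate character strokes of a binarised medicine label into one text region, whose bounding box could then be cropped and passed to an OCR engine. Binarisation and the OCR call are outside the sketch, and the structuring-element half-sizes `kh` and `kw` are hypothetical parameters.

```python
def dilate(img, kh=1, kw=1):
    """Binary dilation with a (2*kh+1) x (2*kw+1) rectangular element."""
    h, w = len(img), len(img[0])
    return [[int(any(img[yy][xx]
                     for yy in range(max(0, y - kh), min(h, y + kh + 1))
                     for xx in range(max(0, x - kw), min(w, x + kw + 1))))
             for x in range(w)] for y in range(h)]


def erode(img, kh=1, kw=1):
    """Binary erosion; pixels outside the image count as background."""
    h, w = len(img), len(img[0])
    return [[int(all(0 <= yy < h and 0 <= xx < w and img[yy][xx]
                     for yy in range(y - kh, y + kh + 1)
                     for xx in range(x - kw, x + kw + 1)))
             for x in range(w)] for y in range(h)]


def close(img, kh=1, kw=1):
    """Morphological closing: fills gaps narrower than the element."""
    return erode(dilate(img, kh, kw), kh, kw)


def bounding_boxes(img):
    """(top, left, bottom, right) of each 4-connected foreground blob."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if img[y][x] and not seen[y][x]:
                seen[y][x] = True
                stack = [(y, x)]
                top, left, bottom, right = y, x, y, x
                while stack:
                    cy, cx = stack.pop()
                    top, bottom = min(top, cy), max(bottom, cy)
                    left, right = min(left, cx), max(right, cx)
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                                   (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w \
                                and img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                boxes.append((top, left, bottom, right))
    return boxes


if __name__ == "__main__":
    # Three one-pixel-wide "characters" separated by one-pixel gaps.
    label = [[0] * 9] + [[0, 1, 0, 1, 0, 1, 0, 0, 0] for _ in range(3)] + [[0] * 9]
    print(bounding_boxes(label))                   # three separate boxes
    print(bounding_boxes(close(label, kh=0, kw=1)))  # one merged text box
```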
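The abstract describes static summarization as selecting the key-frames most relevant to the patient but does not give the selection rule. A minimal sketch, assuming every frame has already been tagged by the face, object, and medicine modules and that a key-frame is the first frame of each run of consecutive frames sharing the same non-empty tag set, might look like this; the tag strings and the run-based rule are hypothetical.

```python
def select_keyframes(frame_tags):
    """Return (frame_index, tags) for the first frame of each run of
    consecutive frames that share the same non-empty set of tags.

    Frames with no recognised person, object, or medicine are skipped.
    """
    keyframes = []
    previous = frozenset()
    for i, tags in enumerate(frame_tags):
        current = frozenset(tags)
        if current and current != previous:
            keyframes.append((i, current))
        previous = current
    return keyframes


if __name__ == "__main__":
    frames = [
        set(),
        {"person:A"},
        {"person:A"},
        set(),
        {"medicine:M1"},
        {"medicine:M1", "object:cup"},
        {"object:cup"},
    ]
    for index, tags in select_keyframes(frames):
        print(index, sorted(tags))  # frames 1, 4, 5 and 6 are kept
```

Keeping only the first frame of each run removes the near-duplicate frames that dominate continuous egocentric footage, while a change in who or what is visible always yields a new key-frame.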


Keywords: Alzheimer, Education, Egocentric data, Healthcare, Video summarization




Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  • Saba Sultan (1)
  • Ali Javed (1), corresponding author
  • Aun Irtaza (2)
  • Hassan Dawood (1)
  • Hussain Dawood (3)
  • Ali Kashif Bashir (4)

  1. Software Engineering Department, University of Engineering and Technology, Taxila, Pakistan
  2. Computer Science Department, University of Engineering and Technology, Taxila, Pakistan
  3. Department of Network and Computer Engineering, College of Computer Science and Engineering, University of Jeddah, Jeddah, Saudi Arabia
  4. Department of Computing and Mathematics, Manchester Metropolitan University, Manchester, UK
