Hierarchical Hidden Markov Model in detecting activities of daily living in wearable videos for studies of dementia

Karaman, Svebor; Benois-Pineau, Jenny; Dovgalecs, Vladislavs; Mégret, Rémi; Pinquier, Julien; André-Obrecht, Régine; Gaëstel, Yann; Dartigues, Jean-François

doi:10.1007/s11042-012-1117-x

Published: 01 June 2012

Volume 69, pages 743–771 (2014)

Multimedia Tools and Applications

Svebor Karaman¹,
Jenny Benois-Pineau¹,
Vladislavs Dovgalecs²,
Rémi Mégret²,
Julien Pinquier³,
Régine André-Obrecht³,
Yann Gaëstel⁴ &
Jean-François Dartigues⁴

Abstract

This paper presents a method for indexing activities of daily living in videos acquired from wearable cameras. It addresses the problem of analyzing the complex multimedia data acquired from wearable devices, which has recently become a growing concern as the amount of such data increases. In the context of dementia diagnosis, patients' activities are recorded at home using a lightweight wearable device and later reviewed by medical practitioners. The recording mode poses great challenges, since the video data consists of a single sequence shot in which strong motion and sharp lighting changes often appear. Because the recordings are long, tools for efficient navigation in terms of activities of interest are crucial. Our work introduces a video structuring approach that combines automatic motion-based segmentation of the video with activity recognition by a hierarchical two-level Hidden Markov Model. We define a multi-modal description space over visual and audio features, including mid-level features such as motion, location, speech and noise detections. We show their complementarity both globally and for specific activities. Experiments on real data obtained from recordings of several patients at home show the difficulty of the task and the promising results of the proposed approach.
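
To make the idea of a two-level Hidden Markov Model more concrete, the following is a minimal sketch of how such a model might be decoded. It is not the authors' implementation: the abstract does not give the topology, the features, or the training procedure. The sketch assumes that every activity shares the same sub-state transition table, that an activity is left with a fixed probability `exit_prob`, and that the per-segment emission log-likelihoods are already computed. All function and parameter names are hypothetical.

```python
import math

def safe_log(p):
    # Log that maps zero probability to -inf instead of raising.
    return math.log(p) if p > 0.0 else -math.inf

def flatten_hhmm(top_trans, sub_trans, exit_prob):
    """Flatten a two-level HMM into one table over (activity, sub-state) pairs.

    top_trans[a][b]: probability of going from activity a to activity b
                     (assumed zero on the diagonal, rows summing to 1).
    sub_trans[i][j]: sub-state transitions inside an activity (assumed shared).
    exit_prob:       assumed fixed probability of leaving the current activity.
    """
    n_act, n_sub = len(top_trans), len(sub_trans)
    states = [(a, i) for a in range(n_act) for i in range(n_sub)]
    trans = {}
    for (a, i) in states:
        for (b, j) in states:
            if a == b:            # stay inside the same activity
                p = (1.0 - exit_prob) * sub_trans[i][j]
            elif j == 0:          # enter another activity at its first sub-state
                p = exit_prob * top_trans[a][b]
            else:
                p = 0.0
            trans[(a, i), (b, j)] = p
    return states, trans

def viterbi_activities(states, trans, log_emis):
    """Return the most likely activity label for each observed segment.

    log_emis[t][s] is the log-likelihood of segment t under flat state s.
    Decoding starts uniformly over the first sub-state of each activity.
    """
    n = len(states)
    n_act = len({a for a, _ in states})
    delta = [safe_log(1.0 / n_act if states[s][1] == 0 else 0.0) + log_emis[0][s]
             for s in range(n)]
    back = []
    for t in range(1, len(log_emis)):
        new_delta, ptr = [], []
        for s in range(n):
            best = max(range(n),
                       key=lambda r: delta[r] + safe_log(trans[states[r], states[s]]))
            ptr.append(best)
            new_delta.append(delta[best]
                             + safe_log(trans[states[best], states[s]])
                             + log_emis[t][s])
        delta = new_delta
        back.append(ptr)
    s = max(range(n), key=lambda r: delta[r])
    path = [s]
    for ptr in reversed(back):    # follow back-pointers to the first segment
        s = ptr[s]
        path.append(s)
    path.reverse()
    return [states[s][0] for s in path]

# Toy example: two activities with two sub-states each, six segments.
states, trans = flatten_hhmm(top_trans=[[0.0, 1.0], [1.0, 0.0]],
                             sub_trans=[[0.5, 0.5], [0.5, 0.5]],
                             exit_prob=0.1)
hi, lo = math.log(0.9), math.log(0.1)
log_emis = [[hi, hi, lo, lo]] * 3 + [[lo, lo, hi, hi]] * 3
labels = viterbi_activities(states, trans, log_emis)
```

In this toy setting the decoded labels switch from the first activity to the second once the emissions favor it. Flattening the hierarchy into a single state space is one common way to reuse standard Viterbi decoding; whether the paper does this is not stated in the abstract.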

Acknowledgments

This work is partly supported by a grant from the ANR (Agence Nationale de la Recherche) with reference ANR-09-BLAN-0165-02, within the IMMED project.

Author information

Authors and Affiliations

LaBRI—University of Bordeaux, 351 Cours de la Libération, 33405, Talence Cedex, France
Svebor Karaman & Jenny Benois-Pineau
IMS—University of Bordeaux, 351 Cours de la Libération, Talence, France
Vladislavs Dovgalecs & Rémi Mégret
IRIT—University of Toulouse, 118 route de Narbonne, 31062, Toulouse Cedex 9, France
Julien Pinquier & Régine André-Obrecht
INSERM U.897—University Victor Ségalen Bordeaux 2, Bordeaux, France
Yann Gaëstel & Jean-François Dartigues

Corresponding author

Correspondence to Jenny Benois-Pineau.

About this article

Cite this article

Karaman, S., Benois-Pineau, J., Dovgalecs, V. et al. Hierarchical Hidden Markov Model in detecting activities of daily living in wearable videos for studies of dementia. Multimed Tools Appl 69, 743–771 (2014). https://doi.org/10.1007/s11042-012-1117-x

Published: 01 June 2012
Issue Date: April 2014
DOI: https://doi.org/10.1007/s11042-012-1117-x
