
Human action annotation, modeling and analysis based on implicit user interaction

Multimedia Tools and Applications

Abstract

This paper proposes an integrated framework for analyzing human actions in video streams. Unlike most current approaches, which rely solely on automatic spatiotemporal analysis of sequences, the proposed method introduces an implicit user-in-the-loop concept for dynamically mining semantics and annotating video streams. This work sets a new and ambitious goal: to recognize, model and properly exploit the “average user’s” selections, preferences and perception in order to dynamically extract content semantics. The proposed approach is expected to add significant value to the hundreds of billions of non-annotated or inadequately annotated video streams that exist on the Web, on file servers, in databases, etc. Furthermore, expert annotators can gain important knowledge about user preferences, selections, search styles and perception.
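The abstract does not describe how the framework models user behaviour, so the following is only a generic illustration of the implicit user-in-the-loop idea, not the authors' method. It assumes a hypothetical interaction log of (query term, video) pairs, one per implicit user selection such as a click on a search result, and promotes a term to a candidate annotation of a video when enough users selected that video for that term. The function name `implicit_annotations` and the thresholds `min_share` and `min_clicks` are illustrative assumptions.

```python
from collections import Counter, defaultdict


def implicit_annotations(log, min_share=0.3, min_clicks=2):
    """Derive candidate labels for videos from implicit user selections.

    Hypothetical sketch: `log` is an iterable of (query_term, video_id)
    pairs, one per user selection (e.g. a click on a search result).
    Returns a dict mapping video_id -> list of (term, share) pairs,
    where share is the fraction of that video's selections made under
    the given term.
    """
    # Count how often each term led users to select each video.
    term_clicks = defaultdict(Counter)
    for term, video in log:
        term_clicks[video][term] += 1

    labels = {}
    for video, counter in term_clicks.items():
        total = sum(counter.values())
        # Keep terms that are both frequent in absolute terms and
        # account for a large enough share of the video's selections.
        kept = [
            (term, count / total)
            for term, count in counter.items()
            if count >= min_clicks and count / total >= min_share
        ]
        # Highest share first; ties broken alphabetically by term.
        kept.sort(key=lambda pair: (-pair[1], pair[0]))
        if kept:
            labels[video] = kept
    return labels


if __name__ == "__main__":
    # Toy log of implicit selections (illustrative data only).
    log = [
        ("running", "v1"), ("running", "v1"), ("jogging", "v1"),
        ("boxing", "v2"), ("boxing", "v2"), ("boxing", "v2"),
        ("waving", "v2"),
    ]
    print(implicit_annotations(log))
```

With this toy log, `v1` is labelled `running` (2 of its 3 selections) and `v2` is labelled `boxing` (3 of 4), while the single-selection terms are filtered out by `min_clicks`. Requiring both a minimum count and a minimum share is one simple way to suppress noise from individual users.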





Acknowledgments

This research was performed in the framework of the PSIFIORIKSI project (Audiovisual Content Digitisation and Multimedia Metadata Extraction, Authoring and Storing based on MPEG-7), funded by the Research Promotion Foundation (RPF) of the Republic of Cyprus.

Author information

Correspondence to Klimis S. Ntalianis.


About this article

Cite this article

Ntalianis, K.S., Doulamis, A.D., Tsapatsoulis, N. et al. Human action annotation, modeling and analysis based on implicit user interaction. Multimed Tools Appl 50, 199–225 (2010). https://doi.org/10.1007/s11042-009-0369-6


  • DOI: https://doi.org/10.1007/s11042-009-0369-6
