Abstract
This paper proposes an integrated framework for analyzing human actions in video streams. Unlike most current approaches, which rely solely on automatic spatiotemporal analysis of sequences, the proposed method introduces an implicit user-in-the-loop concept for dynamically mining semantics and annotating video streams. This work sets a new and ambitious goal: to recognize, model and properly use the “average user’s” selections, preferences and perception to dynamically extract content semantics. The proposed approach is expected to add significant value to the hundreds of billions of non-annotated or inadequately annotated video streams that exist on the Web, on file servers, in databases and elsewhere. Furthermore, expert annotators can gain important knowledge about user preferences, selections, search styles and perception.
Acknowledgments
This research was performed in the framework of the PSIFIORIKSI project (Audiovisual Content Digitisation and Multimedia Metadata Extraction, Authoring and Storing based on MPEG-7), funded by the Research Promotion Foundation (RPF) of the Republic of Cyprus.
About this article
Cite this article
Ntalianis, K.S., Doulamis, A.D., Tsapatsoulis, N. et al. Human action annotation, modeling and analysis based on implicit user interaction. Multimed Tools Appl 50, 199–225 (2010). https://doi.org/10.1007/s11042-009-0369-6
DOI: https://doi.org/10.1007/s11042-009-0369-6