Cognition inspired format for the expression of computer vision metadata
Over the last decade, noticeable progress has been made in the automated computer interpretation of visual information. Computers running artificial intelligence algorithms are increasingly capable of extracting perceptual and semantic information from images and recording it as metadata. There is also a growing body of manually produced image annotation data. All of this data is of great importance for scientific purposes as well as for commercial applications. Making the most of this information, whether produced manually or automatically, requires expressing it precisely and adequately at its different logical levels, so that it is easily accessible, manipulable and shareable. It also requires the development of associated manipulation tools. However, the expression and manipulation of computer vision results has received less attention than the extraction of such results, and has consequently advanced less. Existing metadata tools are poorly structured in logical terms, as they intermix the declaration of visual detections with that of the observed entities, events and their surrounding context. This poor structuring makes such tools rigid, limited and cumbersome to use. Moreover, they are unprepared for more advanced situations, such as the coherent expression of information extracted from, or annotated onto, multi-view video resources. The work presented here specifies an advanced XML-based syntax for the expression and processing of computer vision metadata. The proposal draws inspiration from the natural cognition process to express the information adequately, with a particular focus on scenarios involving varying numbers of sensory devices, notably multi-view video.
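The abstract contrasts tools that intermix visual detections with the declaration of observed entities against a structure that keeps the two apart, which matters most when several cameras observe the same scene. The minimal Python sketch below illustrates that separation under stated assumptions only: it is not the syntax specified in this work, and the element names (`metadata`, `sensors`, `entities`, `detections`), the `Detection` class and the `build_document` helper are all hypothetical. Each detection records which sensor (view) produced it and references an entity by identifier, so two views of the same person resolve to a single entity declaration.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from dataclasses import dataclass


# Hypothetical record for one visual detection; NOT the paper's schema.
@dataclass
class Detection:
    sensor: str                      # identifier of the camera/view that produced it
    frame: int                       # frame index within that sensor's stream
    bbox: tuple[int, int, int, int]  # x, y, width, height in pixels
    entity: str                      # id of the observed entity it refers to


def build_document(entities: dict[str, str], detections: list[Detection]) -> ET.Element:
    """Emit sensors, entities and detections as separate sibling sections (hypothetical layout)."""
    root = ET.Element("metadata")

    # Sensors: one element per distinct view, derived from the detections.
    sensors = ET.SubElement(root, "sensors")
    for sensor_id in sorted({d.sensor for d in detections}):
        ET.SubElement(sensors, "sensor", id=sensor_id)

    # Entities: each observed object is declared exactly once.
    ents = ET.SubElement(root, "entities")
    for entity_id, label in entities.items():
        ET.SubElement(ents, "entity", id=entity_id, label=label)

    # Detections: per-view observations that only reference an entity id.
    dets = ET.SubElement(root, "detections")
    for i, d in enumerate(detections):
        if d.entity not in entities:
            raise ValueError(f"detection refers to undeclared entity {d.entity!r}")
        ET.SubElement(
            dets,
            "detection",
            id=f"d{i}",
            sensor=d.sensor,
            frame=str(d.frame),
            entity=d.entity,
            bbox=" ".join(str(v) for v in d.bbox),
        )
    return root


# Usage: the same person seen by two cameras maps to one entity declaration.
doc = build_document(
    {"person1": "person"},
    [
        Detection("cam1", 120, (40, 60, 80, 200), "person1"),
        Detection("cam2", 120, (310, 50, 70, 190), "person1"),
    ],
)
print(ET.tostring(doc, encoding="unicode"))
```

Because the entity is declared once and referenced by identifier, adding a third camera in this sketch only appends detections and leaves the entity section untouched; in a layout that intermixes the two, the entity's description would typically be repeated inside each view's detections.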
Keywords: Metadata · Multi-view video · Multimedia annotation · Computer vision · Cognition
The work was largely developed in the context of: project Media Arts and Technologies (MAT), NORTE-07-0124-FEDER-000061, financed by the North Portugal Regional Operational Programme (ON.2 – O Novo Norte), under the National Strategic Reference Framework (NSRF), through the European Regional Development Fund (ERDF), and by national funds, through the Portuguese funding agency, Fundação para a Ciência e a Tecnologia (FCT); Project QREN 23277 RETAIL PRO, a co-promotion R&D project funded by the European Regional Development Fund (ERDF) through ON2 as part of the National Strategic Reference Framework (NSRF), and managed by Agência de Inovação (ADI); Project QREN 33910 ARENA, an R&D project funded by the European Regional Development Fund (ERDF) through ON2 as part of the National Strategic Reference Framework (NSRF), and managed by IAPMEI - Agência para a Competitividade e Inovação, I.P.