Abstract
This paper addresses the problem of retrieving different instances of an object of interest within a given video document or a video database. The approach relies on a semi-global image representation based on an over-segmentation of the video frames. An aggregation mechanism then groups a set of sub-regions into an object similar to the query, under a global similarity criterion. Two strategies are proposed. The first is a greedy, dynamic region construction method. The second is based on simulated annealing and aims to find a global optimum. Experimental results show promising performance, with object detection rates of up to 79%.
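The two aggregation strategies named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the region descriptor, the similarity criterion, or the annealing schedule. The sketch assumes that each over-segmented region is summarised by an unnormalised colour histogram, that the global criterion is the L1 distance between the query histogram and the normalised histogram of the aggregated regions, and that spatial adjacency between regions can be ignored. The function names (`distance`, `greedy_aggregate`, `anneal_aggregate`) and the parameters `steps`, `t0` and `cooling` are hypothetical.

```python
import math
import random


def distance(selected, regions, query):
    """L1 distance between the query histogram and the normalised sum of the
    histograms of the selected regions (an assumed similarity criterion)."""
    if not selected:
        return float("inf")
    total = [0.0] * len(query)
    for i in selected:
        for b, v in enumerate(regions[i]):
            total[b] += v
    mass = sum(total)
    if mass == 0:
        return float("inf")
    return sum(abs(t / mass - q) for t, q in zip(total, query))


def greedy_aggregate(regions, query):
    """Greedy construction: repeatedly add the region that most reduces the
    global distance, and stop as soon as no addition improves it."""
    selected, best = set(), float("inf")
    while len(selected) < len(regions):
        candidates = [i for i in range(len(regions)) if i not in selected]
        score, idx = min(
            (distance(selected | {i}, regions, query), i) for i in candidates
        )
        if score >= best:
            break
        selected.add(idx)
        best = score
    return selected, best


def anneal_aggregate(regions, query, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Simulated annealing over region subsets: propose toggling one region,
    accept with the Metropolis rule, and cool the temperature geometrically
    (schedule and parameter values are assumptions)."""
    rng = random.Random(seed)
    current = {rng.randrange(len(regions))}
    cur_cost = distance(current, regions, query)
    best, best_cost = set(current), cur_cost
    t = t0
    for _ in range(steps):
        proposal = current ^ {rng.randrange(len(regions))}  # toggle one region
        if proposal:  # never evaluate the empty set
            cost = distance(proposal, regions, query)
            if cost <= cur_cost or rng.random() < math.exp((cur_cost - cost) / t):
                current, cur_cost = proposal, cost
                if cost < best_cost:
                    best, best_cost = set(current), cost
        t *= cooling
    return best, best_cost


# Toy example: four regions described by 3-bin histograms (pixel counts).
regions = [[4, 0, 0], [0, 4, 0], [0, 0, 8], [2, 2, 0]]
query = [0.5, 0.5, 0.0]
print(greedy_aggregate(regions, query))
print(anneal_aggregate(regions, query))
```

In this sketch the greedy variant commits to each added region and can stop at a local optimum, whereas the annealing variant may also remove regions and occasionally accept a worse subset, which is what lets it move towards a global optimum at a higher computational cost.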
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bursuc, A., Zaharia, T., Prêteux, F. (2012). Retrieval of Multiple Instances of Objects in Videos. In: Schoeffmann, K., Merialdo, B., Hauptmann, A.G., Ngo, CW., Andreopoulos, Y., Breiteneder, C. (eds) Advances in Multimedia Modeling. MMM 2012. Lecture Notes in Computer Science, vol 7131. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-27355-1_34
DOI: https://doi.org/10.1007/978-3-642-27355-1_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-27354-4
Online ISBN: 978-3-642-27355-1
eBook Packages: Computer Science, Computer Science (R0)