
Accessing Video Contents through Key Objects over IP


Abstract

To support content-based video database access over the Internet Protocol (IP), it is important to achieve the following objectives: (i) video query by a representative object (key object) or by some statistical characterization of the target contents, (ii) bandwidth-efficient browsing over IP, and (iii) scalable and user-centric video transmission over a heterogeneous, variable-bandwidth network. We present a video object extraction and scalable coding system designed to meet these objectives. In our system, key objects that are meaningful to video database users are generated via a human-computer interaction procedure and are tracked across frames. Given a key object, an algorithm classifies a subset of its video object planes (VOPs) as key VOPs. This subset forms the basis of a highly bandwidth-efficient base layer that supports activities such as browsing and query refinement. On top of the base layer, a number of enhancement layers can be defined to progressively increase the spatial and temporal resolutions of the retrieved video. Heterogeneous users can subscribe to different numbers of enhancement layers according to their own conditions, such as access authorization, available connection bandwidth, and quality preference.
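The last point, layer subscription driven by access authorization, available connection bandwidth, and quality preference, can be illustrated with a minimal sketch in Python. The example is not taken from the paper: the layer names, per-layer bitrates, authorization levels, and the greedy selection rule are all hypothetical, and are shown only as one plausible way a client might decide how many enhancement layers to subscribe to on top of the key-VOP base layer.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Layer:
    name: str            # e.g. "base (key VOPs)", "enh-1 (temporal)"; illustrative only
    bitrate_kbps: float  # assumed per-layer bitrate, not specified in the paper
    min_auth_level: int  # assumed access-authorization requirement, hypothetical


def select_layers(layers: List[Layer], bandwidth_kbps: float, auth_level: int) -> List[Layer]:
    """Greedily subscribe to the base layer plus as many enhancement layers
    as the user's bandwidth budget and authorization level allow.
    Layers are assumed to be ordered: base layer first, then enhancements."""
    chosen: List[Layer] = []
    used = 0.0
    for layer in layers:
        if auth_level < layer.min_auth_level:
            break  # this and higher layers require authorization the user lacks
        if used + layer.bitrate_kbps > bandwidth_kbps:
            break  # the next layer would exceed the available bandwidth
        chosen.append(layer)
        used += layer.bitrate_kbps
    return chosen


if __name__ == "__main__":
    # Hypothetical layer stack: key-VOP base layer plus two enhancement layers.
    layers = [
        Layer("base (key VOPs)", 64, 0),
        Layer("enh-1 (temporal)", 128, 0),
        Layer("enh-2 (spatial)", 256, 1),
    ]
    # A low-bandwidth browsing client with no special authorization.
    for layer in select_layers(layers, bandwidth_kbps=200, auth_level=0):
        print(layer.name)
```

With a 200 kbps budget and no special authorization, the sketch subscribes to the key-VOP base layer and one temporal enhancement layer; a better-provisioned or more privileged client would simply take more layers from the same stack.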




Cite this article

Fan, J., Zhu, X., Najarian, K. et al. Accessing Video Contents through Key Objects over IP. Multimedia Tools and Applications 21, 75–96 (2003). https://doi.org/10.1023/A:1025086200838

