Multimedia Tools and Applications, Volume 21, Issue 1, pp 75–96

Accessing Video Contents through Key Objects over IP

  • Jianping Fan
  • Xingquan Zhu
  • Kayvan Najarian
  • Lide Wu


To support content-based video database access over the Internet Protocol (IP), the following objectives are important: (i) video query by a representative object (key object) or some statistical characterization of the target contents, (ii) bandwidth-efficient browsing over IP, and (iii) scalable and user-centric video transmission over a heterogeneous and variable-bandwidth network. We present a video object extraction and scalable coding system designed to meet these objectives. In our system, key objects that are meaningful to video database users are generated via a human-computer-interaction procedure and are tracked across frames. Given a key object, an algorithm classifies a subset of its video object planes (VOPs) as key VOPs. This subset forms the basis of a highly bandwidth-efficient base layer that supports activities such as browsing and refining queries. On top of the base layer, a number of enhancement layers can be defined to progressively increase the spatial and temporal resolution of the retrieved video. Heterogeneous users are expected to subscribe to different numbers of enhancement layers according to their own conditions, such as access authorization, available connection bandwidth, and quality preference.
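The layer subscription idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Layer` type, the `select_layers` function, the bit rates, and the greedy cut-off rule are all hypothetical, chosen only to show how a client might pick a base layer plus a prefix of enhancement layers under a bandwidth budget and an authorization limit.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Layer:
    """One coded layer; index 0 in a list is assumed to be the key-VOP base layer."""
    name: str
    kbps: float  # hypothetical bit rate, not taken from the paper


def select_layers(layers: List[Layer], available_kbps: float,
                  max_enhancement: int) -> List[Layer]:
    """Return the base layer plus as many consecutive enhancement layers as fit.

    Layers are assumed to be ordered so that each enhancement layer depends on
    all layers before it; selection therefore stops at the first layer that
    exceeds the bandwidth budget or the user's authorization limit.
    """
    chosen: List[Layer] = []
    used = 0.0
    for index, layer in enumerate(layers):
        if index > max_enhancement:          # index 0 is the base layer
            break
        if used + layer.kbps > available_kbps:
            break
        chosen.append(layer)
        used += layer.kbps
    return chosen


# Hypothetical layer stack for one key object.
stack = [
    Layer("base (key VOPs)", 64.0),
    Layer("enhancement 1", 128.0),
    Layer("enhancement 2", 256.0),
]

# A user with 200 kbps and authorization for two enhancement layers.
subscription = select_layers(stack, available_kbps=200.0, max_enhancement=2)
print([layer.name for layer in subscription])
```

Stopping at the first layer that does not fit reflects the assumption that enhancement layers are only decodable together with every lower layer; a scheme with independently decodable layers would need a different rule.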

semantic object extraction, key object, key VOP, scalable coding





Copyright information

© Kluwer Academic Publishers 2003

Authors and Affiliations

  • Jianping Fan (1)
  • Xingquan Zhu (2)
  • Kayvan Najarian (1)
  • Lide Wu (3)

  1. Department of Computer Science, University of North Carolina, Charlotte, USA
  2. Department of Computer Science, Purdue University, West Lafayette, USA
  3. Department of Computer Science, Fudan University, Shanghai, China
