Abstract
An approach for segmenting video into shots and sub-shots that works directly in the MPEG compressed domain is presented. It uses only the macroblock coding mode and the motion vectors of P and B frames. The system follows a two-pass scheme and has a hybrid rule-based/neural structure. A rough scan over the P frames locates the potential shot boundaries, and the solution is then refined by a precise scan over the B frames of the respective neighborhoods. The “simpler” boundaries are recognized by the rule-based module, while the decisions for the “complex” ones are refined by the neural part. The latter is also used to distinguish dissolves from object and camera motions and to further divide shots into sub-shots. The experiments demonstrate high speed and classification accuracy without computationally expensive calculations or the need for many thresholds.
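The two-pass idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Frame` record, the `intra_threshold` value, and the simple "more backward than forward prediction" rule are all assumptions made for illustration, and the neural module, dissolve detection, and sub-shot division are not modelled. The sketch assumes frames are listed in display order with per-frame counts of macroblocks by coding mode already extracted from the bitstream.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Frame:
    """Hypothetical per-frame summary of macroblock coding modes."""
    index: int         # position in display order
    kind: str          # "I", "P" or "B"
    intra: int = 0     # intra-coded macroblocks
    forward: int = 0   # forward-predicted macroblocks
    backward: int = 0  # backward-predicted macroblocks


def intra_ratio(frame: Frame) -> float:
    """Fraction of macroblocks in the frame that are intra-coded."""
    total = frame.intra + frame.forward + frame.backward
    return frame.intra / total if total else 0.0


def rough_scan(frames: List[Frame], intra_threshold: float = 0.6) -> List[int]:
    """Pass 1: list positions of P frames with many intra-coded macroblocks.

    A P frame that cannot be predicted from the previous reference frame
    suggests a boundary somewhere since that reference. The threshold is
    an assumed value, not one taken from the paper.
    """
    return [
        pos for pos, f in enumerate(frames)
        if f.kind == "P" and intra_ratio(f) > intra_threshold
    ]


def refine(frames: List[Frame], p_pos: int) -> int:
    """Pass 2: inspect the B frames just before the flagged P frame.

    The first B frame predicted mostly from the following reference
    (backward) is taken as the first frame of the new shot. If there is
    none, the P frame itself is returned.
    """
    start = p_pos - 1
    while start >= 0 and frames[start].kind == "B":
        start -= 1
    for f in frames[start + 1:p_pos]:
        if f.backward > f.forward:
            return f.index
    return frames[p_pos].index


def detect_cuts(frames: List[Frame]) -> List[int]:
    """Run both passes and return the display index where each new shot starts."""
    return [refine(frames, pos) for pos in rough_scan(frames)]


# Toy group of pictures with a cut between frames 1 and 2 (made-up counts).
example = [
    Frame(0, "I", intra=100),
    Frame(1, "B", forward=90, backward=10),
    Frame(2, "B", forward=5, backward=95),
    Frame(3, "P", intra=80, forward=20),
    Frame(4, "B", forward=50, backward=50),
    Frame(5, "B", forward=55, backward=45),
    Frame(6, "P", intra=5, forward=95),
]
cuts = detect_cuts(example)
```

Scanning only the P frames first keeps the first pass cheap, since the B frames are examined only in the neighborhood of a candidate boundary.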
Cite this article
Koprinska, I., Carrato, S. Hybrid Rule-Based/Neural Approach for Segmentation of MPEG Compressed Video. Multimedia Tools and Applications 18, 187–212 (2002). https://doi.org/10.1023/A:1019940716250
DOI: https://doi.org/10.1023/A:1019940716250