Video Event Classification Using Bag of Words and String Kernels

Ballan, Lamberto; Bertini, Marco; Del Bimbo, Alberto; Serra, Giuseppe

doi:10.1007/978-3-642-04146-4_20

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 5716)

Included in the following conference series:

International Conference on Image Analysis and Processing

Abstract

The recognition of events in videos is a relevant and challenging task in automatic semantic video analysis. At present, one of the most successful frameworks used for object recognition tasks is the bag-of-words (BoW) approach. However, this approach does not model the temporal information of the video stream. In this paper we present a method to introduce temporal information within the BoW approach. Events are modeled as sequences of histograms of visual features, computed from each frame using the traditional BoW model. The sequences are treated as strings in which each histogram is considered a character. These sequences vary in size with the length of the video clip, and they are classified using SVM classifiers with a string kernel that uses the Needleman-Wunsch edit distance. Experimental results on two datasets, soccer video and TRECVID 2005, demonstrate the validity of the proposed approach.
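
The pipeline outlined in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not state the substitution cost between two histograms, the gap cost, or the exact form of the kernel, so the chi-square distance, the unit gap cost, and the `exp(-d)` kernel below are assumptions chosen for illustration, and all function names are hypothetical. Each clip is a list of per-frame BoW histograms, and the distance between two clips is a Needleman-Wunsch style global alignment cost.

```python
import math


def chi_square(h1, h2):
    """Chi-square distance between two histograms (assumed substitution cost)."""
    total = 0.0
    for a, b in zip(h1, h2):
        if a + b > 0:
            total += (a - b) ** 2 / (a + b)
    return 0.5 * total


def nw_distance(seq_a, seq_b, gap_cost=1.0, sub_cost=chi_square):
    """Global-alignment edit distance between two sequences of histograms.

    Each histogram plays the role of one character of a string.
    gap_cost is the (assumed) price of inserting or deleting a frame.
    """
    m = len(seq_b)
    # prev[j] holds the cost of aligning the frames of seq_a seen so far
    # with the first j frames of seq_b.
    prev = [j * gap_cost for j in range(m + 1)]
    for i in range(1, len(seq_a) + 1):
        curr = [i * gap_cost] + [0.0] * m
        for j in range(1, m + 1):
            curr[j] = min(
                prev[j - 1] + sub_cost(seq_a[i - 1], seq_b[j - 1]),  # substitute
                prev[j] + gap_cost,                                  # delete
                curr[j - 1] + gap_cost,                              # insert
            )
        prev = curr
    return prev[m]


def string_kernel(seq_a, seq_b, gap_cost=1.0):
    """Similarity derived from the edit distance (assumed form: exp(-d))."""
    return math.exp(-nw_distance(seq_a, seq_b, gap_cost))


def gram_matrix(clips, gap_cost=1.0):
    """Pairwise kernel values for a list of clips of possibly different length."""
    return [[string_kernel(a, b, gap_cost) for b in clips] for a in clips]


# Toy clips with a two-word visual vocabulary; lengths differ on purpose.
clip_a = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
clip_b = [[1.0, 0.0], [0.0, 1.0]]
print(nw_distance(clip_a, clip_b))
print(gram_matrix([clip_a, clip_b]))
```

A matrix built this way could be passed to an SVM that accepts a precomputed kernel. A kernel derived from an edit distance in this way is not guaranteed to be positive semi-definite, so this sketch should be read as a similarity measure rather than a verified Mercer kernel.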

Author information

Authors and Affiliations

Media Integration and Communication Center, University of Florence, Italy
Lamberto Ballan, Marco Bertini, Alberto Del Bimbo & Giuseppe Serra

Editor information

Editors and Affiliations

Dipartimento di Ingegneria dell’Informazione e Ingegneria Elettrica, Università di Salerno, Via Ponte Don Melillo, 1, 84084, Fisciano (SA), Italy
Pasquale Foggia
Dipartimento di Informatica e Sistemistica, Università di Napoli Federico II, Via Claudio, 21, I-80125, Napoli, Italy
Carlo Sansone
Dipartimento di Ingegneria dell’Informazione ed Ingegneria Elettrica, Università di Salerno, via P.te Don Melillo, I-84084, Fisciano (SA), Italy
Mario Vento

Cite this paper

Ballan, L., Bertini, M., Del Bimbo, A., Serra, G. (2009). Video Event Classification Using Bag of Words and String Kernels. In: Foggia, P., Sansone, C., Vento, M. (eds) Image Analysis and Processing – ICIAP 2009. ICIAP 2009. Lecture Notes in Computer Science, vol 5716. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04146-4_20

DOI: https://doi.org/10.1007/978-3-642-04146-4_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04145-7
Online ISBN: 978-3-642-04146-4
eBook Packages: Computer Science, Computer Science (R0)

Societies and partnerships

The International Association for Pattern Recognition
