Using web sources for improving video categorization

Perea-Ortega, José M.; Montejo-Ráez, Arturo; Martín-Valdivia, M. Teresa; Ureña-López, L. Alfonso

doi:10.1007/s10844-010-0123-6

Published: 23 April 2010

Volume 36, pages 117–130, (2011)

Journal of Intelligent Information Systems

Abstract

This paper presents several experiments on video categorization using a supervised learning approach, with the VideoCLEF 2008 evaluation forum as the experimental framework. An analysis of the VideoCLEF corpus showed that video transcriptions are not the best source of information for identifying the topic of video streams. Two web-based corpora were therefore generated to add further sources of information, integrating documents from Wikipedia articles and Google searches. A number of supervised categorization experiments were carried out on the VideoCLEF test data. Several machine learning algorithms were tested to validate the effect of the corpus on the final results: Naïve Bayes, k-nearest neighbors (KNN), Support Vector Machine (SVM) and the J48 decision tree. The results show that the web can be a useful source of information for generating classification models for video data.
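The general idea in the abstract, training a text classifier on web documents and applying it to video transcriptions, can be illustrated with a minimal sketch. This is not the authors' implementation: it is a small multinomial Naïve Bayes classifier with add-one smoothing written in plain Python, and the category names, training texts, and function names (`tokenize`, `train_naive_bayes`, `classify`) are hypothetical placeholders standing in for the web-derived corpora.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train_naive_bayes(labeled_docs):
    """Collect document counts per class, word counts per class, and the vocabulary."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for label, text in labeled_docs:
        tokens = tokenize(text)
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_counts, word_counts, vocabulary


def classify(text, model):
    """Return the class with the highest log-posterior under add-one smoothing."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    # Words never seen in training carry no evidence and are skipped.
    tokens = [t for t in tokenize(text) if t in vocabulary]
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokens:
            smoothed = (word_counts[label][token] + 1) / (total_words + len(vocabulary))
            score += math.log(smoothed)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical stand-ins for web documents collected per category.
web_corpus = [
    ("music", "orchestra concert composer symphony piano music"),
    ("music", "band singer album guitar music song"),
    ("history", "war century king empire archive history"),
    ("history", "medieval castle revolution historical century"),
]
model = train_naive_bayes(web_corpus)

# Hypothetical transcript fragment from a video to be categorized.
transcript = "the composer wrote the symphony for a large orchestra"
print(classify(transcript, model))
```

The point of the sketch is the separation of roles: the model is fitted only on web text, and the transcript is used only at prediction time. Whether this matches the exact training setup in the full article cannot be confirmed from the abstract alone.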

Notes

http://www.wikipedia.org/
http://www.google.com/
http://www.sigwac.org.uk/
http://www.clef-campaign.org/
http://www.cdvp.dcu.ie/VideoCLEF/
SMART Project stop word list for English information retrieval, available at http://www.unine.ch/info/clef/englishST.txt.
The Snowball stemmer is available at http://snowball.tartarus.org/.
RapidMiner is available from http://rapid-i.com/.
Weka is a set of data mining algorithms and tools easily integrated in RapidMiner. More information is available at http://www.cs.waikato.ac.nz/ml/weka/.
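The notes point to a stop word list and a stemmer, which suggests a conventional text preprocessing step before classification; the exact procedure is not described in this preview. A minimal sketch of such a step, under that assumption, is shown here. The stop list is a tiny illustrative set, not the SMART list, and the `stem` parameter is a hypothetical hook where a real stemmer such as Snowball could be plugged in (it defaults to leaving tokens unchanged).

```python
import re

# Tiny illustrative stop list; NOT the SMART list referenced in the notes.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "is", "to", "for"}


def preprocess(text, stop_words=STOP_WORDS, stem=lambda token: token):
    """Lowercase, tokenize on letters, drop stop words, then apply the stemmer hook."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stem(token) for token in tokens if token not in stop_words]


print(preprocess("The history of the Roman Empire"))
# prints ['history', 'roman', 'empire']
```

Passing the stemmer as a function keeps the sketch free of third-party dependencies while leaving room for a real stemming library.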

Acknowledgements

This paper has been partially supported by a grant from the Spanish Government, project TEXT-COOL 2.0 (TIN2009-13391-C04-02), project GEOASIS (P08-TIC-41999) granted by the Andalusian Government and project RFC/PP2008/UJA-08-16-14. We would like to thank the Cross-Language Evaluation Forum in general and Carol Peters in particular.

Author information

Authors and Affiliations

SINAI Research Group, Computer Science Department, University of Jaén, 23008, Jaén, Spain
José M. Perea-Ortega, Arturo Montejo-Ráez, M. Teresa Martín-Valdivia & L. Alfonso Ureña-López

Corresponding author

Correspondence to José M. Perea-Ortega.

About this article

Cite this article

Perea-Ortega, J.M., Montejo-Ráez, A., Martín-Valdivia, M.T. et al. Using web sources for improving video categorization. J Intell Inf Syst 36, 117–130 (2011). https://doi.org/10.1007/s10844-010-0123-6

Received: 02 October 2009
Revised: 06 April 2010
Accepted: 08 April 2010
Published: 23 April 2010
Issue Date: February 2011
DOI: https://doi.org/10.1007/s10844-010-0123-6
