Recognizing Complex Events in Real Movies by Audio Features

  • Ji-Xiang Du
  • Yi-Lan Guo
  • Chuan-Min Zhai
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 304)


This paper proposes a novel approach that takes audio features into account to improve the recognition of complex events in real movies. First, local space-time features and audio features are extracted. Each video sequence is then represented as a self-organizing feature map (SOFM) density map. Finally, the density map is combined with an SVM to recognize events. Results on the public Hollywood dataset show that the proposed method improves average accuracy and average precision compared with other related approaches.
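The density-map step can be illustrated with a small sketch. The abstract does not say how the density map is computed, so the code below assumes one plausible reading: a SOFM is trained on pooled descriptors, and each video is summarized by the normalized count of its descriptors that fall on each map node. The function names (`train_sofm`, `best_matching_unit`, `density_map`), the grid size, and the learning schedule are all hypothetical choices, and the descriptors are random stand-ins for the space-time and audio features rather than the ones used in the paper.

```python
import math
import random


def best_matching_unit(weights, x):
    """Index of the map node whose weight vector is closest to x."""
    return min(
        range(len(weights)),
        key=lambda i: sum((w - v) ** 2 for w, v in zip(weights[i], x)),
    )


def train_sofm(samples, rows, cols, epochs=20, lr0=0.5, seed=0):
    """Train a rows x cols SOFM; returns one weight vector per node."""
    rng = random.Random(seed)
    dim = len(samples[0])
    sigma0 = max(rows, cols) / 2.0
    # Assumption: initialise nodes from randomly chosen training samples.
    weights = [list(rng.choice(samples)) for _ in range(rows * cols)]
    total_steps = epochs * len(samples)
    step = 0
    for _ in range(epochs):
        order = samples[:]
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-3     # shrinking neighbourhood
            br, bc = divmod(best_matching_unit(weights, x), cols)
            for idx, w in enumerate(weights):
                r, c = divmod(idx, cols)
                grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma * sigma))
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights


def density_map(weights, descriptors):
    """Fraction of a video's descriptors assigned to each map node."""
    counts = [0] * len(weights)
    for x in descriptors:
        counts[best_matching_unit(weights, x)] += 1
    if not descriptors:
        return [0.0] * len(weights)
    total = float(len(descriptors))
    return [c / total for c in counts]


if __name__ == "__main__":
    rng = random.Random(1)
    # Random stand-ins for per-video feature descriptors (hypothetical data).
    pooled = [[rng.random(), rng.random()] for _ in range(60)]
    som = train_sofm(pooled, rows=3, cols=3, epochs=5)
    video = pooled[:20]
    print(density_map(som, video))
```

Under this reading, each video yields a fixed-length vector (one entry per map node) regardless of how many descriptors it contains, which is the kind of input an SVM classifier expects. The SVM stage itself is left out of the sketch.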


local space-time features, audio feature, self-organization feature map, event recognition





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Ji-Xiang Du (1)
  • Yi-Lan Guo (1)
  • Chuan-Min Zhai (1)
  1. Department of Computer Science and Technology, Huaqiao University, Xiamen, China
