Local Expert Forest of Score Fusion for Video Event Classification

  • Jingchen Liu
  • Scott McCloskey
  • Yanxi Liu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7576)


We address the problem of complicated event categorization from a large dataset of videos “in the wild”, where multiple classifiers are applied independently to evaluate each video, each producing a ‘likelihood’ score. The core contribution of this paper is a local expert forest model for meta-level score fusion for event detection under heavily imbalanced class distributions. Our motivation is to adapt to performance variations of the classifiers in different regions of the score space, using a divide-and-conquer technique. We propose a novel method to partition the likelihood space that is sensitive to local label distributions in imbalanced data, and train a pair of locally optimized experts for each partition. Multiple pairs of experts based on different partitions (‘trees’) form a ‘forest’, balancing local adaptivity against over-fitting of the model. As a result, our model disregards classifiers in regions of the score space where they perform poorly, achieving both local source selection and fusion. We experiment with the TRECVID Multimedia Event Detection (MED) dataset, detecting 15 complicated events from around 34k video clips comprising more than 1000 hours, and demonstrate superior performance compared to other score-level fusion methods.
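The divide-and-conquer idea can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not specify the partitioning rule or how the experts are optimized, so the sketch assumes a single-axis split at the median positive score of a randomly chosen classifier (a crude stand-in for a label-aware partition) and class-weighted logistic regression as each local expert. All function names, parameters, and the synthetic data are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def expert_score(w, x):
    # Linear fusion of the classifier scores x, squashed to [0, 1].
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def train_expert(X, y, epochs=200, lr=0.5):
    # Assumed expert: logistic regression fitted by gradient descent, with
    # class weights so the rare positive class counts as much as the negatives.
    n_pos = sum(y)
    n_neg = len(y) - n_pos
    cw = {1: 0.5 / max(n_pos, 1), 0: 0.5 / max(n_neg, 1)}
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            c = cw[yi] * (expert_score(w, xi) - yi)
            grad[0] += c
            for j, xj in enumerate(xi):
                grad[j + 1] += c * xj
        w = [wj - lr * g for wj, g in zip(w, grad)]
    return w


def train_tree(X, y, rng):
    # One binary partition of the score space plus a pair of local experts.
    # Assumed split rule: threshold one randomly chosen classifier's score at
    # the median score of the positive samples.
    dim = rng.randrange(len(X[0]))
    pos = sorted(x[dim] for x, yi in zip(X, y) if yi == 1)
    thr = pos[len(pos) // 2]
    experts = []
    for keep_right in (False, True):
        idx = [i for i, x in enumerate(X) if (x[dim] >= thr) == keep_right]
        if len({y[i] for i in idx}) < 2:
            # A side without both classes falls back to the full training set.
            idx = list(range(len(X)))
        experts.append(train_expert([X[i] for i in idx], [y[i] for i in idx]))
    return dim, thr, experts


def train_forest(X, y, n_trees=5, seed=0):
    rng = random.Random(seed)
    return [train_tree(X, y, rng) for _ in range(n_trees)]


def fuse(forest, x):
    # Each tree routes x to the expert of its region; the forest averages them.
    scores = [
        expert_score(experts[1 if x[dim] >= thr else 0], x)
        for dim, thr, experts in forest
    ]
    return sum(scores) / len(scores)


def make_toy_scores(n_pos=20, n_neg=200, n_classifiers=3, seed=1):
    # Synthetic, heavily imbalanced 'likelihood' scores (purely illustrative).
    rng = random.Random(seed)
    X, y = [], []
    for label, n, mean in ((1, n_pos, 0.7), (0, n_neg, 0.3)):
        for _ in range(n):
            X.append([min(1.0, max(0.0, rng.gauss(mean, 0.15)))
                      for _ in range(n_classifiers)])
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_scores()
    forest = train_forest(X, y)
    print("fused score, first positive:", fuse(forest, X[0]))
    print("fused score, last negative:", fuse(forest, X[-1]))
```

Averaging over several partitions is what keeps any single, possibly over-fitted, pair of local experts from dominating the fused score; the number of trees trades local adaptivity against variance.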


Keywords: Event Category, Local Expert, Video Event, Binary Partition, Score Fusion



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Jingchen Liu (1)
  • Scott McCloskey (2)
  • Yanxi Liu (1)

  1. Penn State University, State College, USA
  2. Honeywell Labs, Golden Valley, USA
