Automatic Representative Framelets Selection for Human Action Recognition in Surveillance Videos

  • K. Kiruba
  • D. Shiloah Elizabeth (Email author)
  • C. Sunil Retmin Raj
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 1118)


In computer vision, intelligent video surveillance is one of the most challenging tasks, as it requires real-time processing and high reliability. This paper addresses the problem of processing long-duration videos for complex activity recognition and proposes an automatic representative framelets selection approach. A faster region-based convolutional neural network (Faster R-CNN) object detector locates the human as the region of interest (ROI). The detected bounding box is extracted, and Harris corner detection is applied to find significant key points. Lucas–Kanade (LK) optical flow then generates trajectories from frame to frame, and the motion velocity magnitude of each frame is computed by accumulating these trajectories. When the motion velocity magnitude is visualized over time, each hill shape corresponds to a meaningful atomic action. Density-based spatial clustering of applications with noise (DBSCAN) groups these actionlets according to their similarity, and a single frame is selected at random from each cluster as a representative framelet. Our experiments demonstrate that the proposed method leads to a significant improvement in activity recognition in terms of space and time consumption. This paper presents representative framelets selection results for four benchmark datasets, namely the KTH, Weizmann, UCF-11, and IXMAS datasets, and one synthetic dataset with a challenging and realistic environment.


Keywords: Human action recognition · Faster R-CNN · Harris corner detection · LK optical flow · DBSCAN clustering
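The clustering-and-selection step of the pipeline can be sketched as follows. This is a minimal illustration, not the paper's implementation: a synthetic per-frame motion-velocity-magnitude curve with three "hills" stands in for the magnitudes accumulated from LK optical-flow trajectories, and the activity threshold (0.3) and DBSCAN parameters (`eps=3`, `min_samples=2`) are assumed values chosen for the example.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)

# Synthetic per-frame motion-velocity magnitudes: three Gaussian
# "hills", each standing in for one atomic action (actionlet).
t = np.arange(90)
magnitude = (np.exp(-((t - 15) ** 2) / 30)
             + np.exp(-((t - 45) ** 2) / 30)
             + np.exp(-((t - 75) ** 2) / 30))

# Keep only "active" frames (magnitude above an assumed threshold)
# and cluster them by temporal position, so that each hill forms
# its own group of frames.
active = t[magnitude > 0.3]
labels = DBSCAN(eps=3, min_samples=2).fit_predict(active.reshape(-1, 1))

# Select a single frame at random from each cluster as the
# representative framelet (noise points, labelled -1, are skipped).
framelets = [int(rng.choice(active[labels == k]))
             for k in sorted(set(labels) - {-1})]
print(framelets)  # one frame index per detected actionlet
```

In a full pipeline, `magnitude` would instead be computed by accumulating LK optical-flow trajectories of Harris key points inside the Faster R-CNN bounding box; the clustering and random per-cluster selection would remain the same.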



Copyright information

© Springer Nature Singapore Pte Ltd. 2020

Authors and Affiliations

  • K. Kiruba (1)
  • D. Shiloah Elizabeth (1) (Email author)
  • C. Sunil Retmin Raj (2)
  1. CSE, Anna University, Chennai, India
  2. IT, Anna University, Chennai, India
