Spatiotemporal CNNs for Pornography Detection in Videos

  • Murilo Varges da Silva
  • Aparecido Nilceu Marana
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11401)

Abstract

With the increasing use of social networks and mobile devices, the number of videos posted on the Internet is growing exponentially. Among the inappropriate content published on the Internet, pornography is one of the most worrying, as it can be accessed by teens and children. In the present study, two spatiotemporal CNNs, VGG-C3D CNN and ResNet R(2+1)D CNN, were assessed for pornography detection in videos. Experimental results on the Pornography-800 dataset showed that these spatiotemporal CNNs performed better than some state-of-the-art methods based on bag of visual words and are competitive with other CNN-based approaches, reaching an accuracy of 95.1%.

Keywords

Pornography detection, Spatiotemporal CNN, 3D CNN, Video classification

Notes

Acknowledgments

We thank NVIDIA Corporation for the donation of the GPU used in this study. This study was financed in part by CAPES - Brazil (Finance Code 001).

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. UFSCar - Federal University of Sao Carlos, Sao Carlos, Brazil
  2. IFSP - Federal Institute of Education of Sao Paulo, Birigui, Brazil
  3. UNESP - Sao Paulo State University, Bauru, Brazil
