
Enhanced multicore–manycore interaction in high-performance video encoding

  • Original Research Paper
Journal of Real-Time Image Processing

Abstract

This paper presents an efficient cooperative interaction between multicore (CPU) and manycore (GPU) resources in the design of a high-performance video encoder. The proposed technique, applied to the well-established and highly optimized VP8 encoding format, achieves a significant speed-up (up to 6\(\times\)) over the most optimized software encoder, with minimal degradation of visual quality and low processing latency. This result is obtained through a highly optimized CPU–GPU interaction, the exploitation of specific GPU features, and a modified search algorithm specifically adapted to the GPU execution model. Several experimental results are reported and discussed, confirming the effectiveness of the proposed technique. Although implemented for the VP8 standard, the presented approach is of general interest, as it could be applied to any other video encoding scheme.





Author information


Corresponding author

Correspondence to Giuliano Grossi.

Additional information

This work was undertaken under the EU FP7-ICT (7th Framework Programme—Information and Communication Technologies) T-NOVA Project, partially funded by the European Commission under the Grant Agreement No. FP7-ICT-619520.


About this article


Cite this article

Grossi, G., Paglierani, P., Pedersini, F. et al. Enhanced multicore–manycore interaction in high-performance video encoding. J Real-Time Image Proc 17, 887–902 (2020). https://doi.org/10.1007/s11554-018-0834-4


  • DOI: https://doi.org/10.1007/s11554-018-0834-4
