Journal of Real-Time Image Processing, Volume 16, Issue 6, pp 2127–2146

Architecture for parallel marker-free variable length streams decoding

  • Yousef Baroud
  • José Manuel Mariños Velarde
  • Zhe Wang
  • Steffen Kieß
  • Seyyed Mahdi Najmabadi
  • Jajnabalkya Guhathakurta
  • Sven Simon
Original Research Paper


Real-time compression of modern image and video data streams demands throughput above 1 gigapixel/s, which makes parallel encoding and decoding unavoidable. A well-established way to enable parallel decoding is to insert markers into the variable length code (VLC) stream: once the markers are located, the sub-streams between them can be extracted and decoded in parallel. Markers, however, degrade compression, especially when a high degree of parallelism is required. In this paper, we propose an architecture for marker-free parallel decoding of VLC streams. Instead of multiple local entropy decoders, the proposed architecture uses a single parallel entropy decoder in conjunction with a novel format for constructing the VLC stream. The approach runs at high clock rates and scales to a large number of decoders: a synthesized clock frequency well above 110 MHz is achieved for up to 20 decoders on a medium-sized FPGA.
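As an illustration of the marker-based baseline mentioned in the abstract (not of the proposed marker-free format, whose details are not given on this page), the following minimal Python sketch Rice-codes a list of values, splits it into sub-streams, and separates them with markers. Everything in it is an assumption made for illustration: the fixed Rice parameter `K`, the helper names, and the borrowing of the JPEG conventions of a `0xFFD0` restart marker, padding with one-bits, and `0xFF 0x00` byte stuffing. The thread pool only shows that the sub-streams are independent; it is not meant as a performance model of a hardware decoder.

```python
from concurrent.futures import ThreadPoolExecutor

K = 2                 # Rice parameter, assumed fixed and >= 1 for this sketch
MARKER = b"\xff\xd0"  # JPEG-style restart marker, borrowed purely for illustration


def rice_encode(values, k=K):
    """Rice-code non-negative ints: unary quotient (q ones, one zero) + k-bit remainder."""
    return "".join("1" * (v >> k) + "0" + format(v & ((1 << k) - 1), f"0{k}b")
                   for v in values)


def rice_decode(bits, k=K):
    """Decode a bit string; an unterminated tail of ones is treated as padding."""
    out, pos = [], 0
    while True:
        zero = bits.find("0", pos)      # end of the unary part
        if zero < 0:                    # only padding (or nothing) left
            return out
        q = zero - pos
        r = int(bits[zero + 1:zero + 1 + k], 2)
        out.append((q << k) | r)
        pos = zero + 1 + k


def pack(bits):
    """Pad with ones to a byte boundary, then byte-stuff so MARKER cannot occur in data."""
    bits += "1" * (-len(bits) % 8)
    raw = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return raw.replace(b"\xff", b"\xff\x00")


def unpack(chunk):
    """Undo byte stuffing and return the payload as a bit string."""
    raw = chunk.replace(b"\xff\x00", b"\xff")
    return "".join(format(b, "08b") for b in raw)


def encode_with_markers(values, n_streams):
    """Split a non-empty list into contiguous blocks and join the coded blocks with markers."""
    size = -(-len(values) // n_streams)  # ceiling division
    blocks = [values[i:i + size] for i in range(0, len(values), size)]
    return MARKER.join(pack(rice_encode(b)) for b in blocks)


def decode_parallel(stream):
    """Locate the markers, then decode every sub-stream independently."""
    chunks = stream.split(MARKER)
    with ThreadPoolExecutor() as pool:
        parts = pool.map(lambda c: rice_decode(unpack(c)), chunks)
    return [v for part in parts for v in part]


if __name__ == "__main__":
    data = [3, 0, 7, 12, 1, 5, 2, 9, 4, 0, 6, 1]
    for n in (1, 2, 4, 6):
        stream = encode_with_markers(data, n)
        assert decode_parallel(stream) == data
        print(n, "sub-streams:", len(stream), "bytes")
```

Running the script prints the encoded size for several degrees of parallelism. Each extra sub-stream costs a marker plus padding and possible stuffing bytes, which is the compression penalty the abstract attributes to marker-based schemes.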


Parallel image decoding · Hardware architectures · Marker-free · Variable length codes · FPGA · Parallel Golomb decoder



This work is part of the project Intelligenter Optischer Sensor zur 2D/3D Objekt-Erfassung und dimensionellen Messtechnik (IOS23), which is financed by the Baden-Württemberg-Stiftung gGmbH.



Copyright information

© Springer-Verlag GmbH Germany 2017

Authors and Affiliations

  • Yousef Baroud (1)
  • José Manuel Mariños Velarde (1)
  • Zhe Wang (1)
  • Steffen Kieß (1)
  • Seyyed Mahdi Najmabadi (1)
  • Jajnabalkya Guhathakurta (1)
  • Sven Simon (1)
  1. Institut für Parallele und Verteilte Systeme, University of Stuttgart, Stuttgart, Germany
