# Error-resilient video coding with end-to-end rate-distortion optimized at macroblock level

## Abstract

Intra macroblock refreshment is an effective approach for error-resilient video coding. In this paper, in addition to intra coding, we propose to add two macroblock coding modes to enhance the transmission robustness of the coded bitstream, which are inter coding with redundant macroblock and intra coding with redundant macroblock. The selection of coding modes and the parameters for coding the redundant version of the macroblock are determined by the rate-distortion optimization. It is worth mentioning that the end-to-end distortion is employed in the optimization procedure, which considers the channel conditions. Extensive simulation results show that the proposed approach outperforms other error-resilient approaches significantly; for some video sequences, the average PSNR can be up to 4 dB higher than that of the Optimal Intra Refreshment approach.

### Keywords

H.264/AVC, error resilience, end-to-end distortion, intra refreshment, redundant coding

## I. Introduction

The H.264/AVC [1] video coding standard provides higher coding efficiency and stronger network adaptation capability than all previously developed video coding standards. However, like previous video compression standards, it is based on a hybrid coding method that combines transform coding with Motion-Compensated Prediction (MCP). Therefore, when a hybrid-coded video bitstream is transmitted over a packet-loss network, it suffers from error propagation, which leads to the well-known drifting phenomenon [2, 3].

Due to the unreliable underlying networks, the development of error-resilient techniques is a crucial requirement for video communication over lossy networks. For applications that can tolerate long delay, channel-coding techniques, like Forward Error Correction (FEC), provide very significant reductions of transmission errors at a comparably moderate bitrate overhead. For the real-time applications, however, the effective use of FEC and re-transmission is limited. Here, the use of error resilience techniques in the source codec becomes important. Two categories of source coding approaches are promising. One category is based on intra macroblock refreshment, and another one is redundant coding.

The intra macroblock refreshment approach is standard compatible, and it is a useful tool to combat network packet losses. It can be employed to weaken the inter-picture dependency introduced by inter prediction and, eventually, to cut off error propagation. The early intra macroblock refreshment algorithms are based on randomly inserting intra macroblocks [4] or periodically inserting contiguous intra macroblocks [5]. However, in both [4] and [5], the intra refresh frequency is determined heuristically, and because the intra coding mode is costly, the trade-off between coding efficiency and error resilience needs to be balanced. Zhang et al. [6] first treated this problem as an optimal coding mode selection for macroblocks and proposed the well-known Recursive Optimal Per-pixel Estimate (ROPE) approach to determine where to insert intra macroblocks. In [6], the expected end-to-end distortion for each pixel is calculated recursively, and in the mode selection step, this expected end-to-end distortion is used in the rate-distortion optimization process. In [7], another flexible intra macroblock update algorithm was investigated to optimize the expected rate-distortion performance. In this approach, the end-to-end distortion is calculated by emulating the real channel behavior; therefore, the computation overhead is very high. The approaches in [6, 7] are loss-aware, end-to-end rate-distortion optimized intra macroblock refreshment algorithms, which are currently the best known way of determining both the number and the placement of intra macroblocks for error resilience.

Redundant coding is another effective tool for robust video communication over lossy networks. In [8], an optimal algorithm is presented to determine whether a picture needs a redundant version. In [9], redundant slices are optimally allocated based on the slice position in the GOP, and the primary and redundant slices are then interleaved to generate two descriptions of equal importance using the MDC [10] paradigm. In [11], by contrast, the two descriptions are generated by splitting the video pictures into two threads, and redundant pictures are then periodically inserted into the two threads. In both [8] and [11], redundant coding is optimized at the frame level, namely all the macroblocks in one frame are encoded with the same redundant coding parameters, whereas in [9], redundant information is allocated at the slice level. In [12], redundant coding is optimized at the macroblock level. However, in order to optimally tune the redundancy, this approach needs all the motion vector information in one GOP, which leads to a delay of one GOP; consequently, it cannot be applied in real-time applications such as video conferencing.

Intra macroblock refreshment can stop the propagation of errors from previous frames, while redundant coding is a way of preventing errors in future frames. In order to take advantage of both approaches, we propose to add two new encoding modes, namely inter coding with redundant macroblock and intra coding with redundant macroblock, in addition to the conventional intra and inter coding modes. This approach is called Hybrid Redundant Macroblock and Intra macroblock Refreshment (HRMIR). The redundant version of a macroblock is encoded with lower quality and rate, which is implemented by scaling the quantization parameter (QP). The selection of coding modes and the parameters for coding the redundant version of the macroblock are determined by the rate-distortion optimization procedure. Note that the loss-aware end-to-end expected distortion is used for the RD optimization, and it is calculated with the ROPE [6] method. Since calculating the end-to-end distortion with the ROPE method causes no additional delay, the proposed approach is suitable for real-time applications.

The rest of the paper is organized as follows. In Section II, the method to calculate the loss-aware end-to-end distortion is presented. In Section III, the proposed HRMIR approach is introduced. In Section IV, extensive simulation results are given, which validate our approach. Finally, some conclusions are drawn in Section V.

## II. End-to-end distortion calculation

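In conventional rate-distortion optimized video coding, the coding mode of each macroblock is selected by minimizing the Lagrangian cost

$J_{MB} = D_{MB} + \lambda_{mode} R_{MB}$ (1)

where $\lambda_{mode}$ is the Lagrange multiplier, and $D_{MB}$ and $R_{MB}$ are the encoding distortion and the bitrate in the different encoding modes, respectively. This optimization model is tailored to an error-free environment, and no channel packet loss is considered here.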

However, when the compressed video is transmitted over an error-prone network, in addition to the distortion caused by source coding, there is channel distortion, which is caused by packet loss in the underlying network. The loss-aware end-to-end distortion, which encompasses both categories of distortion, is used in the proposed HRMIR approach to make better RD-optimized decisions. There are many methods to calculate the end-to-end distortion. In ROPE [6], the end-to-end distortion for each pixel is calculated recursively. Recent advances in ROPE further expand its capability to accommodate sub-pixel prediction [14] and burst packet loss [15]. In [16], a block-based approach generates and recursively updates a block-level distortion map for each frame; therefore, the end-to-end distortion is calculated at the block level. Besides calculating the end-to-end distortion in the pixel domain, compressed-domain methods are introduced in [17]. It is important to note that, for the sake of complexity reduction, we apply ROPE [6] with full-pixel accuracy in our HRMIR approach. For the sub-pixel version of ROPE [14], the computation of the second moment requires a large amount of storage and computational power, which makes the whole process prohibitively expensive. Furthermore, constrained intra prediction is applied, so there is no error propagation through intra prediction.

Let $f_n^i$ denote the original value of pixel $i$ in frame $n$, and let $\hat{f}_n^i$ and $\tilde{f}_n^i$ denote its encoder and decoder reconstruction, respectively. Because of possible packet loss in the channel, $\tilde{f}_n^i$ can be modeled at the encoder side as a random variable. In the ROPE approach, $D_{MB}$ is redefined as the overall expected decoder distortion in one macroblock.

The overall expected mean-squared-error (MSE) distortion of a pixel is

$d_n^i = E\{(f_n^i - \tilde{f}_n^i)^2\} = (f_n^i)^2 - 2 f_n^i E\{\tilde{f}_n^i\} + E\{(\tilde{f}_n^i)^2\}$ (2)

and the macroblock distortion is the sum over its pixels:

$D_{MB} = \sum_{i \in MB} d_n^i$ (3)

Evidently, $d_n^i$ is determined by the first and second moments of the decoder reconstruction. ROPE provides an optimal recursive algorithm to accurately calculate the two moments for each pixel in a frame.

For simplicity, let us assume that packet loss events are independent and that the packet loss rate (PLR) $p$ is available at the encoder; usually, the encoder can obtain packet loss statistics through RTCP [18]. To keep the approach general, we do not impose any limitations on the slice shape and size, so the motion vectors of neighboring macroblocks are not always available in the error concealment stage, and the decoder may not be able to use them for concealment. Accordingly, we assume that the decoder copies the reconstructed pixels from the previous frame for concealment. The prediction at the encoder only employs the previous reconstructed frame. The recursive formulas of ROPE are as follows.

- Pixel in the intra macroblock

  $E\{\tilde{f}_n^i\} = (1-p)\hat{f}_n^i + pE\{\tilde{f}_{n-1}^i\}$ (4)

  $E\{(\tilde{f}_n^i)^2\} = (1-p)(\hat{f}_n^i)^2 + pE\{(\tilde{f}_{n-1}^i)^2\}$ (5)

- Pixel in the inter macroblock

  $E\{\tilde{f}_n^i\} = (1-p)\left(\hat{e}_n^i + E\{\tilde{f}_{n-1}^{i+mv}\}\right) + pE\{\tilde{f}_{n-1}^i\}$ (6)

  $E\{(\tilde{f}_n^i)^2\} = (1-p)\left((\hat{e}_n^i)^2 + 2\hat{e}_n^i E\{\tilde{f}_{n-1}^{i+mv}\} + E\{(\tilde{f}_{n-1}^{i+mv})^2\}\right) + pE\{(\tilde{f}_{n-1}^i)^2\}$ (7)

where the inter-coded pixel $i$ is predicted from pixel $i + mv$ in the previous frame, and the prediction residual $e_n^i$ is quantized to $\hat{e}_n^i$.
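The recursion in (4)-(7) can be sketched per pixel as follows. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, pixel values are plain floats, and the caller is assumed to look up the moments of the motion-compensated reference pixel $i + mv$ in the previous frame.

```python
def rope_intra(f_hat, m1_prev, m2_prev, p):
    """Moments of the decoder reconstruction for an intra pixel, Eqs. (4)-(5).

    f_hat:   encoder reconstruction of the pixel
    m1_prev: first moment of the co-located pixel in the previous frame
    m2_prev: second moment of the co-located pixel in the previous frame
    p:       packet loss rate
    """
    m1 = (1 - p) * f_hat + p * m1_prev
    m2 = (1 - p) * f_hat ** 2 + p * m2_prev
    return m1, m2


def rope_inter(e_hat, m1_ref, m2_ref, m1_prev, m2_prev, p):
    """Moments for an inter pixel, Eqs. (6)-(7).

    e_hat:          quantized prediction residual
    m1_ref, m2_ref: moments of the reference pixel i + mv in the previous frame
    m1_prev, m2_prev: moments of the co-located pixel (used for concealment)
    """
    m1 = (1 - p) * (e_hat + m1_ref) + p * m1_prev
    m2 = (1 - p) * (e_hat ** 2 + 2 * e_hat * m1_ref + m2_ref) + p * m2_prev
    return m1, m2


def expected_distortion(f, m1, m2):
    """Expected MSE of one pixel: E{(f - f~)^2} = f^2 - 2 f E{f~} + E{f~^2}."""
    return f * f - 2 * f * m1 + m2
```

As a check on the arithmetic, take an original pixel value of 112, an encoder reconstruction of 110, a previous-frame pixel known to be exactly 100, and $p$ = 0.1 (all illustrative values). The expected distortion is then 0.9 × 2² + 0.1 × 12² = 18, which is what `expected_distortion` returns from the two moments up to floating-point rounding.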

## III. The proposed HRMIR approach

### A. The HRMIR rate-distortion optimization

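For each macroblock, the HRMIR approach selects the optimal encoding option $o^*$ so that the Lagrangian cost function is minimized:

$o^* = \arg\min_{o \in \Gamma_{HRMIR}} \left( D_{MB}(o) + \lambda_{mode} R_{MB}(o) \right)$ (8)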

where $D_{MB}(o)$ is the expected end-to-end distortion for mode $o$, $R_{MB}(o)$ is the rate for this mode and $\lambda_{mode}$ is the Lagrangian multiplier. $\Gamma_{HRMIR}$ is the set of encoding options, which includes all encoding modes. For the original ROPE approach, the available encoding modes are intra mode $I$ and inter mode $P$, so $\Gamma_{ROPE} = \{I, P\}$. In our HRMIR approach, there are two new modes: intra mode with a redundant version of the macroblock and inter mode with a redundant version of the macroblock. For simplicity, let us use $I_r^u$ and $P_r^v$ to denote the two new modes, respectively, with $r$ standing for redundant coding, $u$ representing the candidate QP value in the intra redundant coding and $v$ representing the candidate QP value in the inter redundant coding. Therefore, for the HRMIR approach, the set of encoding options becomes $\Gamma_{HRMIR} = \{I, P, I_r^u, P_r^v\}$. In general, the QP value of redundant coding is not smaller than that of primary coding. Let us use $QP_I$ and $QP_P$ to denote the primary QP values of intra and inter coding, respectively. In the redundant coding, the candidate QP values are $u \in \{u \mid QP_I \le u \le 51\}$ and $v \in \{v \mid QP_P \le v \le 51\}$, where 51 is the maximum QP value in H.264/AVC [1].
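The mode decision described above amounts to an exhaustive search for the option with the smallest Lagrangian cost. The sketch below is only illustrative: `select_mode` and the `evaluate` callback are hypothetical names, options are represented as `(mode, redundant_qp)` tuples, and the distortion and rate figures in the usage note are invented numbers, not measurements from the paper.

```python
def select_mode(options, evaluate, lam):
    """Return the option with the smallest cost D + lam * R, and that cost.

    options:  iterable of encoding options, e.g. ("I", None) or ("P_r", 32)
    evaluate: hypothetical callback mapping an option to (distortion, rate),
              where distortion is the expected end-to-end distortion and
              rate already includes the redundant version when present
    lam:      Lagrange multiplier
    """
    best_option, best_cost = None, float("inf")
    for option in options:
        distortion, rate = evaluate(option)
        cost = distortion + lam * rate
        if cost < best_cost:
            best_option, best_cost = option, cost
    return best_option, best_cost


# Toy usage with made-up (distortion, rate) pairs.
toy_table = {
    ("I", None): (40.0, 300.0),
    ("P", None): (200.0, 100.0),
    ("P_r", 32): (60.0, 160.0),
}
choice, cost = select_mode(toy_table.keys(), toy_table.get, lam=0.5)
```

With these toy numbers the costs are 190, 250 and 140, so the inter mode with a redundant version at QP 32 is chosen: it spends fewer bits than intra coding while avoiding most of the expected channel distortion of plain inter coding.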

### B. The HRMIR end-to-end distortion and rate

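For a pixel in an intra macroblock with redundant version (mode $I_r^u$), the two moments are

$E\{\tilde{f}_n^i\} = (1-p)\hat{f}_n^i + p(1-p)\hat{f}_n^{i,u} + p^2 E\{\tilde{f}_{n-1}^i\}$ (9)

$E\{(\tilde{f}_n^i)^2\} = (1-p)(\hat{f}_n^i)^2 + p(1-p)(\hat{f}_n^{i,u})^2 + p^2 E\{(\tilde{f}_{n-1}^i)^2\}$ (10)

where in the primary coding $f_n^i$ is quantized to $\hat{f}_n^i$, and in the redundant coding it is quantized to $\hat{f}_n^{i,u}$; here $u$ is the redundant QP value.

For a pixel in an inter macroblock with redundant version (mode $P_r^v$), the two moments are

$E\{\tilde{f}_n^i\} = (1-p)\left(\hat{e}_n^i + E\{\tilde{f}_{n-1}^{i+mv}\}\right) + p(1-p)\left(\hat{e}_n^{i,v} + E\{\tilde{f}_{n-1}^{i+mv(v)}\}\right) + p^2 E\{\tilde{f}_{n-1}^i\}$ (11)

$E\{(\tilde{f}_n^i)^2\} = (1-p)\left((\hat{e}_n^i)^2 + 2\hat{e}_n^i E\{\tilde{f}_{n-1}^{i+mv}\} + E\{(\tilde{f}_{n-1}^{i+mv})^2\}\right) + p(1-p)\left((\hat{e}_n^{i,v})^2 + 2\hat{e}_n^{i,v} E\{\tilde{f}_{n-1}^{i+mv(v)}\} + E\{(\tilde{f}_{n-1}^{i+mv(v)})^2\}\right) + p^2 E\{(\tilde{f}_{n-1}^i)^2\}$ (12)

where in the primary coding, pixel $i$ is predicted from pixel $i + mv$ in the previous frame and the prediction residual $e_n^i$ is quantized to $\hat{e}_n^i$. In the redundant coding, the redundant QP value is $v$, pixel $i$ is predicted from pixel $i + mv(v)$ in the previous frame and the prediction residual $e_n^i$ is quantized to $\hat{e}_n^{i,v}$.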

For intra and inter macroblocks with redundant coding, the probability of receiving the primary macroblock is $1 - p$. The probability of receiving the redundant macroblock while losing the primary one is $p(1 - p)$, and the probability of losing both the primary and redundant macroblocks is $p^2$. With these probabilities, Equations 9, 10, 11 and 12 for a macroblock with a redundant version follow directly. It is important to note that when the macroblock is encoded with a redundant version, namely $o \in \{I_r^u, P_r^v\}$, the total bit rate $R_{MB}(o)$ is calculated by summing the bit rates used for the primary and the redundant coding.
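The three-outcome weighting described above (primary received, only the redundant version received, both lost) can be sketched per pixel as follows. This is an illustration under the stated independent-loss assumption; the function names are hypothetical, and the caller is assumed to supply the moments of the reference pixels used by the primary and redundant motion vectors.

```python
def outcome_weights(p):
    """Probabilities of: primary received, only redundant received, both lost."""
    return 1 - p, p * (1 - p), p * p


def intra_redundant_moments(f_hat, f_hat_u, m1_prev, m2_prev, p):
    """Moments for an intra pixel coded with a redundant version at QP u."""
    w_pri, w_red, w_lost = outcome_weights(p)
    m1 = w_pri * f_hat + w_red * f_hat_u + w_lost * m1_prev
    m2 = w_pri * f_hat ** 2 + w_red * f_hat_u ** 2 + w_lost * m2_prev
    return m1, m2


def inter_redundant_moments(e_hat, m1_ref, m2_ref,
                            e_hat_v, m1_ref_v, m2_ref_v,
                            m1_prev, m2_prev, p):
    """Moments for an inter pixel coded with a redundant version at QP v.

    (e_hat, m1_ref, m2_ref):       primary residual and reference moments
    (e_hat_v, m1_ref_v, m2_ref_v): redundant residual and reference moments
    """
    w_pri, w_red, w_lost = outcome_weights(p)
    m1 = (w_pri * (e_hat + m1_ref)
          + w_red * (e_hat_v + m1_ref_v)
          + w_lost * m1_prev)
    m2 = (w_pri * (e_hat ** 2 + 2 * e_hat * m1_ref + m2_ref)
          + w_red * (e_hat_v ** 2 + 2 * e_hat_v * m1_ref_v + m2_ref_v)
          + w_lost * m2_prev)
    return m1, m2


def expected_distortion(f, m1, m2):
    """Expected MSE of one pixel from the two moments."""
    return f * f - 2 * f * m1 + m2
```

For an original value of 112, a primary reconstruction of 110, a coarser redundant reconstruction of 106, a previous-frame value of exactly 100 and $p$ = 0.1 (illustrative numbers only), the expected distortion is 0.9 × 4 + 0.09 × 36 + 0.01 × 144 = 8.28, compared with 18 for the same pixel without a redundant version.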

### C. Lagrange multiplier selection

The Lagrange multiplier $\lambda_{mode}$ in (8) controls the rate-distortion trade-off. For the error-prone environment, extensive experimental evidence suggests that there is no significant performance difference between using the Lagrange multiplier tailored to the error-free environment and one tailored to the error-prone environment. This has also been confirmed in [7]. So $\lambda_{mode}$ is set as the one tailored to the error-free environment:

$\lambda_{mode} = 0.85 \times 2^{(QP-12)/3}$ (13)

where $QP$ is the quantization parameter.
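As a small numeric aid, the sketch below evaluates the multiplier under the assumption that the error-free setting follows the relation commonly used in H.264/AVC reference encoders, $\lambda_{mode} = 0.85 \times 2^{(QP-12)/3}$. The function name is hypothetical.

```python
def lambda_mode(qp):
    """Error-free mode-decision multiplier, assuming 0.85 * 2^((QP - 12) / 3)."""
    return 0.85 * 2.0 ** ((qp - 12) / 3.0)
```

Under this relation the multiplier doubles every 3 QP steps, so a coarser primary quantizer puts progressively more weight on rate in the mode decision.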

### D. Computation complexity reduction

In the HRMIR rate-distortion optimization procedure, finding the optimal QP value for redundant coding requires calculating the rate-distortion cost for every possible redundant QP value; therefore, the computation complexity is very high. For example, assume the primary QP value is 22. In the RDO procedure described in Section III-A, the encoding options are $\Gamma_{HRMIR} = \{I, P, I_r^u, P_r^v\}$, and both $I_r^u$ and $P_r^v$ have 51 - 22 + 1 = 30 possible redundant QP values, where 51 is the maximum QP value in H.264/AVC. Therefore, $\Gamma_{HRMIR}$ includes 62 encoding options (30 QP values each for $I_r^u$ and $P_r^v$, plus intra and inter coding without a redundant version).

By lowering the number of encoding options, the computation complexity can be reduced. Let us denote the redundant QP increase step as $QP_{step}$; then the candidate QP values are $u \in \{u \mid u = QP_I + K \times QP_{step},\ u \le 51,\ K = 0, 1, 2, \ldots\}$ and $v \in \{v \mid v = QP_P + K \times QP_{step},\ v \le 51,\ K = 0, 1, 2, \ldots\}$.
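The effect of the step size on the size of the search space can be checked with a few lines of code. The helper names below are hypothetical; the counts follow directly from the candidate sets defined above.

```python
def redundant_qp_candidates(qp_primary, qp_step=1, qp_max=51):
    """Candidate redundant QPs: qp_primary + K * qp_step, capped at qp_max."""
    return list(range(qp_primary, qp_max + 1, qp_step))


def count_encoding_options(qp_i, qp_p, qp_step=1):
    """Size of the option set: plain I and P plus all I_r^u and P_r^v candidates."""
    return (2
            + len(redundant_qp_candidates(qp_i, qp_step))
            + len(redundant_qp_candidates(qp_p, qp_step)))


counts = {step: count_encoding_options(22, 22, step) for step in (1, 5, 10)}
```

For a primary QP of 22, this gives 62 options with a step of 1, 14 options with a step of 5 (candidates 22, 27, ..., 47) and 8 options with a step of 10 (candidates 22, 32, 42).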

When $QP_{step}$ is set to 5 or 10, the PSNR is lower than when $QP_{step}$ is 1. However, the PSNR decrease is very limited. The computation overhead for the $QP_{step}$ = 5 case is nearly 1/5 of that for the $QP_{step}$ = 1 case, but the resulting decrease in PSNR is less than 0.3 dB. Even when $QP_{step}$ is set to 10, the PSNR penalty is less than 0.5 dB. This property of HRMIR is significant: it means that the approach can be deployed on handheld devices, where computation resources are limited, by setting a relatively large $QP_{step}$ value.

## IV. Simulation results

Our simulation setting builds on the JM9.4 H.264 codec [19]. We use constrained intra prediction and CABAC for entropy coding, and a fixed QP value is used in all of our simulations. Each slice consists of one row of macroblocks. For each sequence, only the first frame is coded as an I-frame, and the rest are coded as P-frames; the number of reference frames is 1. In order to have a fair comparison with the Optimal Intra approach [6], it is assumed that the I-frame is transmitted over a secure channel. We use the average luminance PSNR to assess the objective video quality; the mean squared error (MSE) is averaged over 200 trials, and the PSNR is then calculated from the averaged MSE. A random packet loss generator is used to drop packets according to the required packet loss rate. For lost slices, temporal replacement concealment is used, which means the pixel values of a lost slice are copied from the same position in the previous frame. To evaluate the proposed HRMIR approach, extensive experiments have been conducted, and as benchmarks, we use the conventional Optimal Intra Refreshment [6] and RS-MDC [9] for comparison.
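The quality metric described above averages the MSE over the trials first and converts to PSNR afterwards. A minimal sketch follows, assuming 8-bit luminance samples (peak value 255); the function name is hypothetical.

```python
import math


def psnr_from_trials(mse_trials, peak=255.0):
    """PSNR in dB computed from the MSE averaged over all loss trials."""
    mean_mse = sum(mse_trials) / len(mse_trials)
    return 10.0 * math.log10(peak ** 2 / mean_mse)


def mean_of_psnrs(mse_trials, peak=255.0):
    """Alternative (not the metric described above): average per-trial PSNR values."""
    values = [10.0 * math.log10(peak ** 2 / m) for m in mse_trials]
    return sum(values) / len(values)
```

The order matters: because the logarithm is concave, the PSNR of the averaged MSE is never higher than the average of per-trial PSNR values, so trials with heavy loss damage are not masked by trials with little or no loss.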

Percentage of intra macroblocks for HRMIR and Optimal Intra (QP = 28, first 50 frames, PLR = 3, 5, 10 and 20%)

| Video | Approach | 3% | 5% | 10% | 20% |
|---|---|---|---|---|---|
| Foreman | HRMIR | 0.71 | 1.02 | 2.14 | 5.87 |
| | Optimal Intra | 13.86 | 20.31 | 33.18 | 48.01 |
| Bus | HRMIR | 2.04 | 3.66 | 9.38 | 25.61 |
| | Optimal Intra | 53.41 | 64.91 | 78.07 | 89.49 |
| Mobile | HRMIR | 0.55 | 0.99 | 3.04 | 9.59 |
| | Optimal Intra | 26.53 | 41.27 | 66.72 | 84.69 |

In addition to the *i.i.d.* random packet loss model, we also use a burst loss model for simulation, and as indicated in [20], we set the average burst length to two. In Figure 10, the PSNR versus bitrate curves in burst loss environments are plotted. The results are similar to those in the *i.i.d.* case, and the proposed HRMIR approach provides the best video quality among the three approaches. The error-resilient performance of the proposed HRMIR approach is thus robust across different error distribution models.
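The text does not say which burst model is used. One common choice, shown here purely as an assumption, is a two-state Markov (Gilbert) model in which the mean burst length equals the inverse of the probability of leaving the loss state and the stationary loss probability equals the target PLR. All names below are hypothetical.

```python
import random


def gilbert_params(plr, avg_burst_len):
    """Transition probabilities (good->bad, bad->good) of a two-state loss model."""
    p_bg = 1.0 / avg_burst_len          # mean burst length is 1 / p_bg
    p_gb = plr * p_bg / (1.0 - plr)     # stationary loss rate equals plr
    return p_gb, p_bg


def simulate_losses(n, plr, avg_burst_len, seed=0):
    """Return a list of n booleans, True meaning the packet is lost."""
    p_gb, p_bg = gilbert_params(plr, avg_burst_len)
    rng = random.Random(seed)
    lost = rng.random() < plr           # start from the stationary distribution
    pattern = []
    for _ in range(n):
        pattern.append(lost)
        if lost:
            lost = rng.random() >= p_bg  # stay in the loss state with prob 1 - p_bg
        else:
            lost = rng.random() < p_gb   # enter the loss state with prob p_gb
    return pattern
```

For a PLR of 10% and an average burst length of two, this gives a bad-to-good probability of 0.5 and a good-to-bad probability of about 0.056. Setting the average burst length to 1 / (1 - PLR) makes the two transition probabilities sum to one, which reduces the model to independent losses.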

## V. Conclusions

In this paper, a novel Hybrid Redundant Macroblock and Intra macroblock Refreshment approach has been proposed to combat packet loss. In the proposed approach, redundant coding and/or intra coding are optimally allocated at the macroblock level. Whether to use redundant coding and/or intra coding, and the quantization parameter of the redundant coding, are all determined in the end-to-end rate-distortion optimization procedure. It is worth mentioning that, in the proposed approach, only information from previously encoded frames is used to calculate the end-to-end distortion in the RDO process; therefore, no additional delay is caused, making the proposed approach suitable for real-time applications such as video conferencing. Extensive experimental results show that the proposed method provides better performance than other error-resilient source coding approaches. The performance gap between the proposed approach and Optimal Intra Refreshment is large: in some simulation environments, the proposed approach provides up to 4 dB higher PSNR than the conventional Optimal Intra Refreshment at the same bitrate. Our future work is to calculate the end-to-end distortion with sub-pixel accuracy, so that a more accurate end-to-end distortion estimate becomes available, which should eventually lead to better resource allocation.

## Notes

### VII. Acknowledgements

This work was supported by National Natural Science Foundation of China (No. 60972085, No. 60903066), the Sino-Singapore JRP (No. 2010DFA11010) and National Science Foundation of China for Distinguished Young Scholars (No. 61025013).

### References

1. Wiegand T, Sullivan GJ, Bjøntegaard G, Luthra A: **Overview of the H.264/AVC video coding standard.** *IEEE Trans Circuits Syst Video Technol* 2003, **13**(7):560-576.
2. Wenger S: **H.264/AVC over IP.** *IEEE Trans Circuits Syst Video Technol* 2003, **13**(7):645-656.
3. Stockhammer T, Hannuksela MM, Wiegand T: **H.264/AVC in wireless environments.** *IEEE Trans Circuits Syst Video Technol* 2003, **13**(7):657-673. 10.1109/TCSVT.2003.815167
4. Cote G, Kossentini F: **Optimal intra coding of blocks for robust video communication over the internet.** *Signal Process Image Commun* 1999, **15**:25-34. 10.1016/S0923-5965(99)00022-3
5. Zhu QF, Kerofsky L: **Joint source coding, transport processing and error concealment for H.323-based packet video.** In *Proceedings of the SPIE, VCIP 99*. *Volume 3653*. San Jose, CA; 1999:52-62.
6. Zhang R, Regunathan SL, Rose K: **Video coding with optimal inter/intra-mode switching for packet loss resilience.** *IEEE J Sel Areas Commun* 2000, **18**(6):966-976. 10.1109/49.848250
7. Stockhammer T, Kontopodis D, Wiegand T: **Rate-distortion optimization for JVT/H.26L coding in packet loss environment.** In *Proceedings of Packet Video Workshop 2002*. Pittsburgh, PA; 2002.
8. Zhu CB, Wang YK, Hannuksela MM, Li HQ: **Error resilient video coding using redundant pictures.** *IEEE Trans Circuits Syst Video Technol* 2009, **19**(1):3-14.
9. Tillo T, Grangetto M, Olmo M: **Redundant slice optimal allocation for H.264 multiple description coding.** *IEEE Trans Circuits Syst Video Technol* 2008, **18**(1):59-70.
10. Wang Y, Lin SA: **Error-resilient video coding using multiple description motion compensation.** *IEEE Trans Circuits Syst Video Technol* 2002, **12**(6):438-452. 10.1109/TCSVT.2002.800320
11. Radulovic I, Frossard P, Wang YK, Hannuksela M, Hallapuro A: **Multiple description video coding with H.264/AVC redundant pictures.** *IEEE Trans Circuits Syst Video Technol* 2010, **20**(1):144-148.
12. Lin CY, Tillo T, Zhao Y, Jeon B: **Multiple description coding for H.264/AVC with redundancy allocation at macro block level.** *IEEE Trans Circuits Syst Video Technol* 2011, **21**(5):589-600.
13. Sullivan GJ, Wiegand T: **Rate-distortion optimization for video compression.** *IEEE Signal Process Mag* 1998, **15**(6):74-90. 10.1109/79.733497
14. Yang H, Rose K: **Advances in recursive per-pixel end-to-end distortion estimation for robust video coding in H.264/AVC.** *IEEE Trans Circuits Syst Video Technol* 2007, **17**(7):845-856.
15. Liao Y, Gibson JD: **Enhanced error resilience of video communications for burst losses using an extended ROPE algorithm.** In *Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)*. Taipei, Taiwan; 2009:1853-1856.
16. Zhang Y, Gao W, Lu Y, Huang Q, Zhao D: **Joint source-channel rate-distortion optimization for H.264 video coding over error-prone networks.** *IEEE Trans Multimedia* 2007, **9**(3):445-454.
17. Li F, Liu G: **Compressed-domain-based transmission distortion modeling for precoded H.264/AVC video.** *IEEE Trans Circuits Syst Video Technol* 2009, **19**(20):1908-1914.
18. Schulzrinne H, Casner S, Frederick R, Jacobson V: **RTP: a transport protocol for real-time applications.** *Internet Engineering Task Force--RFC 1889* 1996.
19. **H.264/AVC JM Reference Software [Online]** [http://iphome.hhi.de/suehring/tml/download]
20. Loguinov D, Radha H: **End-to-end internet video traffic dynamics: statistical study and analysis.** *Proceedings of IEEE INFOCOM '02* 2002, 723-732.
21. Liang YJ, Apostolopoulos JG, Girod B: **Analysis of packet loss for compressed video: effect of burst losses and correlation between error frames.** *IEEE Trans Circuits Syst Video Technol* 2008, **18**(7):861-874.
22. Li ZC, Chakareski J, Niu XD, Zhang YJ, Gu WY: **Modeling and analysis of distortion caused by Markov-Model burst packet losses in video transmission.** *IEEE Trans Circuits Syst Video Technol* 2009, **19**(7):917-931.

## Copyright information

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.