Novel Rate Control Scheme for Low Delay Video Coding of HEVC
ETRI Journal. 2016. Mar, 38(1): 185-194
Copyright © 2016, Electronics and Telecommunications Research Institute (ETRI)
  • Received : March 25, 2014
  • Accepted : September 10, 2015
  • Published : March 01, 2016
About the Authors
Wei Wu
Jiong Liu
Lei Feng

Abstract
In this paper, a novel rate control scheme for low delay video coding of High Efficiency Video Coding (HEVC) is proposed. The proposed scheme is developed by considering the new temporal prediction structure of HEVC. Firstly, the relationship between bit rate and quantization step is exploited to formulate an accurate quadratic rate-quantization (R-Q) model. Secondly, a method of determining the quantization parameters (QPs) for the first frames within a group of pictures is proposed. Thirdly, an accurate frame-level bit allocation method is proposed for HEVC. Finally, based on the proposed R-Q model and the target bit allocated for the frame, the QPs are predicted for coding tree units by using rate-distortion (R-D) optimization. The proposed scheme is compared with three state-of-the-art rate control schemes. Experimental results show that the proposed rate control scheme can increase the Bjøntegaard delta peak signal-to-noise ratio by 0.65 dB and 0.09 dB on average compared with the JCTVC-I0094 and JCTVC-M0036 schemes, respectively, both of which have been implemented in an HEVC test model encoder; furthermore, the proposed scheme achieves a similar R-D performance to Wang’s scheme, as well as obtaining the smallest bit rate mismatch error of all the schemes.
I. Introduction
To meet the rapidly increasing demand for video content, a new video coding standard called “High Efficiency Video Coding (HEVC)” [1] was established by the Joint Collaborative Team on Video Coding (JCT-VC) in January 2013. In contrast to previous video coding standards, HEVC employs not only a flexible quad-tree coding block partitioning structure but also improved intra-prediction and coding and adaptive motion parameter prediction and coding, both of which significantly improve the coding efficiency. Apart from the coding efficiency, rate control is also an important issue in video services [2], particularly for real-time video communications. The objective of rate control is to achieve good video quality by adjusting encoding parameters so that the buffer neither overflows nor underflows under the constraint of the transmission bandwidth.
Although rate control is not a normative part of any video coding standard, every video coding standard has its own recommendation on rate control for informative purposes [3], such as the reference model [4] for H.261, the adaptive quantization algorithm [5] for MPEG-1, Test Model 5 [6] for MPEG-2, Test Model Near-term 8 [7] for H.263, Verification Model 18 [8] for MPEG-4, and JVT-W042 [9], developed based on JVT-G012 [10], for H.264/AVC. In addition to JVT-W042, which was adopted in the Joint Model, many rate control schemes [11]–[15] have also been designed for H.264/AVC.
In HEVC, a flexible quad-tree coding block partitioning structure is adopted that enables the efficient use of multiple sizes of coding units (CUs), prediction units (PUs), and transform units (TUs). The CU, PU, and TU define the regions sharing the same prediction mode, the same prediction information, and the same transformation, respectively. These new features differentiate the rate control method for HEVC from those adopted by the previous video coding standards.
For HEVC, several rate control schemes have been proposed in [16]–[19]. A rate control scheme for HEVC and its improvement have been proposed in [16] and [19], respectively, and are founded on a pixel-based unified rate-quantization model. This scheme allocates the target bit for a group of pictures (GOP), a frame, and a coding tree unit (CTU), respectively, and then predicts the quantization parameter (QP) value for a CTU. However, its rate-distortion (R-D) performance leaves room for improvement. In [17], a rate control scheme based on an R-λ model is proposed for HEVC. The scheme adopts a bit allocation method to estimate the target bit for a CTU. It then uses the R-λ model and the target bit to compute a value of λ, and finally determines the QP value according to the relationship between QP and λ [20]. In [18], a ρ-domain Rate-GOP-based frame-level rate control scheme is proposed for HEVC. In this scheme, a reference picture set–based hierarchical rate control structure is designed, and then the distortion and rate of the coding frame are represented by the distortion and rate of its reference frame; finally, based on the models, the QPs of the frames are predicted. The above schemes use three rate models: q-domain, λ-domain, and ρ-domain. The λ-domain and ρ-domain rate control schemes in [17] and [18] have better R-D performances than the q-domain scheme in [19].
In this paper, a new q-domain rate control scheme is proposed for HEVC. The proposed scheme is developed with consideration for the new temporal prediction structure adopted in HEVC. The main contributions of the proposed scheme can be summarized as follows. Firstly, the relationship between bit rate and quantization step (QS) for frames is exploited to propose an accurate rate model for HEVC. Secondly, according to the temporal prediction structure in HEVC, a method of determining the QPs for the first frames within GOPs is proposed. Thirdly, an accurate frame-level bit allocation for HEVC is proposed. Finally, based on the proposed rate model and bit allocation, a method for optimizing the R-D performance is used to predict QPs for CTUs.
An HEVC video encoder works with three kinds of temporal prediction structure — intra-only configuration, low delay configuration, and random access configuration [21] — resulting in different rate controls for different structures. Of the three aforementioned temporal prediction structures, low delay configuration is designed for low-delay video coding, which can be widely used in real-time video communications. Thus, in this paper, a novel rate control scheme is developed particularly for the low delay configuration.
II. Rate-Quantization (R-Q) Model in HEVC
- 1. Low Delay Configuration
Figure 1 shows a graphical presentation of the low delay configuration [21] . The number associated with each frame represents the encoding order. The frame with an index of 0 is an instantaneous decoding refresh (IDR) frame; all subsequent frames are encoded to be interframes. Every group of four successive frames following on from the IDR constitutes a GOP. The interframes are divided into three layers as shown in the figure; in total, there is one frame in layer 1, one frame in layer 2, and two frames in layer 3 within every GOP.
Fig. 1. Graphical presentation of low delay configuration.
- 2. Proposed R-Q Model
In video coding, the QS is used to quantize the discrete cosine transform coefficients of the prediction residual. Quantization determines the number of texture bits, which changes with the QS value: usually, the larger the QS value, the fewer texture bits are produced. Besides texture bits, non-texture bits are also included in the total bits required for encoding a current CTU. An R-Q model is often adopted in rate control to represent the relationship between the texture bit or the total bit and QS or QP. Before the proposed R-Q model is introduced, the difference between QP and QS is described below. In scalar quantization, QS is the actual step size used by a quantizer, while QP is the index of QS. The relationship between QP and QS in HEVC is nonlinear:
(1) $QS = 2^{\lfloor QP/6 \rfloor} \times v(QP \bmod 6),$
where v (0) = 0.625, v (1) = 0.703, v (2) = 0.797, v (3) = 0.891, v (4) = 1.000, and v (5) = 1.125.
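As an illustration of (1), the following minimal Python sketch converts between QP and QS using the multipliers listed above. The function names and the brute-force inverse `qs_to_qp` are hypothetical helpers introduced here for illustration; they are not taken from the paper or from the HM software.

```python
# Multipliers v(QP mod 6) quoted in the text for (1).
V = (0.625, 0.703, 0.797, 0.891, 1.000, 1.125)


def qp_to_qs(qp: int) -> float:
    """Quantization step for an integer QP, following (1)."""
    return (2 ** (qp // 6)) * V[qp % 6]


def qs_to_qp(qs: float, qp_min: int = 0, qp_max: int = 51) -> int:
    """Nearest integer QP for a given QS.

    Hypothetical helper: a brute-force search over the QP range, which is
    sufficient because qp_to_qs is strictly increasing in QP.
    """
    return min(range(qp_min, qp_max + 1),
               key=lambda qp: abs(qp_to_qs(qp) - qs))
```

With these multipliers, QP values of 10 and 40 map to QS values of 2 and 64, the range used in the experiments of this section.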
HEVC adopts new coding tools and a new temporal prediction structure in the low delay configuration, which affects the relationship between bit rate and QS. In Fig. 2, the relationships among the total bit; the encoding complexity; the width and height of the area containing samples; and QS are illustrated for the frame in level 1, the frame in level 2, and the first frame in level 3 within the same GOP, respectively. From the figure, it can be seen that the relationship can be represented by a quadratic model and that the frames in different levels have different relationships.
Fig. 2. Relationships for frames in levels 1 and 2, and first frame in level 3 within same GOP: (a) second GOP in “BQSquare” and (b) third GOP in “BlowingBubbles.”
Therefore, in this paper, an accurate R-Q model is proposed for HEVC as follows:
(2) $\frac{T_{total,l}}{W \cdot H} = \frac{a_l \cdot m_l}{QS} + \frac{b_l \cdot m_l}{QS^2},$
where $l$ is the layer index and $T_{total,l}$, $m_l$, $a_l$, and $b_l$ are the total bit; the mean absolute difference between the original block and the prediction block (used to indicate the encoding complexity); and two model parameters for a CTU in layer $l$, respectively. Moreover, $W$ and $H$ are the actual width and height of the area containing samples in the CTU, respectively.
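To make the model in (2) concrete, the sketch below evaluates it for one CTU and estimates the two model parameters from observed data by ordinary least squares. The paper does not specify the estimator used for $a_l$ and $b_l$, so the least-squares fit, the function names, and the sample layout are assumptions made for illustration only.

```python
def predicted_bits(qs, m, width, height, a, b):
    """Total bits for a CTU from (2): T / (W*H) = a*m/QS + b*m/QS^2."""
    return width * height * m * (a / qs + b / qs ** 2)


def fit_rq(samples):
    """Least-squares estimate of (a, b) for one layer.

    samples: iterable of (qs, m, width, height, bits) tuples.
    Hypothetical helper: solves the 2x2 normal equations of
    y = a*x1 + b*x2 with y = bits/(W*H), x1 = m/QS, x2 = m/QS^2.
    """
    s11 = s12 = s22 = s1y = s2y = 0.0
    for qs, m, w, h, bits in samples:
        x1, x2, y = m / qs, m / qs ** 2, bits / (w * h)
        s11 += x1 * x1
        s12 += x1 * x2
        s22 += x2 * x2
        s1y += x1 * y
        s2y += x2 * y
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("samples do not determine a and b")
    a = (s1y * s22 - s2y * s12) / det
    b = (s2y * s11 - s1y * s12) / det
    return a, b
```

At least two samples with different QS values are needed for the fit; with noise-free data generated by (2), the fit recovers the generating parameters.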
In the literature, a number of R-Q models have been developed based on observations and analyses. In [22], the source statistics are assumed to be Laplacian distributed and a well-known quadratic R-Q model is proposed as
(3) $T_{texture} = \frac{a \cdot m}{QS} + \frac{b \cdot m}{QS^2},$
where T texture represents the texture bit; m indicates the complexity; and a and b are the model parameters. Another form of quadratic R-Q model [19] is proposed as
(4) $\frac{T_{total}}{N_{pixel}} = \frac{a \cdot m}{QS} + \frac{b \cdot m}{QS^2},$
where N pixel is the number of pixels. Besides the two quadratic models, a linear relationship between T and 1/QS is denoted in [23] and [24] as
(5) $T_{total} = \frac{a \cdot m}{QS} + b.$
Based on an analysis of some experimental results, both a quadratic and a linear R-Q model using QP are proposed in [25] as
(6) $T_{total} = \frac{a \cdot m}{QP} + \frac{b \cdot m}{QP^2},$
(7) $T_{total} = \frac{a \cdot m}{QP} + b.$
To evaluate the performance of the proposed R-Q model, extensive experiments are performed in this paper. Four test video sequences (BQSquare, RaceHorses, BasketballDrill, and BlowingBubbles) are encoded with QP values of 10 to 40, corresponding to QS values of 2 to 64. The texture bit, the non-texture bit, and the actual complexity values are obtained.
The accuracies of the R-Q models described above are specified by an F-statistic [22], which is a measurement for aptness of the fit and is expressed as
(8) $F = \frac{\sum_i (Y_i - \bar{Y})^2 / (k-1)}{\sum_i (Y_i - \hat{Y}_i)^2 / (n-k)},$
where $Y_i$ corresponds to the $i$th data point, $\bar{Y}$ is the mean of all data points, $\hat{Y}_i$ is the estimated value of the $i$th data point, $n$ is the number of data points, and $k$ is the number of model parameters. The larger the F ratio is, the more accurate the model is. The F ratio results of all six models are shown in Table 1.
Table 1. F-ratio values of six R-Q models.
Sequence | (2) | (3) | (4) | (5) | (6) | (7)
BQSquare | 102.3 | 78.3 | 84.6 | 96.5 | 42.3 | 41.3
RaceHorses | 88.2 | 55.3 | 74.0 | 84.0 | 15.4 | 71.0
BasketballDrill | 441.5 | 331.8 | 405.2 | 16.9 | 145.8 | 15.6
BlowingBubbles | 67.5 | 55.3 | 62.3 | 65.0 | 36.3 | 34.0
Average | 174.9 | 130.2 | 156.5 | 65.6 | 59.9 | 40.5
From the results in the table, model (2) has the largest F-ratio value among the six models for each of the four sequences. Based on this observation, model (2) is the most suitable of the six R-Q models for HEVC.
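The F ratio in (8) is straightforward to compute once a model has produced its estimates. The following Python sketch is one possible implementation; the function name and argument layout are illustrative assumptions, and the numerator follows (8) as written in this paper.

```python
def f_ratio(actual, estimated, k):
    """F statistic of (8) for n data points and a model with k parameters.

    actual:    measured values Y_i
    estimated: model estimates of Y_i, in the same order
    k:         number of model parameters (k = 2 for models (2) to (7))
    """
    n = len(actual)
    if n != len(estimated) or k < 2 or n <= k:
        raise ValueError("need equally sized inputs and n > k >= 2")
    mean = sum(actual) / n
    # Numerator of (8): variation around the mean per degree of freedom.
    numerator = sum((y - mean) ** 2 for y in actual) / (k - 1)
    # Denominator of (8): residual variation per degree of freedom.
    denominator = sum((y - e) ** 2 for y, e in zip(actual, estimated)) / (n - k)
    return float("inf") if denominator == 0 else numerator / denominator
```

A larger return value indicates a closer fit, which is how the columns of Table 1 are compared.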
In (2), $m_l$ is predicted as follows:
(9) $m_l = c_l \times m_l^{actu} + d_l,$
where $m_l^{actu}$ is the actual complexity of the collocated CTU in the previous frame in the same layer, and $c_l$ and $d_l$ are two model parameters.
III. Proposed Rate Control Scheme
- 1. Method of Determining QPs for First Frames within GOPs
A video sequence contains many GOPs, and a GOP consists of several frames. In traditional rate control schemes [10], [19], the QPs of the first frames within GOPs are usually set to deterministic values rather than computed by rate control; thus, they may not be accurate enough to obtain good R-D performances.
In the low delay configuration, only the first frame within the first GOP in a video sequence is encoded as an IDR frame, and the first frames of all other GOPs are inter-coded frames for which rate control can be used. Therefore, to achieve accurate rate control, the proposed scheme computes the QPs of the first frames within all GOPs except the first and second by using rate control. The first frames within the first and second GOPs provide the coding information used to predict the information for the subsequent frames, so their QP values are set to deterministic values, as in previous rate control schemes.
- 2. GOP-Level Rate Control
The total number of bits in a GOP is managed in the GOP-level rate control. When the $j$th frame within the $i$th GOP is encoded, the bit used for the remaining frames in the GOP is calculated as
(10) $B_{i,j} = \begin{cases} \frac{R_{i,j}}{f} \times N_{GOP} - V_{i,j} & j = 1, \\ B_{i,j-1} + \frac{R_{i,j} - R_{i,j-1}}{f} \times (N_{GOP} - j + 1) - b_{i,j-1} & j = 2, 3, \ldots, N_{GOP}, \end{cases}$
where $R_{i,j}$ is the available bit rate and $f$ represents the predefined frame rate; $N_{GOP}$ indicates the number of frames within a GOP; $V_{i,j}$ is the virtual buffer occupancy computed by using (11) and (12); and $b_{i,j-1}$ is the generated bit of the $(j-1)$th frame. Thus, $V_{i,j}$ is computed as
(11) $V_{i,1} = \begin{cases} 0 & i = 2, \\ V_{i-1,N_{GOP}} + b_{i-1,N_{GOP}} - \frac{R_{i-1,N_{GOP}}}{f} - A_{i-1,N_{GOP}} & \text{otherwise}, \end{cases}$
where $A_{i,j}$ represents the adjustment bit for the $j$th frame within the $i$th GOP, which is also considered in [19]. Because of the constrained channel bandwidth, the generated bit of the IDR frame is usually much larger than the average available bit for encoding one frame, which is $R_{i,j}/f$. After the IDR frame has been encoded, the excess bit should be offset. If the generated bit of an interframe is less than the average available bit for encoding one frame, then a proportion of the difference between the generated bit and the average available bit is used as the adjustment bit for offsetting the excess bit. The adjustment bit $A_{i,j}$ is defined as follows:
(13) $A_{i,j} = \begin{cases}
\eta \times O_{i,j} & \text{if } j \neq 1,\ I_{i,j-1} > 0,\ O_{i,j} < 0,\ (I_{i,j-1} + \eta \times O_{i,j}) \geq 0, \\
-I_{i,j-1} & \text{if } j \neq 1,\ I_{i,j-1} > 0,\ O_{i,j} < 0,\ (I_{i,j-1} + \eta \times O_{i,j}) < 0, \\
\eta \times O_{i,j} & \text{if } j \neq 1,\ I_{i,j-1} < 0,\ O_{i,j} > 0,\ (I_{i,j-1} + \eta \times O_{i,j}) \leq 0, \\
-I_{i,j-1} & \text{if } j \neq 1,\ I_{i,j-1} < 0,\ O_{i,j} > 0,\ (I_{i,j-1} + \eta \times O_{i,j}) > 0, \\
\eta \times O_{i,1} & \text{if } I_{i-1,N_{GOP}} > 0,\ O_{i,1} < 0,\ (I_{i-1,N_{GOP}} + \eta \times O_{i,1}) \geq 0, \\
-I_{i-1,N_{GOP}} & \text{if } I_{i-1,N_{GOP}} > 0,\ O_{i,1} < 0,\ (I_{i-1,N_{GOP}} + \eta \times O_{i,1}) < 0, \\
\eta \times O_{i,1} & \text{if } I_{i-1,N_{GOP}} < 0,\ O_{i,1} > 0,\ (I_{i-1,N_{GOP}} + \eta \times O_{i,1}) \leq 0, \\
-I_{i-1,N_{GOP}} & \text{if } I_{i-1,N_{GOP}} < 0,\ O_{i,1} > 0,\ (I_{i-1,N_{GOP}} + \eta \times O_{i,1}) > 0, \\
0 & \text{otherwise},
\end{cases}$
(14) $I_{i,j} = \begin{cases} O_{i,j} & i = 1,\ j = 1, \\ I_{i-1,N_{GOP}} + A_{i,j} & i = 2, 3, \ldots,\ j = 1, \\ I_{i,j-1} + A_{i,j} & \text{otherwise}, \end{cases}$
(15) $O_{i,j} = b_{i,j} - \frac{R_{i,j}}{f},$
where $I_{i,j}$ represents the total bit needed to offset the excess bit, $\eta$ is a parameter, and $O_{i,j}$ indicates the difference between the generated bit and the average available bit.
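The GOP-level bookkeeping of (10), (13), and (15) can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: the function names and argument layout are assumptions, and the eight conditional branches of (13) are folded into one function by passing in the appropriate previous value of $I$ (that of the previous frame in the GOP, or of the last frame of the previous GOP when $j = 1$).

```python
def remaining_gop_bits(j, rate, f, n_gop, v=0.0, prev_remaining=0.0,
                       prev_rate=None, prev_frame_bits=0.0):
    """Bits left for the frames still to be coded in the GOP, following (10).

    j is 1-based. For j > 1, prev_remaining is B of frame j-1, prev_rate is
    the bit rate seen by frame j-1, and prev_frame_bits is b of frame j-1.
    """
    if j == 1:
        return rate / f * n_gop - v
    if prev_rate is None:  # constant bit rate channel
        prev_rate = rate
    return (prev_remaining
            + (rate - prev_rate) / f * (n_gop - j + 1)
            - prev_frame_bits)


def excess_bits(frame_bits, rate, f):
    """O in (15): generated bits minus the average per-frame budget."""
    return frame_bits - rate / f


def adjustment_bit(i_prev, o, eta):
    """A in (13), given the previous offset total I and the current O."""
    if i_prev > 0 and o < 0:
        return eta * o if i_prev + eta * o >= 0 else -i_prev
    if i_prev < 0 and o > 0:
        return eta * o if i_prev + eta * o <= 0 else -i_prev
    return 0.0
```

Following (14), the running total is then updated as the previous $I$ plus the returned adjustment bit, so the two clipped branches bring it exactly to zero.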
- 3. Frame-Level Bit Allocation
The scheme in [19] allocates bits at the frame level by combining two target bit budgets: one based on the bit used for the remaining frames and one based on the target buffer level. However, the layers in the low delay configuration are not taken into account in [19]. In this subsection, an accurate bit allocation method for the low delay configuration is proposed that takes the layers into account.
There are three layers of interframes in the low delay configuration. Even if frames in different layers are encoded using the same QS value, their average generated bit may differ. The target bit budget based on the bit used for the remaining frames is computed as
(16) $\hat{T}_{i,j} = \frac{\bar{W}_{l_{curr},\,p_{l_{curr}}-1} \times B_{i,j}}{\sum_{l=1}^{m} \bar{W}_{l,\,p_l-1} \times N_{l,r,i}},$
where $\bar{W}_{l,\,p_l-1}$ indicates the average weighting factor of the $(p_l - 1)$th frame in the $l$th layer (a separate factor is kept for each layer because the average weighting factors of frames in different layers may differ); $l_{curr}$ is the layer index of the $j$th frame within the $i$th GOP, which can also be represented as the $p_{l_{curr}}$th frame in the $l_{curr}$th layer; $N_{l,r,i}$ is the number of remaining frames in the $l$th layer within the $i$th GOP; and $m$ is the number of layers. The average weighting factor $\bar{W}_{l_{curr},\,p_{l_{curr}}}$ is computed as
(17) $\bar{W}_{l_{curr},\,p_{l_{curr}}} = \frac{QP_{i,j} \times b_{i,j}}{8} + \frac{7 \times \bar{W}_{l_{curr},\,p_{l_{curr}}-1}}{8}.$
For the $j$th frame within the $i$th GOP, the target buffer level is determined using
(18) $S_{i,j} = \begin{cases} V_{i,j} & j = 1, \\ S_{i,j-1} - \frac{S_{i,1}}{N_{GOP} - 1} + \phi_{l_{curr},\,p_{l_{curr}}-1} \times \frac{R_{i,j}}{f} & j = 2, 3, \ldots, N_{GOP}, \end{cases}$
(19) $\phi_{l_{curr},\,p_{l_{curr}}-1} = \frac{\bar{W}_{l_{curr},\,p_{l_{curr}}-1} \times N_{GOP}}{\sum_{l=1}^{m} \bar{W}_{l,\,p_l-1} \times N_{l,i}} - 1,$
where $N_{l,i}$ denotes the number of frames in the $l$th layer within the $i$th GOP, and the term $S_{i,1}/(N_{GOP} - 1)$ reflects that the target buffer level is expected to be zero after all frames within the GOP have been encoded. Therefore, the target bit budget based on the target buffer level is calculated as follows:
(20) $\tilde{T}_{i,j} = \frac{\bar{W}_{l_{curr},\,p_{l_{curr}}-1} \times N_{GOP}}{\sum_{l=1}^{m} \bar{W}_{l,\,p_l-1} \times N_{l,i}} \times \frac{R_{i,j}}{f} + \gamma \times (S_{i,j} - V_{i,j}),$
where $\gamma$ is a parameter.
After $\hat{T}_{i,j}$ and $\tilde{T}_{i,j}$ are obtained, the target bit allocated for the $j$th frame within the $i$th GOP is obtained as
(21) $T_{i,j} = \beta \times \hat{T}_{i,j} + (1 - \beta) \times \tilde{T}_{i,j},$
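where $\beta$ is a parameter. The final target bit is then restricted by using an upper bound $U_{i,j}$ and a lower bound $L_{i,j}$:
(22) $T_{i,j} = \min(U_{i,j}, \max(L_{i,j}, T_{i,j})),$
(23) $U_{i,j} = \begin{cases} R_{i,j} \times \varpi - V_{i,j} & j = 1, \\ U_{i,j-1} - b_{i,j-1} + \frac{R_{i,j-1}}{f} & \text{otherwise}, \end{cases}$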
(24) $L_{i,j} = \begin{cases} \frac{R_{i,j}}{f} - V_{i,j} - \frac{I_{i-1,N_{GOP}}}{f} & j = 1, \\ L_{i,j-1} - b_{i,j-1} + \frac{R_{i,j-1}}{f} & \text{otherwise}, \end{cases}$
where $\varpi$ is a parameter.
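The frame-level allocation of (16), (17), and (20) to (22) can be summarized in a short Python sketch. The function names, the dictionary-based layout keyed by layer index, and the idea of passing the bounds $L_{i,j}$ and $U_{i,j}$ in as precomputed values are assumptions made for illustration; the default values of `beta` and `gamma` are the settings of $\beta$ and $\gamma$ reported in Section IV.

```python
def update_weight(prev_weight, frame_qp, frame_bits):
    """Average weighting factor of (17): 1/8 of QP x bits plus 7/8 of the old value."""
    return frame_qp * frame_bits / 8 + 7 * prev_weight / 8


def frame_target_bits(layer, weights, remaining_per_layer, frames_per_layer,
                      b_remaining, rate, f, n_gop, s, v, lower, upper,
                      beta=0.9, gamma=0.25):
    """Target bits of (21) for the current frame, clipped as in (22).

    weights[l]              latest average weighting factor of layer l
    remaining_per_layer[l]  frames of layer l still to be coded in the GOP
    frames_per_layer[l]     frames of layer l in the GOP
    b_remaining             B from (10)
    s, v                    target buffer level S and buffer occupancy V
    lower, upper            bounds L and U from (24) and (23)
    """
    w = weights[layer]
    denom_remaining = sum(weights[l] * remaining_per_layer[l] for l in weights)
    denom_gop = sum(weights[l] * frames_per_layer[l] for l in weights)
    t_hat = w * b_remaining / denom_remaining                      # (16)
    t_tilde = w * n_gop / denom_gop * rate / f + gamma * (s - v)   # (20)
    target = beta * t_hat + (1 - beta) * t_tilde                   # (21)
    return min(upper, max(lower, target))                          # (22)
```

When all layers carry the same weight, both budgets reduce to an equal share per frame, which is a convenient sanity check for an implementation.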
- 4. CTU-Level QP Prediction
With the proposed R-Q model and the proposed bit allocation method in place, an efficient QP prediction method is needed to determine the QP value for a CTU. In this paper, a rate distortion–optimized QP prediction method is proposed.
For the first CTU in the $j$th frame, its QP is computed as
(25) $QP_{i,j,1} = \begin{cases} \overline{QP}_{i,j-1} & i = 3, 4, \ldots,\ j \neq 1, \\ \overline{QP}_{i-1,N_{GOP}} & i = 3, 4, \ldots,\ j = 1, \end{cases}$
where $QP_{i,j,1}$ is the QP value for the first CTU in the $j$th frame within the $i$th GOP, and $\overline{QP}_{i,j}$ is the average QP value for the $j$th frame within the $i$th GOP.
In [15], the sum of the inverses of the distortions of all the macroblocks in a frame is maximized to compute the QPs for the macroblocks. In this paper, a similar method is used to predict the QS values for the CTUs. When the $k$th CTU in the $j$th frame is encoded, its QS value is found as follows:
(26) $\text{maximize} \quad \frac{1}{N_{CTU} - k + 1} \sum_{g=k}^{N_{CTU}} (D_{i,j,g})^{-1},$
(27) $\text{s.t.} \quad \sum_{g=k}^{N_{CTU}} T_{i,j,g} - T_{r,i,j} = 0,$
where $N_{CTU}$ is the total number of CTUs in a frame; $D_{i,j,g}$ and $T_{i,j,g}$ are the distortion of and the bit required for the $g$th CTU in the $j$th frame within the $i$th GOP, respectively; and $T_{r,i,j}$ is the target bit for the remaining CTUs.
In this paper, a linear D-Q model is used to represent the relationship between the distortion and the QS, which is described as follows:
(28) $D_l = \rho_l \times QS,$
where $D_l$ is the distortion of a CTU in the $l$th layer, and $\rho_l$ is a model parameter.
Substituting the proposed R-Q model and the linear D-Q model into (26) and (27), respectively, (26) and (27) become
(29) $\text{maximize} \quad \frac{1}{N_{CTU} - k + 1} \sum_{g=k}^{N_{CTU}} \left( \rho_{l_{curr},\,p_{l_{curr}},\,g} \times QS_{i,j,g} \right)^{-1},$
(30) $\text{s.t.} \quad \sum_{g=k}^{N_{CTU}} \left( W_g \cdot H_g \cdot m_{l_{curr},\,p_{l_{curr}},\,g} \cdot \left( \frac{a_{l_{curr},\,p_{l_{curr}},\,g}}{QS_{i,j,g}} + \frac{b_{l_{curr},\,p_{l_{curr}},\,g}}{QS_{i,j,g}^2} \right) \right) - T_{r,i,j} = 0.$
Applying the Lagrange multiplier method to (29) and (30) gives the QS for the $k$th CTU in the $j$th frame through its reciprocal:
(31) $\frac{1}{QS_{i,j,k}} = -\frac{a_{l_{curr},\,p_{l_{curr}},\,k}}{2 b_{l_{curr},\,p_{l_{curr}},\,k}} + \frac{\rho_{l_{curr},\,p_{l_{curr}},\,k}^{-1}}{b_{l_{curr},\,p_{l_{curr}},\,k} \cdot m_{l_{curr},\,p_{l_{curr}},\,k} \cdot W_k \cdot H_k} \times \sqrt{\frac{T_{r,i,j} + \sum_{g=k}^{N_{CTU}} \frac{a_{l_{curr},\,p_{l_{curr}},\,g}^2 \cdot m_{l_{curr},\,p_{l_{curr}},\,g} \cdot W_g \cdot H_g}{4 b_{l_{curr},\,p_{l_{curr}},\,g}}}{\sum_{g=k}^{N_{CTU}} \frac{\rho_{l_{curr},\,p_{l_{curr}},\,g}^{-2}}{b_{l_{curr},\,p_{l_{curr}},\,g} \cdot m_{l_{curr},\,p_{l_{curr}},\,g} \cdot W_g \cdot H_g}}}.$
When the $k$th CTU in the $j$th frame is encoded, the parameters $a_{l_{curr},\,p_{l_{curr}},\,g}$, $b_{l_{curr},\,p_{l_{curr}},\,g}$, and $\rho_{l_{curr},\,p_{l_{curr}},\,g}$ are unavailable for $k < g \leq N_{CTU}$. Therefore, $a_{l_{curr},\,p_{l_{curr}},\,k}$, $b_{l_{curr},\,p_{l_{curr}},\,k}$, and $\rho_{l_{curr},\,p_{l_{curr}},\,k}$ are used to approximate the corresponding parameters with $g$ satisfying $k < g \leq N_{CTU}$, respectively. Then, (31) becomes
(32) $\frac{1}{QS_{i,j,k}} = -\frac{a_{l_{curr},\,p_{l_{curr}},\,k}}{2 b_{l_{curr},\,p_{l_{curr}},\,k}} + \frac{1}{m_{l_{curr},\,p_{l_{curr}},\,k} \cdot W_k \cdot H_k} \times \sqrt{\frac{T_{r,i,j} + \frac{a_{l_{curr},\,p_{l_{curr}},\,k}^2}{4 b_{l_{curr},\,p_{l_{curr}},\,k}} \left( \sum_{g=k}^{N_{CTU}} m_{l_{curr},\,p_{l_{curr}},\,g} \cdot W_g \cdot H_g \right)}{b_{l_{curr},\,p_{l_{curr}},\,k} \sum_{g=k}^{N_{CTU}} \left( m_{l_{curr},\,p_{l_{curr}},\,g} \cdot W_g \cdot H_g \right)^{-1}}}.$
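A minimal Python sketch of the closed form (32) follows. It treats the right-hand side of (32) as the reciprocal of the QS, which is what substituting the R-Q and D-Q models into the Lagrangian of (29) and (30) yields, and it restores the square root that step 6) of the scheme refers to. The function name, the argument layout, and the choice of returning `None` when no positive QS exists are assumptions for illustration.

```python
import math


def ctu_qs(a, b, t_remaining, w_remaining):
    """QS for the current CTU from the closed form (32).

    a, b          R-Q parameters of the current CTU, reused for the CTUs
                  after it as described in the text (b must be nonzero)
    t_remaining   T_r: target bits for the current and all remaining CTUs
    w_remaining   m * W * H for the current CTU (first entry) followed by
                  each remaining CTU; all entries must be positive
    Returns None when the radicand is negative or the result is not a
    positive QS, so that the caller can fall back to a default QP.
    """
    w_k = w_remaining[0]
    sum_w = sum(w_remaining)
    sum_inv_w = sum(1.0 / w for w in w_remaining)
    radicand = (t_remaining + a * a / (4 * b) * sum_w) / (b * sum_inv_w)
    if radicand < 0:
        return None
    inv_qs = -a / (2 * b) + math.sqrt(radicand) / w_k
    return 1.0 / inv_qs if inv_qs > 0 else None
```

As a consistency check, when every remaining CTU has the same complexity and size, encoding all of them with the returned QS spends exactly the remaining target according to (30).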
- 5. Steps of Proposed Rate Control Scheme
The proposed rate control scheme is summarized in Fig. 3 . The detailed steps of the proposed scheme are described as follows:
  • 1) The QP values for the first frames within the first and second GOPs are set to $QP_{init}$ and $QP_{init}+3$, respectively, where $QP_{init}$ is the initial QP. The QP values of the other frames within the second GOP are set to $QP_{init}+2$, $QP_{init}+3$, and $QP_{init}+1$, respectively. Encode all five frames using these QP values. Obtain the generated bits, the buffer occupancies, and the average weighting factors of the frames. Let $i = 3$, $j = 1$, and go to step 2).
  • 2) Compute the virtual buffer occupancy $V_{i,j}$ and the bits $B_{i,j}$ for the remaining frames within the $i$th GOP. If $B_{i,j} \leq 0$, then the value of $QP_{i,j}$ is set to $\overline{QP}_{i,j-1} + 2$ and $QP_{i,j}$ is further bounded by $QP_{i,j} = \max\{1, \min\{51, QP_{i,j}\}\}$. Then, the CTUs in the $j$th frame are encoded by using $QP_{i,j}$. Then go to step 8). Otherwise, go to step 3).
  • 3) Calculate $\hat{T}_{i,j}$ by using (16). Estimate $\tilde{T}_{i,j}$ and compute $T_{i,j}$ for the $j$th frame. Let $k = 1$ and go to step 4).
  • 4) Obtain $QP_{i,j,1}$ according to (25) and then encode the CTU. Record the number of bits generated for the CTU. Let $k = k + 1$, and go to step 5).
  • 5) Calculate the target bits $T_{r,i,j}$ for the remaining CTUs in the $j$th frame. If $T_{r,i,j} \leq 0$, then let $QP_{i,j,k} = \overline{QP}_{i,j-1} + 2$ and bound $QP_{i,j,k}$ by using $QP_{i,j,k} = \max\{1, \min\{51, QP_{i,j,k}\}\}$, and then go to step 7). Otherwise, go to step 6).
  • 6) Update the parameters $a_{l_{curr},\,p_{l_{curr}},\,k}$, $b_{l_{curr},\,p_{l_{curr}},\,k}$, $c_{l_{curr},\,p_{l_{curr}},\,k}$, and $d_{l_{curr},\,p_{l_{curr}},\,k}$ according to the actual encoding data of the previous CTUs in the same layer. Predict $m_{l_{curr},\,p_{l_{curr}},\,g}$ ($k \leq g \leq N_{CTU}$) for the remaining CTUs. If the value of the expression that lies beneath the square root symbol in (32) is negative, then let $QP_{i,j,k} = \overline{QP}_{i,j-1} - 1$. Otherwise, calculate $QS_{i,j,k}$ from (32) and convert $QS_{i,j,k}$ into $QP_{i,j,k}$. Bound $QP_{i,j,k}$ by using $QP_{i,j,k} = \max\{\overline{QP}_{i,j-1} - 2, \min\{\overline{QP}_{i,j-1} + 2, QP_{i,j,k}\}\}$ and $QP_{i,j,k} = \max\{1, \min\{51, QP_{i,j,k}\}\}$. Go to step 7).
  • 7) Encode the $k$th CTU using $QP_{i,j,k}$. Then, obtain the generated bit and the actual complexity value of the CTU. Let $k = k + 1$. If $k \leq N_{CTU}$, then go back to step 5) and encode the next CTU. Otherwise, go to step 8).
  • 8) Obtain the average QP value, the generated bit, the buffer occupancy, and the average weighting factor of the $j$th frame. Then, encode the next frame until the last frame in the video sequence.
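The QP selection and bounding logic of steps 5) and 6) above can be sketched in Python as follows. This is a sketch under stated assumptions rather than the authors' code: the function names are hypothetical, the QS-to-QP conversion is a brute-force nearest match against (1), and the final rounding to an integer QP is an assumption because the steps do not say how a fractional average QP is handled.

```python
V = (0.625, 0.703, 0.797, 0.891, 1.000, 1.125)  # v(QP mod 6) from (1)


def qp_to_qs(qp):
    """Quantization step for an integer QP, following (1)."""
    return (2 ** (qp // 6)) * V[qp % 6]


def qs_to_qp(qs):
    """Nearest integer QP in [0, 51] (hypothetical brute-force inverse of (1))."""
    return min(range(52), key=lambda qp: abs(qp_to_qs(qp) - qs))


def clamp(x, lo, hi):
    return max(lo, min(hi, x))


def ctu_qp(qs, prev_frame_avg_qp, t_remaining):
    """QP for the k-th CTU (k > 1), following steps 5) and 6).

    qs is the value obtained from (32), or None when the expression under
    the square root in (32) is negative.
    """
    if t_remaining <= 0:        # step 5): no bits left for the remaining CTUs
        qp = prev_frame_avg_qp + 2
    elif qs is None:            # step 6): negative radicand
        qp = prev_frame_avg_qp - 1
    else:                       # step 6): convert QS and stay within +/- 2
        qp = clamp(qs_to_qp(qs), prev_frame_avg_qp - 2, prev_frame_avg_qp + 2)
    return int(clamp(round(qp), 1, 51))  # rounding is an assumption
```

The two-sided bound around the previous frame's average QP limits quality fluctuation between neighboring CTUs, while the final bound keeps the result inside the range given in the steps.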
Fig. 3. Flowchart of proposed rate control scheme.
IV. Experimental Results
The performance of the proposed rate control scheme for HEVC is evaluated in this section. The experiment is implemented on the HEVC test model encoder HM-13.0. To compare the proposed scheme with the three state-of-the-art rate control schemes in [17], [18], and [19], all the video sequences in classes B, C, D, and E are tested with the low delay configuration. Of the three schemes, JCTVC-I0094 [19] and JCTVC-M0036 [17] have been adopted by JCT-VC and implemented in HM. The tested sequences and testing configurations are detailed in JCTVC-I1100 [26]. In the experiment, $QP_{init}$ is set to 32, and the low complexity setting is used. The parameters η, γ, β, and ϖ are set to 0.2, 0.25, 0.9, and 0.9, respectively.
The R-D performance is evaluated in terms of the Bjøntegaard delta peak signal-to-noise ratio (BD-PSNR) [27], which represents the average PSNR difference between two R-D curves. A positive BD-PSNR value indicates that the proposed scheme achieves better R-D performance than the scheme it is compared with.
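For readers unfamiliar with the metric, the sketch below shows one common way to compute BD-PSNR from four rate points per curve: each R-D curve is represented by the cubic through its four (log bit rate, PSNR) points, and the two curves are averaged over their common bit rate range. The function names and the restriction to exactly four points are assumptions for illustration; the reference scripts used for the reported results may differ in detail.

```python
import math


def _lagrange(xs, ys, x):
    """Value at x of the polynomial through the points (xs[i], ys[i])."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def bd_psnr(rates_ref, psnr_ref, rates_test, psnr_test):
    """Average PSNR gain (dB) of the test curve over the reference curve.

    Each curve needs exactly four (bit rate, PSNR) points with distinct,
    positive bit rates.
    """
    if not all(len(v) == 4 for v in (rates_ref, psnr_ref, rates_test, psnr_test)):
        raise ValueError("four points per curve are required")
    x_ref = [math.log10(r) for r in rates_ref]
    x_test = [math.log10(r) for r in rates_test]
    lo = max(min(x_ref), min(x_test))
    hi = min(max(x_ref), max(x_test))
    if hi <= lo:
        raise ValueError("the two curves do not overlap in bit rate")
    mid = (lo + hi) / 2

    def mean_psnr(xs, ys):
        # Simpson's rule is exact for cubics, so this is the exact mean of
        # the interpolating cubic over [lo, hi].
        return (_lagrange(xs, ys, lo) + 4 * _lagrange(xs, ys, mid)
                + _lagrange(xs, ys, hi)) / 6

    return mean_psnr(x_test, psnr_test) - mean_psnr(x_ref, psnr_ref)
```

Passing the curve of a comparison scheme as the reference and the curve of the proposed scheme as the test curve gives values with the same sign convention as the comparisons in this section.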
The BD-PSNR values obtained for all the video sequences in each class are averaged, and the results are shown in Tables 2 and 3 . From the results in Table 2 , it can be seen that for each of the four classes the proposed scheme can achieve better R-D performances than JCTVC-I0094 and JCTVC-M0036, and the average BD-PSNR values between the proposed scheme and JCTVC-I0094 and between the proposed scheme and JCTVC-M0036 are 0.69 dB and 0.10 dB, respectively. Compared to Wang’s scheme [18] , the proposed scheme can achieve better R-D performances for classes B and E, and the average BD-PSNR of all the classes is 0.02 dB, which shows that the R-D performance of the proposed scheme is similar to that of Wang’s scheme. Furthermore, the same conclusions can be observed from the results in Table 3 .
Table 2. R-D performance comparisons with LB-main.
Video sequences | Proposed vs. JCTVC-I0094 (dB) | Proposed vs. JCTVC-M0036 (dB) | Proposed vs. Wang’s scheme (dB)
Class B | 0.91 | 0.05 | 0.08
Class C | 0.46 | 0.10 | −0.05
Class D | 0.36 | 0.07 | −0.13
Class E | 1.01 | 0.18 | 0.18
Average | 0.69 | 0.10 | 0.02
Table 3. R-D performance comparisons with LP-main.
Video sequences | Proposed vs. JCTVC-I0094 (dB) | Proposed vs. JCTVC-M0036 (dB) | Proposed vs. Wang’s scheme (dB)
Class B | 0.85 | 0.02 | 0.08
Class C | 0.44 | 0.04 | 0.05
Class D | 0.35 | 0.02 | 0.08
Class E | 0.94 | 0.04 | 0.15
Average | 0.61 | 0.08 | −0.01
Figure 4 shows the R-D curves of the four schemes for the “Kimono,” “Johnny,” “KristenAndSara,” and “BasketballDrive” sequences. From the figure, it can be observed that the proposed scheme has better R-D performances than the other three schemes for the four sequences.
Fig. 4. R-D curves of four schemes with LB-main: (a) “Kimono,” (b) “Johnny,” (c) “KristenAndSara,” and (d) “BasketballDrive.”
In addition, the accuracy of rate control is measured by the bit rate mismatch error, which is defined as follows:
(33) $\Delta R = \frac{|R_t - R_b|}{R_b} \times 100\%,$
where $R_t$ is the actual bit rate produced by the tested scheme, and $R_b$ is the target bit rate.
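The mismatch error in (33) is a one-line computation; the sketch below is included only to pin down the units (percent) and the roles of the two bit rates. The function name is an illustrative assumption.

```python
def bitrate_mismatch(actual_rate, target_rate):
    """Bit rate mismatch error of (33), in percent.

    actual_rate: bit rate produced by the tested scheme (R_t)
    target_rate: target bit rate (R_b), in the same unit as actual_rate
    """
    if target_rate <= 0:
        raise ValueError("target_rate must be positive")
    return abs(actual_rate - target_rate) / target_rate * 100.0
```

For example, an encoder that produces 1002 kbps against a 1000 kbps target has a mismatch error of about 0.2%, the same as one that produces 998 kbps.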
Tables 4 and 5 show the average bit rate mismatch errors of the four classes for the four schemes with the LB-main and LP-main configurations, respectively. On average over the four classes, the proposed scheme achieves the smallest mismatch error between the target bit rate and the actual bit rate with LB-main (0.08%) and matches the smallest average, that of JCTVC-M0036, with LP-main (0.08%).
Table 4. Bit rate mismatch comparisons with LB-main (average ΔR, %).
Video sequences | JCTVC-I0094 | JCTVC-M0036 | Wang’s scheme | Proposed
Class B | 2.30 | 0.01 | 0.05 | 0.04
Class C | 2.58 | 0.03 | 0.06 | 0.16
Class D | 2.41 | 0.08 | 0.42 | 0.06
Class E | 2.03 | 0.22 | 0.16 | 0.06
Average | 2.33 | 0.09 | 0.17 | 0.08
Table 5. Bit rate mismatch comparisons with LP-main (average ΔR, %).
Video sequences | JCTVC-I0094 | JCTVC-M0036 | Wang’s scheme | Proposed
Class B | 2.33 | 0.01 | 0.05 | 0.04
Class C | 2.47 | 0.03 | 0.07 | 0.16
Class D | 2.41 | 0.08 | 0.41 | 0.06
Class E | 2.03 | 0.21 | 0.15 | 0.07
Average | 2.31 | 0.08 | 0.17 | 0.08
V. Conclusion
In this paper, a novel q-domain rate control scheme for low delay video coding of HEVC was proposed. In the proposed scheme, an accurate R-Q model and a method for determining the QPs of the first frames within GOPs were proposed. Subsequently, a frame-level bit allocation method was also presented. Finally, based on the proposed R-Q model and the target bit allocated, QPs were predicted for CTUs by using rate-distortion optimization. The proposed scheme can be applied to real-time video communications. Experimental results show that the proposed scheme can achieve better R-D performances than JCTVC-I0094 and JCTVC-M0036, and a similar R-D performance to Wang’s scheme; furthermore, it obtained the smallest bit rate mismatch error among the four schemes.
The proposed scheme is developed for a low delay configuration. For a random access configuration, the proposed R-Q model and a CTU-level QP prediction can be used; the method of determining QPs for the first frames within GOPs, the GOP-level rate control, and the frame-level bit allocation should all be correspondingly adjusted, which is also our future work.
This work was supported in part by the National Natural Science Foundation of China under Grant 61471277, the Ningbo Natural Science Foundation under Grant 2015A610129, and the 111 Project under Grant B08038, and was also supported by the ISN State Key Laboratory.
BIO
Corresponding Author wwu@xidian.edu.cn
Wei Wu received his BS degree in electronic materials and elements, and his MS and PhD degrees in communication and information systems from Xidian University, Xi’an, China, in 1998, 2001, and 2005, respectively. He is currently an associate professor with the School of Telecommunication Engineering, Xidian University. From 2007 to 2008, he was a postdoctoral researcher at Sejong University, Seoul, Rep. of Korea. His research interests include video coding and video signal processing.
liujiong@mail.xidian.edu.cn
Jiong Liu received his BS and MS degrees in communication and information systems from Xidian University, Xi’an, China, in 1995 and 2001, respectively. He is currently a lecturer with the School of Telecommunication Engineering, Xidian University. His research interests include video coding and video signal processing.
fenglei@mail.xidian.edu.cn
Lei Feng received his BS and MS degrees in communication and information systems from Xidian University, Xi’an, China, in 1999 and 2002, respectively. He is currently a lecturer with the School of Telecommunication Engineering, Xidian University. His research interests include video signal processing.
References
[1] ITU-T H.265 | ISO/IEC 23008-2, High Efficiency Video Coding, 2013.
[2] Wu W., Kim H.K., “A Novel Rate Control Initialization Algorithm for H.264,” IEEE Trans. Consum. Electron., vol. 55, no. 2, 2009, pp. 665-669. DOI: 10.1109/TCE.2009.5174437
[3] Pitrey Y., Babel M., “ρ-Domain Based Rate Control Scheme for Spatial, Temporal, and Quality Scalable Video Coding,” Proc. SPIE 7257, Visual Commun. Image Process., San Jose, CA, USA, Jan. 18–19, 2009, pp. 5-8.
[4] CCITT SG XV WP/1/Q4, Description of Reference Model 8 (RM8), 1989.
[5] Viscito E., Gonzales C., “A Video Compression Algorithm with Adaptive Bit Allocation and Quantization,” Proc. SPIE 1605, Visual Commun. Image Process., Boston, MA, USA, Nov. 1–2, 1991, pp. 58-72.
[6] ISO/IEC JTC/SC29/WG11, MPEG Test Model 5 (TM5), 1993.
[7] Ribas-Corbera J., Lei S., “Rate Control in DCT Video Coding for Low-Delay Communications,” IEEE Trans. Circuits Syst. Video Technol., vol. 9, no. 1, 1999, pp. 172-185. DOI: 10.1109/76.744284
[8] ISO/IEC JTC1/SC29/WG11, MPEG-4 Video Verification Model Version 18.0: Coding of Moving Pictures and Audio, 2001.
[9] Leontaris A., Tourapis A.M., “Rate Control Reorganization in the Joint Model (JM) Reference Software,” ISO/IEC JTC1/SC29/WG11 and ITU-T SG16 Q.6, 23rd Meeting, San Jose, CA, USA, Doc. JVT-W042, 2007.
[10] Li Z., “Adaptive Basic Unit Layer Rate Control for JVT,” ISO/IEC JTC1/SC29/WG11 and ITU-T SG16 Q.6, 7th Meeting, Pattaya, Thailand, Doc. JVT-G012, 2003.
[11] Kim M.-J., Hong M.-C., “Fast Rate Control Algorithm in Frame-Layer for H.264/AVC Video Coding,” IEEE Trans. Consum. Electron., vol. 58, no. 3, 2012, pp. 872-879. DOI: 10.1109/TCE.2012.6311330
[12] Li M., “Frame Layer Rate Control for H.264/AVC with Hierarchical B-frames,” J. Image Commun., vol. 24, no. 3, 2009, pp. 177-199.
[13] Liu Y., Li Z.G., Soh Y.C., “A Novel Rate Control Scheme for Low Delay Video Communication of H.264/AVC Standard,” IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 1, 2007, pp. 68-78. DOI: 10.1109/TCSVT.2006.887081
[14] Hu S., “Rate Control Optimization for Temporal-Layer Scalable Video Coding,” IEEE Trans. Circuits Syst. Video Technol., vol. 21, no. 8, 2011, pp. 1152-1162. DOI: 10.1109/TCSVT.2011.2138810
[15] Wang H., Kwong S., “A Rate-Distortion Optimization Algorithm for Rate Control in H.264,” IEEE Int. Conf. Acoust., Speech Signal Process., Honolulu, HI, USA, Apr. 15–20, 2007, pp. 1149-1152.
[16] Choi H., “Rate Control Based on Unified RQ Model for HEVC,” JCT-VC of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, 8th Meeting, San Jose, CA, USA, Doc. JCTVC-H0213, 2012.
[17] Li B., Li H., Li L., “Adaptive Bit Allocation for R-lambda Model Rate Control in HM,” JCT-VC of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, 13th Meeting, Incheon, Rep. of Korea, Doc. JCTVC-M0036, 2013.
[18] Wang S., “Rate-GOP Based Rate Control for High Efficiency Video Coding,” IEEE J. Sel. Topics Signal Process., vol. 7, no. 6, 2013, pp. 1101-1111. DOI: 10.1109/JSTSP.2013.2272240
[19] Choi H., “Improvement of the Rate Control Based on Pixel-Based URQ Model for HEVC,” JCT-VC of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, 9th Meeting, Geneva, Switzerland, Doc. JCTVC-I0094, 2012.
[20] Li B., “QP Determination by Lambda Value,” JCT-VC of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, 9th Meeting, Geneva, Switzerland, Doc. JCTVC-I0426, 2012.
[21] Kim I., “High Efficiency Video Coding (HEVC) Test Model 12 (HM12) Encoder Description,” JCT-VC of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, 14th Meeting, Vienna, Austria, Doc. JCTVC-N1002, 2013.
[22] Chiang T., Zhang Y.-Q., “A New Rate Control Scheme Using Quadratic Rate Distortion Model,” IEEE Trans. Circuits Syst. Video Technol., vol. 7, no. 1, 1997, pp. 246-250. DOI: 10.1109/76.554439
[23] Ma S., Gao W., Lu Y., “Rate-Distortion Analysis for H.264/AVC Video Coding and its Application to Rate Control,” IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 12, 2005, pp. 1533-1544. DOI: 10.1109/TCSVT.2005.857300
[24] Dong J., Ling N., “A Context-Adaptive Prediction Scheme for Parameter Estimation in H.264/AVC Macroblock Layer Rate Control,” IEEE Trans. Circuits Syst. Video Technol., vol. 19, no. 8, 2009, pp. 1108-1117. DOI: 10.1109/TCSVT.2009.2020338
[25] Liu Y., Li Z.G., Soh Y.C., “A Novel Rate Control Scheme for Low Delay Video Communication of H.264/AVC Standard,” IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 1, 2007, pp. 68-78. DOI: 10.1109/TCSVT.2006.887081
[26] Bossen F., “Common Test Conditions and Software Reference Configurations,” JCT-VC of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, 9th Meeting, Geneva, Switzerland, Doc. JCTVC-I1100, 2012.
[27] Bjøntegaard G., “Calculation of Average PSNR Differences between RD-Curves,” ITU-T SG16 Q.6 VCEG, 13th Meeting, Austin, TX, USA, Doc. VCEG-M33, 2001.