The emerging High Efficiency Video Coding (HEVC) standard adopts the quadtree-structured transform unit (TU) in residual quadtree (RQT) coding. Each TU can be split into four equal sub-TUs recursively. The RQT coding is performed for all possible transform depth levels to achieve the highest coding efficiency, but it imposes a very high computational complexity on HEVC encoders. To reduce the computational complexity required by the RQT coding, in this paper we propose a fast TU size decision method incorporating an adaptive maximum transform depth determination (AMTD) algorithm and a full check skipping and early termination (FCSET) algorithm. Because the optimal transform depth level is highly content-dependent, it is not necessary to perform the RQT coding at all transform depth levels. With the AMTD algorithm, the maximum transform depth level is determined for the current treeblock so as to skip those transform depth levels rarely used by its spatially adjacent treeblocks. Additionally, the FCSET algorithm is introduced to exploit the correlations of transform depth level between the four sub-CUs generated by one coding unit (CU) quadtree partitioning. Experimental results demonstrate that the proposed overall algorithm reduces computational complexity by 21% on average while maintaining almost the same rate-distortion (RD) performance as the HEVC test model reference software, HM 13.0.
1. Introduction
The High Efficiency Video Coding (HEVC) standard is the most recent joint video project of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG) standardization organizations, working together in a partnership known as the Joint Collaborative Team on Video Coding (JCT-VC) [1]. HEVC significantly outperforms previous standards such as MPEG-2, MPEG-4 Part 2, and H.264/AVC in terms of coding efficiency, achieving roughly half the bit rate of the widely used H.264/AVC while maintaining equivalent visual quality [2,3].
To obtain higher coding efficiency, a large number of new techniques are adopted in HEVC, such as the hierarchical quadtree structure of motion compensation, the large coding treeblock, the coding unit (CU), the prediction unit (PU), and the transform unit (TU) [4]. A treeblock consists of 64×64 luma samples together with two corresponding blocks of chroma samples. To some extent, the treeblock in HEVC is broadly analogous to the macroblock (MB) in H.264/AVC. The CU is the basic unit for intra/inter prediction and can be recursively subdivided into four equally sized sub-CUs. A CU is always square, with sizes ranging from the treeblock size down to 8×8. The prediction residual block covering a PU can be recursively partitioned into multiple square TUs in a scheme known as residual quadtree (RQT) coding. TUs support transform block sizes from 4×4 to 32×32 [1-4]. Both CUs and TUs can be recursively subdivided into quadrants as illustrated in Fig. 1, where solid lines indicate CU boundaries and dotted lines indicate TU boundaries.
Subdivision of a treeblock into CUs and TUs. Solid lines indicate CU boundaries and dotted lines indicate TU boundaries. (a) Treeblock with its partitioning. (b) Corresponding quadtree.
According to the HEVC standard [1], an image is first divided into a number of treeblocks, and each treeblock is then further partitioned into multiple CUs. The CU serves as the root of the TU quadtree structure shown in Fig. 1. In the TU quadtree structure, the current TU may be identical in size to the CU it belongs to, or it may be split into four smaller TUs. This process, referred to as residual quadtree (RQT) coding, applies a combination of operations (transform, quantization, inverse quantization, and inverse transform) to determine the optimal TU size in a brute-force fashion [1,5-7].
The optimal TU size, i.e., the one with the least rate-distortion (RD) cost, is determined using a Lagrangian multiplier (more details can be found in [3,8]). The RD cost for each TU size decision is defined as follows:

J_{X} = SSE_{X} + λ_{X}·B_{X},    (1)

where X stands for the TU size to be checked, B_{X} represents the bit cost considered for the TU size decision, SSE_{X} (sum of squared errors) represents the difference between the original residual image covered by the tested TU and its reconstruction, and λ_{X} is the Lagrangian multiplier determined by the content of the tested TU.
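The Lagrangian selection described above can be sketched in a few lines of Python. This is only an illustration of the comparison, not HM code: the candidate dictionary, its distortion/bit/multiplier values, and the function names are all hypothetical stand-ins for the encoder's actual measurements.

```python
# Illustrative sketch of the Lagrangian TU size selection.
# sse and bits stand in for the encoder's measured distortion and
# bit cost for one TU size; lam is the Lagrangian multiplier.

def rd_cost(sse, bits, lam):
    """RD cost J_X = SSE_X + lambda_X * B_X."""
    return sse + lam * bits

def best_tu_size(candidates):
    """candidates: dict mapping TU size -> (SSE, bits, lambda).
    Returns the TU size with the least RD cost."""
    return min(candidates, key=lambda x: rd_cost(*candidates[x]))

# Made-up numbers: the smaller TU halves the SSE but costs more bits.
candidates = {
    32: (1000.0, 40, 10.0),   # J = 1000 + 10*40  = 1400
    16: (500.0, 100, 10.0),   # J = 500  + 10*100 = 1500
}
print(best_tu_size(candidates))  # -> 32
```

With these toy values the larger TU wins despite its higher distortion, which is exactly the bit-rate/distortion trade-off the multiplier λ_{X} controls.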
However, this “try all and select the best” strategy adopted by the RQT coding results in high computational complexity and limits the use of HEVC encoders in real-time applications. Therefore, fast algorithms that reduce the computational complexity required by the RQT coding without sacrificing much coding efficiency are highly desirable for real-time implementations of HEVC encoders.
The remainder of this paper is organized as follows. Section 2 presents a brief literature review of fast algorithms for HEVC encoders. Section 3 describes the proposed fast TU size decision method for the RQT coding in HEVC, which incorporates an adaptive maximum transform depth determination (AMTD) algorithm and a full check skipping and early termination (FCSET) algorithm. Experiments are carried out and analyzed in Section 4 to demonstrate the effectiveness of the proposed algorithms. Section 5 concludes the paper.
2. Related Work
Recently, many fast algorithms have been proposed to reduce the computational complexity of HEVC encoders. In general, these algorithms can be categorized into three main classes: fast CU size decision, fast PU mode decision, and fast TU size decision. They are briefly summarized as follows.
Fast Coding Unit Size Decision:
Based on the observation that the current CU will not be subdivided if its coding mode is SKIP, Choi and Jang proposed a tree pruning algorithm for fast CU size decision [9]. A 40% reduction in encoding time was reported on HM 3.0. However, this fast CU size decision algorithm applies to inter prediction rather than intra prediction.
Shen et al. [10] proposed a fast CU size decision scheme based on the Bayesian decision rule. The CU size decision was made based on the Bayesian risk, which can be calculated from the Lagrangian cost, the class-conditional probability density functions, and the prior probabilities. Random access (RA) and low-delay B (LB) configurations were used for simulations, and an average 41.4% encoding time reduction was reported. However, TU size decision and mode selection were not investigated.
A gradient-based fast intra mode decision was proposed by Jiang [11], where gradient directions and histograms were derived for fast CU size decision. Based on the distribution of the gradient histogram, only a small subset of candidate modes was chosen for the rough mode decision (RMD) and rate-distortion optimization (RDO) processes. This approach reduces the encoding time by 20% on average with negligible loss of coding efficiency on HM 4.0. However, this fast CU size decision applies only to intra mode decision.
Shen et al. [12,13] proposed an adaptive CU depth range determination method and several early termination schemes to speed up the CU size decision based on the spatial correlations between neighboring CUs. With these algorithms, 42% and 41% of the encoding time was reduced under the RA and LB configurations on HM 2.0, respectively. However, techniques such as early termination of motion estimation apply to inter mode decision only and are not suitable for intra mode decision.
Fast Prediction Unit Mode Decision:
Zhao et al. [14] studied the impact of the number of PU mode candidates after the RMD process and proposed a fast PU mode decision algorithm for intra prediction that reduces the number of PU mode candidates. On average, 20% and 28% encoding time reductions were reported under the High Efficiency (HE) and Low Complexity (LC) test conditions, respectively, with almost the same coding efficiency as HM 1.0. Note that the HE and LC test conditions were used in the early phase of HEVC development and have been merged since HM 6.0.
Several fast intra prediction mode decisions have been introduced by Zhang and Ma [15,16]. In their work, three methods were proposed to speed up the PU mode decision: a modified Hadamard transform, a novel progressive rough mode search, and an early termination method for the rate-distortion optimized quantization (RDOQ) process. A 2.5× speedup with a 1.0% BD-Rate increase was reported on HM 10.0. However, the correlations between neighboring TUs have not yet been exploited.
Fast Transform Unit Size Decision:
The RQT coding in HEVC is applied to improve the coding efficiency, but it demands significant computational overhead [2,3,5-8]. Hence, Tan et al. proposed several fast encoding schemes for TU size decision-making in the RQT coding, covering both intra prediction and inter prediction [17]. For the all-intra (AI) case, the fast RQT coding algorithm was reported to save 13% of the encoding time with a 0.1% BD-Rate increase. For the RA and LB cases, up to 9% of the encoding time can be reduced at the expense of a 0.3% BD-Rate increase on HM 2.0.
Another fast RQT coding scheme was proposed by Teng et al. [18], where the original depth-first TU size decision was replaced by a Merge-and-Split decision for the RQT coding. The Merge-and-Split decision process was terminated and no further TU splitting was performed when the current TU was a zero-block. Almost a 2× speedup was reported for the RA case under the HE configuration on HM 2.0, with about a 0.3% BD-Rate increase.
Choi et al. [19] exploited the relationship between the determined TU size and the number of nonzero DCT coefficients (NNZ) to determine the TU size at an early stage. The total misprediction ratio for the given NNZ with a threshold of 3 was 1.26%, implying negligible coding efficiency loss compared to the original HEVC encoder. A 0.6% BD-Rate increase and an average 60% computational complexity reduction were reported for the RA case under the HE configuration on HM 3.0.
Furthermore, Zhang and Zhao [20] developed an adaptive RQT coding algorithm for inter prediction by restricting larger CU sizes to smaller transform depth levels and vice versa, based on observations of the RQT coding. A 0.7% BD-Rate increase with 7.2%~21% computational complexity reduction was reported under the HE and LC configurations on HM 4.0, respectively. However, this algorithm targets inter prediction only; intra prediction is not considered.
Our earlier work [21] presented a fast RQT coding algorithm based on experimental observations. When the current CU was split into four quadrants, the RQT coding of the upper-left sub-CU was performed first, followed by the individual RQT coding of the remaining three sub-CUs with their smallest TU size predetermined as that of the upper-left sub-CU. Therefore, many TU size decisions can be skipped in the RQT coding. This fast RQT coding algorithm is extended in Section 3 to skip even more TU size decisions.
The aforementioned algorithms are well developed to reduce the computational complexity of HEVC encoders. However, the TU size correlations between neighboring CUs have not been fully studied. To reduce the computational complexity required by the RQT coding, in this paper we propose an effective TU size decision method incorporating an adaptive maximum transform depth determination (AMTD) algorithm and a full check skipping and early termination (FCSET) algorithm. Considering that the optimal transform depth level, namely the TU size, for a certain treeblock is highly content-dependent, it is not necessary to perform the RQT coding at all transform depth levels for all treeblocks in a given video sequence. Therefore, the AMTD algorithm is introduced to skip, for the current treeblock, those transform depth levels rarely used by its spatially adjacent treeblocks, according to their spatial correlations. Meanwhile, there exist spatial and temporal correlations between the four sub-CUs generated by one CU quadtree partitioning. With the FCSET algorithm, unnecessary RQT coding on small CU sizes can be skipped to reduce the encoding time. Experimental results demonstrate that the proposed overall algorithm significantly reduces the computational complexity while maintaining almost the same coding efficiency as the HEVC test model reference software, HM 13.0.
3. Overview of Fast TU Size Decision Method for HEVC
3.1 Adaptive Maximum Transform Depth Determination (AMTD)
In HEVC, the transform depth level of a treeblock lies within a fixed range (i.e., 0~3); that is, the available TU sizes range from 32×32 down to 4×4. The current TU is implicitly split when its size is larger than the maximum TU size, and it will not be split when the TU splitting would result in a TU size smaller than the minimum TU size. By allowing different TU sizes, the RQT coding enables the transform to adapt to the varying space-frequency characteristics of the residual signal. Larger TU sizes, which have larger spatial support, provide better frequency resolution, while smaller TU sizes, which have smaller spatial support, provide better spatial resolution [1-5,7,8].
It is evident that the transform depth level range of the current treeblock should be adaptively determined based on its motion and texture properties, so the exhaustive TU size decision adopted by the original RQT coding is inefficient. The fast RQT coding algorithms [17-20] mentioned in Section 2 simply utilize the characteristics of the RQT coding, such as the number of nonzero transform coefficients and the obvious correlations between TU size and CU size. However, the correlations of transform depth level between the current treeblock and its spatially adjacent treeblocks are not fully utilized.
Natural video sequences have inherently strong spatial and temporal correlations, especially in homogeneous regions. The existence of a large number of homogeneous regions in natural video sequences means that the textures of adjacent treeblocks are similar. Consequently, the optimal CU depth level and transform depth level of the current treeblock are the same as or very close to those of its spatially coded adjacent treeblocks due to their high spatial correlations [22]. Specifically, the optimal transform depth level of the current treeblock TrDepth_{pred} can be predicted from the transform depth levels of its spatially coded adjacent treeblocks (L, U, and LU treeblocks in Fig. 2) as follows:

TrDepth_{pred} = Σ_{i=1}^{N} ω_{i}·δ_{i},    (2)

where N is the number of spatially coded adjacent treeblocks of the current treeblock (equal to 3 in our experiments), ω_{i} is the predetermined correlation weight for the ith neighboring treeblock, and δ_{i} is the transform depth level of the ith neighboring treeblock. The correlation weights are normalized such that Σ_{i=1}^{N} ω_{i} = 1.
Spatial relationship of treeblocks. C: current treeblock; L: left treeblock; U: upper treeblock; LU: upper-left treeblock.
The three coded treeblocks denoted as L, U, and LU in Fig. 2 are chosen for the prediction function (2) due to their high spatial correlations with the current treeblock. The top-right treeblock and the co-located treeblock are excluded in our experiments to provide a desirable balance between computational complexity and coding efficiency. In general, the closer two treeblocks are, the higher their spatial correlation. The upper and left treeblocks are obviously nearer (in terms of Euclidean distance) to the current treeblock than the upper-left treeblock; that is, the spatially adjacent treeblocks in the horizontal and vertical directions have a higher correlation with the current treeblock than the treeblock in the diagonal direction. Due to the diversity of natural videos, it is impractical to assign different correlation weights to the upper and left treeblocks for all video sequences. Therefore, a large correlation weight is assigned to both the upper and left treeblocks, and a small one to the upper-left treeblock in the prediction function (2). The experiments to choose the appropriate weights for these three coded treeblocks are presented in the following section.
According to the predicted value of the optimal transform depth level, each treeblock is classified into one of three types (G_{1}, G_{2}, G_{3}) as follows:

(1) If 0 ≤ TrDepth_{pred} < 0.5, most of the spatially coded adjacent treeblocks of the current treeblock tend to choose the small transform depth level “0”. The current treeblock is located in a slow motion region and is classified as type G_{1};

(2) If 0.5 ≤ TrDepth_{pred} < 1.5, most of the spatially coded adjacent treeblocks of the current treeblock tend to choose transform depth level “1”. The current treeblock is classified as type G_{2};

(3) If 1.5 ≤ TrDepth_{pred} ≤ 3, most of the spatially coded adjacent treeblocks of the current treeblock tend to choose the large transform depth levels “2” and “3”. The current treeblock is located in a fast motion region and is classified as type G_{3}.
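The prediction function (2) and the classification above can be sketched as follows. The weights use the set {0.4, 0.4, 0.2} discussed later for the left, upper, and upper-left neighbors; the depth values in the example are made up, and the interval boundaries follow the classification just described.

```python
# Sketch of AMTD: predict the transform depth from coded neighbors
# (prediction function (2)) and classify the treeblock as G1/G2/G3.

WEIGHTS = (0.4, 0.4, 0.2)  # omega_i for L, U, LU; they sum to 1

def predict_tr_depth(neighbor_depths, weights=WEIGHTS):
    """TrDepth_pred = sum_i omega_i * delta_i over coded neighbors."""
    return sum(w * d for w, d in zip(weights, neighbor_depths))

def classify_treeblock(pred):
    """Map the predicted depth level to a treeblock type."""
    if pred < 0.5:
        return "G1"   # slow motion region, large TUs suffice
    elif pred < 1.5:
        return "G2"   # skip checking transform depth level 3
    return "G3"       # fast motion region, full RQT search

# Left and upper neighbors used depth 1, the upper-left used depth 2:
pred = predict_tr_depth((1, 1, 2))   # 0.4 + 0.4 + 0.4 = 1.2
print(classify_treeblock(pred))      # -> G2
```

A treeblock at an image border simply has fewer coded neighbors; a real implementation would renormalize the weights over the available ones, which this sketch omits.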
For a type G_{1} treeblock, the area covered by its coded spatially adjacent treeblocks contains motionless or slow-motion content, so it is sufficient to perform the RQT coding for a type G_{1} treeblock on large TU sizes only, i.e., 16×16 and 32×32. For a type G_{3} treeblock, most of its spatially coded adjacent treeblocks are located in fast motion regions. As a result, the RQT coding for a type G_{3} treeblock remains unchanged from the original HEVC encoder to preserve high coding efficiency.
Table 1 summarizes the candidate transform depth levels to be tested in the RQT coding for each type of treeblock. As shown in Table 1, type G_{1} and G_{2} treeblocks can skip 1~2 unnecessary transform depth levels. The maximum transform depth level for a type G_{3} treeblock is 3; consequently, the original RQT coding is used.
Candidate transform depth levels for each treeblock type
Extensive experiments are conducted to verify the validity of the proposed AMTD algorithm. Six video sequences are selected: “BasketballDrive” and “Cactus” (1920×1080), “Johnny” and “KristenAndSara” (1280×720), “BQMall” and “PartyScene” (832×480). Among these, “BasketballDrive” and “Cactus” contain large global or local motion, while “Johnny”, “KristenAndSara”, “BQMall”, and “PartyScene” contain medium local motion. The test conditions are as follows: each video sequence is encoded under both the RA and LB configurations; the quantization parameter (QP) is set to 22, 27, 32, and 37, as specified in [23]; and RDOQ is enabled.
The prediction accuracy of the transform depth level determined by the AMTD algorithm using different correlation weight sets is listed in columns 2~4 of Table 2. It can be seen that more than 95% of treeblocks fall within their predetermined transform depth level range. The reason is that the dominant treeblocks with TU sizes 8×8 and 4×4 are always classified as type G_{2} or G_{3} according to prediction function (2). For a type G_{2} or G_{3} treeblock, the RQT coding is the same as the default RQT coding in HEVC, except that a type G_{2} treeblock skips checking transform depth level 3 in the RQT coding, which ensures the high accuracy of prediction function (2). It should be noted that the accuracy of prediction function (2) increases with the resolution of the video sequence, because higher-resolution sequences contain many more homogeneous regions and thus more spatial correlation to exploit for prediction. An accuracy of 97% can even be achieved using the correlation weight set {0.4, 0.4, 0.2}. Note that the difference in prediction accuracy between different correlation weight sets is small; an explanation is given in columns 5~7 of Table 2, where the prediction accuracy of the transform depth level for each type of treeblock is presented. The prediction accuracy of type G_{2} and G_{3} treeblocks is nearly 100% using the correlation weight set {0.4, 0.4, 0.2}, and the same holds for other correlation weight sets such as {0.35, 0.35, 0.3} and {0.45, 0.45, 0.1}. As a consequence, the correlation weight set {0.4, 0.4, 0.2} is chosen for the AMTD algorithm. From Table 2 we can anticipate that with the AMTD algorithm, many TU size decisions can be skipped without sacrificing much coding efficiency. This anticipation is further verified in the following experiments.
Statistical analysis of transform depth level distribution accuracy (%)
3.2 Full Check Skipping and Early Termination (FCSET) Algorithm
As previously mentioned, the original RQT coding in HEVC determines the optimal TU size for the current CU in a brute-force fashion. At each transform depth level (i.e., TU size), a full check is performed first to obtain the RD cost, RDcost_{full}, of encoding the non-splitting TU. Then RDcost_{separate} is calculated, which is the total RD cost of encoding the four smaller TUs resulting from the TU splitting. When RDcost_{full} < RDcost_{separate}, the size of the non-splitting TU is chosen as the optimal TU size for the current CU; otherwise, the size of the smaller TUs is chosen. Hence, the original RQT coding introduces high computational complexity [5].
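The brute-force recursion just described can be sketched as follows. The cost callback and all numbers are hypothetical placeholders for the encoder's transform/quantization/reconstruction pipeline; the sketch only illustrates the RDcost_{full} versus RDcost_{separate} comparison at each depth.

```python
# Sketch of the original brute-force RQT decision: at each depth,
# compare the non-splitting cost against the total cost of four
# sub-TUs. rd_full(size, depth) stands in for the full check
# (transform, quantization, reconstruction) yielding RDcost_full.

def rqt_decide(size, depth, max_depth, rd_full):
    """Return (best RD cost, chosen TU size) for a block of `size`."""
    cost_full = rd_full(size, depth)
    if size == 4 or depth == max_depth:        # cannot split further
        return cost_full, size
    # RDcost_separate: sum of the best costs of the four sub-TUs
    cost_separate = 0.0
    sub_size = None
    for _ in range(4):
        c, s = rqt_decide(size // 2, depth + 1, max_depth, rd_full)
        cost_separate += c
        sub_size = s
    if cost_full < cost_separate:
        return cost_full, size                 # keep the larger TU
    return cost_separate, sub_size             # split wins

# Toy cost model: every TU costs 100 regardless of size, so keeping
# the larger TU (100) always beats splitting it (4 x 100).
cost, tu = rqt_decide(32, 0, 3, lambda size, depth: 100.0)
print(tu)  # -> 32
```

Even in this toy form, the recursion visits every node of the quadtree, which is exactly the exhaustiveness the proposed method attacks.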
In our previous work [21], an early determination algorithm for the TU size decision was proposed based on experimental observations. When the current CU was split into four quadrants, the RQT coding of the upper-left sub-CU was performed first, followed by the individual RQT coding of the remaining three sub-CUs with their smallest TU size predetermined as that of the upper-left sub-CU. Once the TU size of the current sub-CU reaches the predetermined smallest TU size, the RQT coding is terminated early and no more RD cost evaluations are conducted.
Due to the strong correlations between the four sub-CUs generated by one CU quadtree partitioning, the optimal transform depth level of the current sub-CU is the same as or very close to that of its spatially adjacent sub-CUs. By taking full advantage of these transform depth level correlations, we propose a full check skipping and early termination (FCSET) algorithm, which combines our previous early termination algorithm with a full check skipping algorithm.
The flowchart of the FCSET algorithm is depicted in Fig. 3. First, for each treeblock (i.e., CU depth level 0), CU quadtree partitioning is performed at each CU depth level (i.e., 0~3), as in the original HEVC encoder. Second, separate operations are performed for the four sub-CUs generated from one CU quadtree partitioning. For the first sub-CU (i.e., the upper-left sub-CU), its minimum and maximum transform depth levels are derived and stored after its RQT coding is completed. For the remaining three sub-CUs (the upper-right, bottom-left, and bottom-right sub-CUs), their minimum and maximum transform depth levels are set to those of the first sub-CU prior to performing their respective RQT coding. Finally, intra/inter prediction and RQT coding are carried out for the remaining three sub-CUs. When the transform depth level of the current sub-CU is smaller than the specified minimum transform depth level, the full check process is automatically skipped and the RQT coding proceeds to the next transform depth level. On the other hand, once the transform depth level of the current sub-CU reaches the specified maximum transform depth level, the RQT coding is terminated early and no more RD cost evaluations are conducted.
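The per-depth decision for the last three sub-CUs can be sketched as a small dispatch function. The function and action names are illustrative, not HM identifiers; the bounds [TDMin, TDMax] are those inherited from the first (upper-left) sub-CU.

```python
# Sketch of the FCSET decision at one transform depth level for one
# of the last three sub-CUs, given the depth bounds [td_min, td_max]
# inherited from the upper-left sub-CU of the same partitioning.

def fcset_action(depth, td_min, td_max):
    """What the RQT loop does at this transform depth level."""
    if depth < td_min:
        return "skip_full_check"   # full check skipped, go deeper
    if depth < td_max:
        return "full_check"        # evaluate RD cost, may split further
    return "stop"                  # max depth reached: no further split

# First sub-CU used transform depths 1..2:
print([fcset_action(d, 1, 2) for d in range(4)])
# -> ['skip_full_check', 'full_check', 'stop', 'stop']
```

Only depth 1 pays for a full check in this example; the checks at depths 0 and 3 that the original RQT coding would perform are avoided entirely.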
Flowchart of the FCSET algorithm. D: CU depth level; TDMin: minimum transform depth level; TDMax: maximum transform depth level.
The FCSET algorithm is efficient for computational complexity reduction because a large number of CU quadtree partitionings occur in the HEVC encoding process. The more CUs are quadtree partitioned, the more often the proposed FCSET algorithm is invoked, and thus the more encoding time is saved. The rationale of the FCSET algorithm is verified using the same video sequences and test conditions described in Section 3.1. The experimental results in Table 3 show that more than 75% of the sub-CUs in each CU quadtree partitioning have the same transform depth level range. It is therefore reasonable to skip the full check process and terminate the RQT coding early for the last three sub-CUs in each CU quadtree partitioning. To keep consistent with the original HEVC encoder, the probability that treeblocks, i.e., CUs of size 64×64, have the same transform depth level range is also shown in Table 3.
The probability that sub-CUs in one CU quadtree partitioning have the same transform depth level range (%)
3.3 Overall Algorithm
Finally, the adaptive maximum transform depth determination (AMTD) algorithm and the full check skipping and early termination (FCSET) algorithm are combined into a fast TU size decision method for the RQT coding in HEVC. The pseudocode of the proposed overall algorithm is described in Table 4, where GMaxTD represents the global maximum transform depth level for the current treeblock. As can be seen from Table 4, the AMTD algorithm is applied at the global level, i.e., the treeblock level, while the FCSET algorithm is applied at the local level, i.e., the CU level. Hence, the proposed overall algorithm achieves a large encoding time reduction at both the treeblock and CU levels.
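How the two restrictions compose can be sketched as follows. The function name is illustrative, and the treeblock-level cap for each type (depth 1 for G_{1}, depth 2 for G_{2}, depth 3 for G_{3}) follows the candidate depth levels summarized in Table 1.

```python
# Sketch of the overall restriction: AMTD caps the depth range at the
# treeblock level (GMaxTD), and FCSET further narrows it per sub-CU
# using the bounds inherited from the first sub-CU.

def allowed_depths(g_max_td, td_min=None, td_max=None):
    """Transform depth levels actually tested for one sub-CU.
    g_max_td: treeblock-level cap from AMTD (Table 1).
    td_min/td_max: bounds from the first sub-CU (FCSET), if known."""
    lo = 0 if td_min is None else td_min
    hi = g_max_td if td_max is None else min(td_max, g_max_td)
    return list(range(lo, hi + 1))

# Type G2 treeblock (GMaxTD = 2) whose first sub-CU used depths 1..2:
print(allowed_depths(2, td_min=1, td_max=2))  # -> [1, 2]
# First sub-CU of a type G1 treeblock: only the AMTD cap applies.
print(allowed_depths(1))                      # -> [0, 1]
```

The first sub-CU of each partitioning sees only the AMTD cap; the other three see the intersection of both restrictions, which is why the combined algorithm saves time at both levels.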
Pseudocode of the overall algorithm for fast TU size decision
4. Experimental Results and Analysis
To evaluate the performance of the proposed TU size decision method for the RQT coding in HEVC, the AMTD algorithm, the FCSET algorithm, and the overall algorithm are individually implemented on the HEVC test model reference software, HM 13.0 [24]. The experimental results are presented below. Various video sequences are tested under the common test conditions defined in [23]: two coding configurations are tested for all video sequences, i.e., RA and LB, which correspond to a broadcast scenario with a maximum GOP (group of pictures) size of 8 and to a low-delay scenario with no picture reordering, respectively. The treeblock consists of 64×64 luma samples together with two corresponding blocks of chroma samples. The CU size can vary from 64×64 down to 8×8, and the TU supports transform block sizes from 4×4 to 32×32. Five classes of video sequences with different resolutions are tested: class A (4K×2K), class B (1080p), class C (WVGA), class D (WQVGA), and class E (720p). To show how different bit rates affect the TU size decision, each sequence is encoded with four QPs: 22, 27, 32, and 37. The coding efficiency is evaluated with peak signal-to-noise ratio (PSNR) and bit rate, and the computational complexity is measured by the consumed encoding time. BD-PSNR (dB) and BD-BR (%) [25] are used to represent the average PSNR and bit rate differences, where positive and negative values represent increments and decrements, respectively. The time saving is calculated as

ΔT = (T_{anchor} − T_{proposed}) / T_{anchor} × 100%,    (3)

where T_{proposed} denotes the overall encoding time of the proposed fast algorithm, and T_{anchor} denotes the total encoding time of the anchor HM 13.0.
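The time-saving metric is a simple relative difference, sketched below with made-up timings (the function name is illustrative):

```python
# Time saving: (T_anchor - T_proposed) / T_anchor * 100, i.e. the
# percentage encoding-time reduction relative to the anchor encoder.

def time_saving(t_anchor, t_proposed):
    """Percentage of encoding time saved versus the anchor."""
    return (t_anchor - t_proposed) / t_anchor * 100.0

# Hypothetical timings in seconds:
print(round(time_saving(100.0, 78.3), 1))  # -> 21.7
```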
Table 5 and Table 6 show the individual evaluation results of the AMTD algorithm and the FCSET algorithm under the RA and LB configurations, respectively. As shown in Table 5 and Table 6, the AMTD algorithm achieves 5.6% and 4.9% encoding time reductions for all tested sequences under the RA and LB configurations, respectively. The coding efficiency loss introduced by the AMTD algorithm is negligible, i.e., a 0.003~0.004 dB PSNR drop and a 0.08%~0.12% bit rate increase. Therefore, the AMTD algorithm can skip some unnecessary transform depth levels at the treeblock level to reduce the computational complexity required by the RQT coding.
Individual evaluation results of AMTD, FCSET and [21] under RA configuration
Individual evaluation results of AMTD, FCSET and [21] under LB configuration
As far as the FCSET algorithm is concerned, 17.2% and 14.0% of the encoding time is reduced under the RA and LB configurations, respectively, with a maximum of 18.9% for “BlowingBubbles” (416×240, RA) and a minimum of 11.6% for “Johnny” (1280×720, LB). In addition, the average PSNR drops are 0.031 dB and 0.027 dB, and the average bit rate increases are 0.87% and 0.78%, for all tested sequences under the RA and LB configurations, respectively. Such small PSNR drops and bit rate increases indicate that the coding efficiency loss of the FCSET algorithm is negligible. The FCSET algorithm clearly reduces more encoding time than the AMTD algorithm. The reason is that the FCSET algorithm operates on CUs, while the AMTD algorithm operates on treeblocks; as can be seen in Fig. 1, the number of CUs is much larger than the number of treeblocks. Therefore, the FCSET algorithm is invoked more frequently and saves more encoding time.
The method in [21] is implemented on HM 13.0 for fair comparison, and the experimental results are also presented in Table 5 and Table 6. It can be seen that with the method of [21], 11.8%~14.7% of the encoding time can be reduced with a 0.74%~0.79% bit rate increase and a 0.023~0.025 dB PSNR drop. Therefore, the FCSET algorithm outperforms the method of [21] by achieving 2.5% more encoding time reduction without significant coding efficiency loss.
The experimental results comparing the proposed overall algorithm presented in Section 3.3 with a fast RQT algorithm for HEVC encoders (the adaptive RQT mode selection algorithm, RQTAMS [20]) under the RA and LB configurations are shown in Table 7 and Table 8, respectively, where both algorithms are implemented on HM 13.0. From Table 7 and Table 8 we can see that the proposed overall algorithm reduces on average 21.7% and 20.8% of the encoding time under the RA and LB configurations, respectively, and outperforms the RQTAMS algorithm in terms of computational complexity with almost the same coding efficiency. The reason is that, unlike the RQTAMS algorithm, which speeds up the RQT coding at the CU level only, the proposed overall algorithm is applied at both the treeblock level and the CU level, saving a larger amount of encoding time. A consistent gain in encoding time is also observed for all tested sequences, with a lowest gain of 19.4% for “BasketballDrill” (832×480, LB) and a highest gain of 23.4% for “BlowingBubbles” (416×240, RA). Besides, the coding efficiency loss introduced by the proposed overall algorithm is negligible: the average PSNR loss is 0.041~0.042 dB and the average bit rate increase is 1.18%~1.24% for all tested sequences.
Results of the proposed overall algorithm compared with the RQTAMS algorithm [20] under RA configuration
Results of the proposed overall algorithm compared with the RQTAMS algorithm [20] under LB configuration
In Fig. 4(a), the rate-distortion curves of the AMTD algorithm, the FCSET algorithm, the proposed overall algorithm, and the original RQT coding are depicted for “PeopleOnStreet” (2560×1600), in which the PSNR is plotted as a function of the average bit rate. All algorithms are implemented on HM 13.0. As shown in Fig. 4(a), all the rate-distortion curves cluster together, which means that the proposed algorithms, i.e., the AMTD algorithm, the FCSET algorithm, and the proposed overall algorithm, are similar to the original RQT coding in the HM 13.0 encoder in terms of coding efficiency. The encoding time savings of the proposed algorithms relative to the original RQT coding in the HM 13.0 encoder under different QPs are plotted in Fig. 4(b). It can be concluded from Fig. 4(b) that the proposed algorithms achieve a consistent encoding time saving over a large bit rate range with almost negligible loss in coding efficiency compared with the original RQT coding in the HM 13.0 encoder.
Experimental results of “PeopleOnStreet” (2560×1600) under different QPs (22, 27, 32, 37). (a) RD curves of “PeopleOnStreet”. (b) Time saving curves of “PeopleOnStreet”.
In addition, subjective quality comparisons are conducted to further validate the effectiveness of the proposed overall algorithm. Each test sequence is encoded at QP = 22, 27, 32, and 37 using both HM 13.0 and the proposed overall algorithm. Fig. 5 and Fig. 6 show the subjective quality comparisons of the 126^{th} frame of “PeopleOnStreet” (2560×1600) and the 204^{th} frame of “PartyScene” (832×480), respectively. Specifically, “PeopleOnStreet” is encoded under the RA configuration, while “PartyScene” is encoded under the LB configuration. The averaged PSNR is calculated as the weighted sum of the per-picture PSNR of the individual components (PSNR-Y, PSNR-U, and PSNR-V) [3].
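A minimal sketch of this weighted combination, assuming the 6:1:1 luma/chroma weighting commonly used in HEVC coding-efficiency comparisons (the exact weights should be checked against [3]):

```python
def combined_psnr(psnr_y, psnr_u, psnr_v):
    """Weighted YUV PSNR: the luma component is weighted six times
    as heavily as each chroma component, then normalized by 8."""
    return (6.0 * psnr_y + psnr_u + psnr_v) / 8.0

# Example with hypothetical per-component PSNR values in dB.
print(combined_psnr(38.0, 42.0, 42.0))  # -> 39.0
```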
Subjective quality comparisons of the 126^{th} frame of “PeopleOnStreet” (2560×1600) under different QPs in RA case. The encoded pictures using HM 13.0 and the proposed overall algorithm are presented in the left and right columns, respectively.
Subjective quality comparisons of the 204^{th} frame of “PartyScene” (832×480) under different QPs in LB case. The encoded pictures using HM 13.0 and the proposed overall algorithm are presented in the left and right columns, respectively.
As shown in Fig. 5 and Fig. 6, compared to HM 13.0, the subjective quality of the sequences (e.g., “PeopleOnStreet” and “PartyScene”) encoded with the proposed overall algorithm degrades only slightly under different QPs (the largest PSNR loss is 0.2 dB). From Fig. 5 and Fig. 6 we can also conclude that the proposed overall algorithm is effective, in terms of subjective quality, for different video resolutions and coding configurations.
5. Conclusion
In this paper, we propose an effective TU size decision method for the RQT coding in HEVC, which incorporates an adaptive maximum transform depth determination (AMTD) algorithm and a full check skipping early termination (FCSET) algorithm, to reduce the computational complexity required by the RQT coding. The recent HEVC test model reference software, HM 13.0, is used to evaluate the proposed overall algorithm. The experimental results show that the proposed overall algorithm significantly reduces the computational complexity required by the RQT coding while maintaining almost the same RD performance as the HM 13.0 encoder. Meanwhile, the proposed overall algorithm outperforms a state-of-the-art fast TU size decision method, the RQT-AMS algorithm. These results indicate that the proposed overall algorithm should be considered in the design of a fast HEVC encoder.
For future work, we plan to apply the proposed algorithm to a practical video encoder. A practical video encoder needs low computational complexity, acceptable visual quality, and good channel assignment and rate allocation strategies. Note that there have been some recent developments in mobile network video coding [26, 27] that can be useful for designing a practical video coder.
BIO
Jinfu Wu received his B.S. degree in control technology and instrument from Xidian University, Xi’an, P.R. China in 2011. He is currently a Ph.D. student with the School of Aerospace Science and Technology, Xidian University. His research interests include video coding and image processing.
Baolong Guo received his B.S., M.S. and Ph.D. degrees from Xidian University in 1984, 1988 and 1995, respectively, all in communication and electronic system. From 1998 to 1999, he was a visiting scientist at Doshisha University, Japan. He is currently the director of the Institute of Intelligent Control & Image Engineering (ICIE) at Xidian University. His research interests include neural networks, pattern recognition, intelligent information processing, and image communication.
Yunyi Yan received his B.S. degree in automation, M.S. degree in pattern recognition and intelligent systems, and Ph.D. degree in electronic science and technology from Xidian University, China, in 2001, 2004 and 2008, respectively. Since 2004, he has been with the ICIE Institute, School of Aerospace Science and Technology, Xidian University, where he is currently an associate professor. Since 2013, he has served as the director of the Intelligent Detection Department at Xidian University. He worked as a visiting scholar at the Montreal Neurological Institute of McGill University, Canada, from 2010 to 2011. His research interests include system reliability analysis, medical image processing, and swarm intelligence.
Jie Hou obtained his B.S. degree in automation from Xidian University, P.R. China in 2011. He is currently pursuing a Ph.D. degree in measuring technology and instruments at Xidian University. His research mainly focuses on video processing and object recognition.
Dan Zhao received his B.S. degree in control technology and instrument from Xidian University, Xi’an, P.R. China in 2012. He is currently a Ph.D. student with the School of Aerospace Science and Technology, Xidian University. His research interests include video processing and object tracking based on online learning.
References
[1] G. J. Sullivan, J. Ohm, W.-J. Han, and T. Wiegand, “Overview of the high efficiency video coding (HEVC) standard,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 12, pp. 1649–1668, 2012. DOI: 10.1109/TCSVT.2012.2221191
[2] J. Ohm and G. J. Sullivan, “High efficiency video coding: The next frontier in video compression [standards in a nutshell],” IEEE Signal Processing Magazine, vol. 30, no. 1, pp. 152–158, 2013. DOI: 10.1109/MSP.2012.2219672
[3] J. Ohm, G. J. Sullivan, H. Schwarz, T. K. Tan, and T. Wiegand, “Comparison of the coding efficiency of video coding standards—including high efficiency video coding (HEVC),” IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 12, pp. 1669–1684, 2012. DOI: 10.1109/TCSVT.2012.2221192
[4] W.-J. Han, J. Min, I.-K. Kim, E. Alshina, A. Alshin, and T. Lee, “Improved video compression efficiency through flexible unit representation and corresponding extension of coding tools,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 20, no. 12, pp. 1709–1720, 2010. DOI: 10.1109/TCSVT.2010.2092612
[5] T. Nguyen, P. Helle, M. Winken, B. Bross, D. Marpe, and H. Schwarz, “Transform coding techniques in HEVC,” IEEE Journal of Selected Topics in Signal Processing, vol. 7, no. 6, pp. 978–989, 2013. DOI: 10.1109/JSTSP.2013.2278071
[6] F. Bossen, B. Bross, K. Suhring, and D. Flynn, “HEVC complexity and implementation analysis,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 12, pp. 1685–1696, 2012. DOI: 10.1109/TCSVT.2012.2221255
[7] M. Budagavi, A. Fuldseth, G. Bjontegaard, V. Sze, and M. Sadafale, “Core transform design for the High Efficiency Video Coding (HEVC) standard,” IEEE Journal of Selected Topics in Signal Processing, vol. 7, no. 6, pp. 1029–1041, 2013. DOI: 10.1109/JSTSP.2013.2270429
[8] “High Efficiency Video Coding (HEVC) Test Model 13 (HM13) Encoder Description,” Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, Document JCTVC-O1002, 15th Meeting, 2013.
[9] K. Choi and E. S. Jang, “Fast coding unit decision method based on coding tree pruning for high efficiency video coding,” Optical Engineering, vol. 51, pp. 030502-1–030502-3, 2012.
[10] X. Shen, L. Yu, and J. Chen, “Fast coding unit size selection for HEVC based on Bayesian decision rule,” in Proc. Picture Coding Symposium (PCS), 2012, pp. 453–456.
[11] W. Jiang, H. Ma, and Y. Chen, “Gradient based fast mode decision algorithm for intra prediction in HEVC,” in Proc. 2nd International Conference on Consumer Electronics, Communications and Networks (CECNet), 2012, pp. 1836–1840.
[12] L. Shen, Z. Liu, X. Zhang, W. Zhao, and Z. Zhang, “An effective CU size decision method for HEVC encoders,” IEEE Transactions on Multimedia, vol. 15, no. 2, pp. 465–470, 2013. DOI: 10.1109/TMM.2012.2231060
[13] L. Shen, Z. Zhang, and P. An, “Fast CU size decision and mode decision algorithm for HEVC intra coding,” IEEE Transactions on Consumer Electronics, vol. 59, no. 1, pp. 207–213, 2013. DOI: 10.1109/TCE.2013.6490261
[14] L. Zhao, L. Zhang, S. Ma, and D. Zhao, “Fast mode decision algorithm for intra prediction in HEVC,” in Proc. IEEE Visual Communications and Image Processing (VCIP), 2011, pp. 1–4.
[15] H. Zhang and Z. Ma, “Fast intra prediction for high efficiency video coding,” in Advances in Multimedia Information Processing - PCM 2012, Springer, 2012, pp. 568–577.
[16] H. Zhang and Z. Ma, “Fast intra mode decision for High Efficiency Video Coding (HEVC),” IEEE Transactions on Circuits and Systems for Video Technology, vol. 24, no. 4, pp. 660–668, 2014. DOI: 10.1109/TCSVT.2013.2290578
[17] Y. H. Tan, C. Yeo, H. L. Tan, and Z. Li, “On residual quad-tree coding in HEVC,” in Proc. IEEE 13th International Workshop on Multimedia Signal Processing (MMSP), 2011, pp. 1–4.
[18] S.-W. Teng, H.-M. Hang, and Y.-F. Chen, “Fast mode decision algorithm for residual quadtree coding in HEVC,” in Proc. IEEE Visual Communications and Image Processing (VCIP), 2011, pp. 1–4.
[19] K. Choi and E. S. Jang, “Early TU decision method for fast video encoding in high efficiency video coding,” Electronics Letters, vol. 48, pp. 689–691, 2012. DOI: 10.1049/el.2012.0277
[20] Y. Zhang and M. Zhao, “An adaptive RQT mode selection algorithm for HEVC,” in Proc. 5th International Congress on Image and Signal Processing (CISP), 2012, pp. 173–177.
[21] J. F. Wu, B. L. Guo, and J. Jiang, “Fast TU size decision algorithm for HEVC,” ICIC Express Letters, vol. 8, pp. 2327–2333, 2014.
[22] B. Lee and M. Kim, “Modeling rates and distortions based on a mixture of Laplacian distributions for inter-predicted residues in quadtree coding of HEVC,” IEEE Signal Processing Letters, vol. 18, no. 10, pp. 571–574, 2011. DOI: 10.1109/LSP.2011.2163935
[23] F. Bossen, “Common test conditions and software reference configurations,” Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, Document JCTVC-L1100, Geneva, CH, 2013.
[24] JCT-VC, Subversion repository for the HEVC test model reference software, 2014. https://hevc.hhi.fraunhofer.de/svn/svn_HEVCSoftware/
[25] G. Bjontegaard, “Calculation of average PSNR difference between RD-curves,” VCEG-M33, Austin, TX, USA, 2001.
[26] L. Zhou, X. B. Wang, W. Tu, G.-M. Muntean, and B. Geller, “Distributed scheduling scheme for video streaming over multi-channel multi-radio multi-hop wireless networks,” IEEE Journal on Selected Areas in Communications, vol. 28, no. 3, pp. 409–419, 2010. DOI: 10.1109/JSAC.2010.100412
[27] L. Zhou, Q. Y. Hu, Y. Qian, and H. H. Chen, “Energy-spectrum efficiency tradeoff for video streaming over mobile ad hoc networks,” IEEE Journal on Selected Areas in Communications, vol. 31, no. 5, pp. 981–991, 2013. DOI: 10.1109/JSAC.2013.130516