The High Efficiency Video Coding (HEVC) is a new video coding standard that can provide much better compression efficiency than its predecessor H.264/AVC. However, it is computationally more intensive due to the use of flexible quadtree coding unit structure and more choices of prediction modes. In this paper, a fast intraframe coding scheme is proposed for HEVC. Firstly, a fast bottomup pruning algorithm is designed to skip the mode decision process or reduce the candidate modes at larger block size coding unit. Then, a low complexity rough mode decision process is adopted to choose a small candidate set, followed by early DC and Planar mode decision and mode filtering to further reduce the number of candidate modes. The proposed method is evaluated by the HEVC reference software HM8.2. Averaging over 5 classes of HEVC test sequences, 41.39% encoding time saving is achieved with only 0.77% bitrate increase.
1. Introduction
T
he High Efficiency Video Coding (HEVC) is a new video coding standard developed by the Joint Collaborative Team on Video Coding (JCTVC) formed by the ITUT Video Coding Experts Group (VCEG) and ISO/IEC Moving Picture Experts Group (MPEG). In comparison with its predecessor H.264/AVC, 50% bitrate reduction is achieved for equal perceptual quality
[1]
. While the same blockbased hybrid coding structure as in H.264/AVC is adopted in HEVC, each coding tool is improved with new technical features and characteristics. As opposed to the fixed size macroblock (MB) in H.264/AVC, the coding tree unit (CTU) in HEVC is configurable and its size can be chosen from 64x64, 32x32 and 16x16. A CTU consists of one coding unit (CU) or can be recursively split into a quadtree of CUs. The largest size of a CU is equal to that of a CTU, and the smallest is 8x8. An example of CU splitting and its corresponding quadtree is shown in
Fig. 1
. The dotted lines with arrows represent the processing order of the CUs in the CTU, the notation “1” in the quadtree indicates further splitting and “0” indicates leaf node. A CU can be further split into multiple prediction units (PU) for inter or intra prediction. In intra coding, the PU partition type can be either PART_2Nx2N or PART_NxN, as shown in
Fig. 2
. However, PART_NxN is only allowed for the minimum CU size, otherwise it is similar as four equalsize subCUs.
Example of a 64x64 CTU and its CU partition and processing order. (a) CU partitions (b) quadtree structure
PU types for intra prediction
During the transform and quantization process, a CU can also be recursively split into a quadtree of transform unit (TU). The intra prediction is applied for each TU sequentially instead of applying the intra prediction at the PU level
[2]
. In this way, the nearest neighboring reference samples from the already reconstructed TU are used for prediction. The HEVC defines 35 intra prediction modes, namley
Intra_DC, Intra_Planar
and 33 angular prediction modes denoded as
Intra_Angular[k]
, k = 2,…,34. In comparison with H.264/AVC, the intra prediction modes defined in HEVC are able to capture more directional structures.
In the HEVC test mode (HM), a ratedistortion (RD) optimized bottomup pruning algorithm
[3]
is adopted to find the best CU partition of a CTU. Given a CU, the best PU partition type, prediction mode and TU partition are found by minimizing the Lagrangian cost. While the flexible block structure and more choices of prediction mode provide better coding performance, it dramatically increases the encoder complexity. Even for intraframe coding, it is still far away from realtime application
[4]
. The computational complexity mainly comes from two folds. First, an optimal CU partition within a CTU can consist of various sizes of CU ranging from 8x8 to 64x64. Second, finding an optimal prediction mode requires a RD optimization process. Therefore, fast algorithms in both aspects are desirable. Research works regarding fast CU size decision include
[5

8]
; works targeting fast intra mode decision include
[9

15]
. In our previous work
[16]
, a novel fast bottomup pruning (FBUP) algorithm was proposed and proved to be efficient. In this paper, three techniques are proposed for fast intra mode decision and combined with the FBUP algorithm. Starting from the smallest CU size, a full quadtree is processed in a bottomup manner. First, we check whether mode decision process is needed for a current CU node. If the mode decision process is performed, then a set of intra candidate modes are derived from the coding information obtained from its subblocks (subCUs or PART_NXN PUs), followed by a rough decision process to choose a smaller set of candidates. The number of candidates in the set is adaptively chosen based on the low complexity RD cost. In this way, we can either skip the mode decision process or reduce the number of candidate modes. Thus, the computational complexity is reduced.
In Section 2, the previous works in the literature are reviewed. Then the proposed method is described in Section 3. In Section 4, the experimental results are presented to prove the efficiency of the proposed algorithm. Compared to the original HM encoder, averaging over all the test sequences and rate points, 41.39% encoding time saving is achieved while the bitrate increase is only 0.77%. Finally, the conclusion is given in Section 5.
2. Related Works
2.1 RateDistortion Optimized Bottomup Pruning
In HEVC, the CTU size is 64x64 and the maximum number of depth levels of the quadtree is
D
=4. The HM encoder adopts the ratedistortion optimized bottomup pruning algorithm to obtain the optimal CU quadtree partition. Consider the full quadtree as shown in
Fig. 3
, we locate a node by its depth level from top to bottom and its position from left to right. Let (
i
,
j
) be the index of
j
th node at depth level
i
, then
j
<4
^{i}
. Denote X
_{i,j}
, as a CU at node (
i
,
j
) without further splitting and C
_{i,j}
as the optimal tree after ratedistortion optimized pruning. Let
J
(·) be the operation to calculate the best RD cost. The bottomup pruning algorithm traverses the full quadtree in a depthfirst order. The subCUs of node (
i
,
j
) is pruned if and only if
.
Full quadtree
 2.2 Fast Intra Mode Decision Algorithm in HM
Full RD optimized search in the 35 intra prediction modes is computational complex. To reduce the encoder complexity, the HM encoder adopts a fast intra mode decision algorithm, which consists of two steps described as follows.

1. Rough mode decision: choose theNbest candidate modes by minimizingJSATD=DSATD+λRmode.

2. Given the N candidate modes in step 1 and themost probable modes(MPM) derived from neighboring PUs, do RD optimized mode decision. The best intra mode is found by minimizingJfullRD=Drec+λRtotal.
D_{SATD}
is the Sum of Absolute Transformed Difference (SATD) which calculates the sum of abolute value of hadamard transformed coefficients of the residual signal,
R
_{mode}
is the number of bits to encode mode information,
D_{rec}
is the distortion of the reconstructed block,
R_{total}
is the total number of bits to encode current block and
λ
is the Lagrangian multiplier. Calculation of
J_{SATD}
doesn't require the DCT transform, quantization and entropy coding process, thus it’s much simpler than calculation of the full RD cost
J_{fullRD}
. Therefore, the computational complexity can be reduced if
N
is set to a smaller value. In the HM encoder
N
is set to 3, 3, 3, 8 and 8 for the PU sizes of 64×64 , 32×32 , 16×16 , 8×8 and 4×4 respectively.
3. Proposed Methods
 3.1 Fast bottomup Pruning Algorithm
 3.1.1 Analysis of Bottomup Pruning Algorithm
The bottomup pruning algorithm performs mode decision at each node of the full quadtree. Therefore, a total of
mode decisions for each CTU. However, given a CU at node (
i
,
j
), the best partition and prediction modes of its subCUs
C
_{i+1,4j+k}
,
k
=0,1,2,3 are known. Thus, we can reuse the coding information in subCUs to check whether mode decision at current node is necessary.
Let
Q
(·)=0 indicates a leaf node and
Q
(·)=1 denotes a tree. Then
Q
(
C
_{i+1,4j+k}
)=1 suggests that the strutural information in the
k
th subCU is complex. Hence, the current node is likely to be split. In
Table 1
, it shows the probability
p
_{0}
of a node is split when
, i.e.
. The results are obtained by coding the first second (50 frames) of HEVC test sequence
BasketballDrill
. Similar statistics are found in coding the other sequences. Note that at level 3, it reaches the leaf node of the quadtree, thus the CU doesn’t have any subCUs. However, HEVC allows PART_NxN type PU at the bottom level, in which the CU can also be considered as a tree.
The probability
 3.1.2 Fast bottomup Pruning Algorithm
Based on the above observations, we can conclude that there is a high probability that a CU remains split if its subCUs are subtrees. In other words, lots of mode decision process in a large CU is unnecessary. Therefore, they could be avoided to decrease the computational complexity. In this paper,
is set as the necessary condition to skip the intra mode decision process at node (
i
,
j
). An example is shown in
Fig. 4
. If
C
_{1,1}
is a tree, then the subCUs of node
C
_{0,0}
are not pruned. Implementation details of the proposed fast bottomup pruning algorithmis is described by the pseudocode in
Algorithm 1
.
Example of a pruned quadtree.
Fast bottomup pruning algorithm
Fast bottomup pruning algorithm
 3.1.2 Analysis of the Performance of Fast Bottomup Pruning Algorithm
To analyse the performance of the proposed FBUP algorithm, we can evaluate how often a node (
i
,
j
) is skipped by FBUP while its RD cost
J
(
X
_{i,j}
) is actually less than
J
(
C
_{i,j}
), denoted as
, and use
as a messure of loss in RD performance. An example is shown in
Table 2
, in which the results are obtained by encoding the HEVC test sequence
BasketballDrill
. We can see that generally the values of
p
_{1}
and ∇
J
are small. Similar results are observed when coding other HEVC test sequences. It indicates that the probability of the proposed FBUP algorithm fails to predict the result of RD optimized mode decision is quite small, and the loss in RD performance is also trivial.
Analysis of the performance of proposed fast bottomup pruning algorithm
Analysis of the performance of proposed fast bottomup pruning algorithm
 3.2 Proposed Fast Intra Mode Decision Algorithm
In this paper, three techniques are proposed for fast intra mode decision, namely,
bottomup prediction
,
adaptive intra mode filtering
, and
DC and Planar early decision
.
 3.2.1 Bottomup Prediction
In the proposed FBUP algorithm, mode decision at a CU node is selectively skipped based on the coding information of its subCUs. Similarly, we can also predict the intra mode of current CU node if it’s not skipped. An example is shown in
Fig. 5
, in which the red arrows indicate the intra prediction directions. The intra prediction mode of the block is the same as the prediction modes of its two subblocks.
Example of bottomup prediction.
The straightforward way of bottomup prediction is to use the chosen best modes of subCUs, instead of all 35 modes, as the candidate modes for the parent CU. Though the number of candidate modes can be largely reduced by such method, however, we find that the coding efficiency is decreased significantly. Therefore, a more robust scheme is required. In this paper, the rough mode decision results of subCUs are employed to select the candidate modes for parent CU. Let Ω
_{i,j}
be the set of candidate modes of a CU at node (
i
,
j
) and Փ
_{i+1,4j+k}
be the set of
N
modes chosen in the rough mode decision process of its
k
th subCU, then
. By using the
N
modes that are chosen by rough mode decision instead of only one mode, prediction from subCUs becomes more robust. On the other hand,
N
is usually a small number, so it can achieve a good tradeoff between coding efficiency and encoder complexity.
 3.2.2 Adaptive Intra Mode Filtering
In the fast intra mode decision algorithm in HM, the number
N
is set to a fixed value. In this paper, an adaptive method is proposed based on the
J_{SATD}
costs in the rough mode decision process.

1. For each mode in candidate set Ω, calculate its costJSATD, sort the candidate set in the ascending order ofJSATD.

2. LetCandi,i=0,1,…,7 be the first 8 modes after sorting, calculate the mean value of their cost, denoted as.

3. The new candidate set is obtained as Ω'={CandiJSATD(Candi)<μ&i
 3.2.3 DC and Planar Early Decision
The intra mode in HEVC can be categorized into two classes: one is DC and Planar modes, the other is directional intra prediction mode. The DC and Planar modes are designed for smooth area, and the directional modes are designed for coding blocks that have directional edges. It’s assumed that the rough mode decision process can effectively distinguish these two classes. So, in this paper, if the best mode in rough mode decision process is Planar or DC mode, then the directional modes are removed from the candidate set.
 3.2.4 The Overall Algorithm
The three techniques described in the previous subsections can be combined. Similar to the original intra mode decision algorithm in HM, the most probable modes (MPM) derived from neighboring coded blocks are also included in the final candidate set for RD optimized mode decision. The overall algorithm is as follows:

1. For a CU at node (i,j), the set of candidate intra modes is

2. CalculateJSATDfor each candidate mode in Ωi,j, sort the candidate modes in the decending order ofJSATDand denoted asCandi,i=0,1,⋯,M1

3. IfCand0equals to 0 or 1, i.e. Planar or DC mode, thenm=1+(Cand1<2) and go to step 6, elsem=8

4. Calculate

5. Fori=7,⋯,0 , ifJSATD(Candi)<μthenm=m1

6. Փi,j={Candi,i=0,⋯,m1}∪{MPM}, search the best mode in Փi,jby RD optimized mode decision as described in section 2.2.
4. Experimental Results
The proposed algorithm was evaluated in the HEVC reference software HM8.2
[17]
. The main profile intraonly encoder configuration in the common test condition
[18]
is used to code the HEVC test sequences in classes A to E. The maximum CU size (CTU size) is 64x64, the maximum number of CU depth levels is 4, and maximum number of TU levels is 3. Ratedistortion optimized quantization, sample adaptive offset, transform skipping, and fast transform skipping are on. Four quantization parameters (QP) {22, 27, 32, and 37} are used to encode each sequence.
To demonstrate the encoding time saving (ETS) of the proposed method, an isolated PC with Intel Core 2 3.0GHz CPU and 2.0GB RAM was used to encode the first second of each sequence. Compared with the original fast intra mode decision algorithm in HM8.2 encoder, the ETS, in percentages, and the corresponding BDrate [19] for the proposed fast intra mode decision algorithm are shown in
Table 3
. Averaging over all test sequences, the ETS for different QP settings are around 25% and the Y BDrate is only 0.40%. It’s also observed that the ETS is consistent for different test sequences and the maximum Y BDrate is 0.70%. Thus, we can conclude that the proposed fast intra mode decision algorithm can efficiently reduce the encoder complexity without significant loss in ratedistortion performance. When the proposed fast intra mode decision algorithm is combined with the FBUP algorithm, the results are shown in
Table 4
. It shows that the combined algorithm can further reduce the encoder complexity. Averaging over all test sequences, the ETS for different QP settings are around 40% and the Y BDrate is 0.77%. The maximum average bitrate increase is only 1.05%. Thus, the overall algorithm also does not compromise the RD performance.
Encoding time saving (%) and corresponding Y BDrate (%) of the proposed fast intra mode decision algorithm.
Encoding time saving (%) and corresponding Y BDrate (%) of the proposed fast intra mode decision algorithm.
Encoding time saving (%) and corresponding Y BDrate (%) of the proposed fast intra mode decision algorithm combined with the FBUP algorithm.
Encoding time saving (%) and corresponding Y BDrate (%) of the proposed fast intra mode decision algorithm combined with the FBUP algorithm.
5. Conclusion
In this paper, a fast intra mode decision algorithm is proposed for HEVC intraframe coding. The proposed algorithm consists of three techniques to reduce the number of candidate modes for RD optimized mode decision. Compared to the original fast intra mode decision algorithm in HM encoder, the proposed algorithm can achieve about 25% encoding time saving with only 0.40% bitrate increase. Also, the proposed fast intra mode decision algorithm can be well combined with the proposed fast bottomup pruning algorithm. The encoder complexity is further reduced, about 41% encoding time savings are achieved, and the average bitrate increase is only 0.77%.
BIO
Han Huang received the B.S. degree in computer science and Ph.D. degree in signal and information processing from Beijing Jiaotong University, Beijing, China, in 2007 and 2014, respectively. He is now with MediaTek (Beijing) Inc. From 2009 to 2011, he was a visiting student in the Department of Electrical, Computer, and Systems Engineering, Rensselaer Polytechnic Institute, Troy, NY, USA. His current research interests include video compression and processing.
Yao Zhao received the B.S. degree from Fuzhou University, Fuzhou, China, in 1989, and the M.E. degree from Southeast University, Nanjing, China, in 1992, both in radio engineering, and the Ph.D. degree from the Institute of Information Science, Beijing Jiaotong University (BJTU), Beijing, China, in 1996. In 1998, he was an Associate Professor at BJTU, where he became a Professor in 2001. From 2001 to 2002, he was a Senior Research Fellow in the Information and Communication Theory Group, Faculty of Information Technology and Systems, Delft University of Technology, Delft, The Netherlands. He is currently the Director of the Institute of Information Science, BJTU. He is currently leading several national research projects such as 973 Program, 863 Program, and the National Science Foundation of China. His current research interests include image/video coding, digital watermarking and forensics, and video analysis and understanding. Dr. Zhao serves on the editorial boards of several international journals, including as an Associate Editor of the IEEE Signal Processing Letters, an Area Editor of Signal Processing: Image Communication (Elsevier), and as an Associate Editor of Circuits, System & Signal Processing (Springer). He was the recipient of National Science Foundation of China for Distinguished Young Scholars in 2010.
Chunyu Lin was born in LiaoNing Province, China. He works as a lecturer in Beijing Jiaotong University. He obtained his doctor dgree in Beijing Jiaotong University in 2011. From 2009 to 2010, he was a visiting researcher at the ICT group of Delft University of Technology, Netherlands. From 2011 to 2012, He was a postdoc in Gent Unveristy, Belgium. His research interests are in the areas of image/video compression and robust transmission, stereo matching and 3D video coding.
Huihui Bai received the Ph.D. degree in signal and information processing from Beijing Jiaotong University (BJTU), Beijing, China, in 2008. She is currently an Associate Professor at the Institute of Information Science, BJTU. She has been involved in research and development work in video coding technologies and standards such as high efficiency video coding, 3D video compression, multiple description video coding, and distributed video coding. She is leading or participating in several research projects such as 973 Program, 863 Program, the National Natural Science Foundation of China, Beijing Natural Science Foundation, and Jiangsu Provincial Natural Science Foundation.
Sullivan G. J.
,
Ohm J.R.
,
Han W.J.
,
Wiegand T.
,
Wiegand T.
2012
"Overview of the High Efficiency Video Coding (HEVC) standard"
IEEE Trans. Circuits Syst. Video Technol.
Article (CrossRef Link)
22
(12)
1649 
1668
DOI : 10.1109/TCSVT.2012.2221191
Lainema J.
,
Bossen F.
,
Han WooJin
,
Min Junghye
,
Ugur K.
2012
“Intra coding of the HEVC standard”
IEEE Trans. Circuits Syst. Video Technol.
Article (CrossRef Link)
22
(12)
1792 
1801
DOI : 10.1109/TCSVT.2012.2221525
Sullivan G.J.
,
Baker R.L.
1994
“Efficient quadtree coding of images and video”
IEEE Trans. Circuits Syst. Image Process.
Article (CrossRef Link)
3
(3)
327 
331
DOI : 10.1109/83.287030
Bossen F.
,
Bross B.
,
Suhring K.
,
Flynn D.
2012
“HEVC complexity and implementation analysis”
IEEE Trans. Circuits Syst. Video Technol.
Article (CrossRef Link)
22
(12)
1685 
1696
DOI : 10.1109/TCSVT.2012.2221255
Shen Xiaolin
,
Yu Lu
,
Chen Jie
2012
“Fast coding unit size selection for HEVC based on bayesian decision rule”
in Proc. of Picture Coding Symposium (PCS)
May 79
Article (CrossRef Link)
453 
456
Tan Hui Li
,
Liu Fengjiao
,
Tan Yih Han
,
Yeo Chuohao
2012
“On fast coding tree block and mode decision for HighEfficiency Video Coding (HEVC)”
Speech and Signal Process.
in Proc. of IEEE Conf. Acoust.
March 2530
Article (CrossRef Link)
825 
828
Shen L.
,
Liu Z.
,
Zhang X.
,
Zhao W.
,
Zhang Z.
2013
“An effective CU size decision method for HEVC encoders”
IEEE Trans. Multimedia
Article (CrossRef Link)
15
(2)
465 
470
DOI : 10.1109/TMM.2012.2231060
Choi Kiho
,
Jang Euee S.
2012
“Fast coding unit decision method based on coding tree pruning for high efficiency video coding”
Optical Engineering
Article (CrossRef Link)
51
(3)
030502–1 
030502–3
DOI : 10.1117/1.OE.51.3.030502
Piao Yinji
,
Min Junghye
,
Chen Jianle
2010
“Encoder improvement of unified intra prediction,” in document JCTVCC207
Zhao Liang
,
Zhang Li
,
Ma Siwei
,
Zhao Debin
2011
“Fast mode decision algorithm for intra prediction in HEVC”
in IEEE Proc. Visual Commun. Image Process.
November 69
Article (CrossRef Link)
1 
4
Silva T.L. da
,
Agostini L. V.
,
Silva Cruz L. A. da
2012
“Fast HEVC intra prediction mode decision based on edge direction information”
in Proc. of The 20th European Signal Process. Conf. (EUSIPCO)
August
Article (CrossRef Link)
1214 
1218
Zhang H.
,
Ma Z.
2013
“Early termination schemes for fast intra mode decision in High Efficiency Video Coding”
in Proc. of IEEE Int. Symp. Circuits and Syst.
May 1923
Article (CrossRef Link)
25 
48
Sun Heming
,
Zhou Dajiang
,
Goto S.
2012
“A low complexity HEVC intra prediction algorithm based on level and mode filtering”
in IEEE Proc. Multimedia and Expo
July 913
Article (CrossRef Link)
1085 
1090
Jiang Wei
,
Ma Hanjie
,
Chen Yaowu
2012
“Gradient based fast mode decision algorithm for intra prediction in HEVC”
in Proc. of IEEE Proc. Consumer Electronics, Communications and Networks (CECNet)
April 2123
Article (CrossRef Link)
1836 
1840
Zhang Yongfei
,
Li Zhe
,
Li Bo
2012
“Gradientbased fast decision for intra prediction in HEVC”
in IEEE Proc. Visual Commun. Image Process.
November 2730
Article (CrossRef Link)
1 
6
Huang Han
,
Zhao Yao
,
Lin Chunyu
,
Bai Huihui
2013
“Fast bottomup pruning for HEVC intraframe coding”
IEEE Proc. Visual Commun. Image Process.
Article (CrossRef Link)
HEVC test model reference software HM8.2 [Online].
Available:
Bossen F.
2012
“Common test conditions and software reference configurations,” in document JCTVCI1100