An Early Termination Algorithm of Prediction Unit (PU) Search for Fast HEVC Encoding
An Early Termination Algorithm of Prediction Unit (PU) Search for Fast HEVC Encoding
Journal of Broadcast Engineering. 2014. Sep, 19(5): 627-630
Copyright © 2014, The Korean Society of Broadcast Engineers
  • Received : July 25, 2014
  • Accepted : August 18, 2014
  • Published : September 30, 2014
Export by style
Cited by
About the Authors
재욱 김
동현 김
재곤 김

Abstract—The latest video coding standard, high efficiency video coding (HEVC) achieves high coding efficiency by employing a quadtree-based coding unit (CU) block partitioning structure which allows recursive splitting into four equally sized blocks. At each depth level, each CU is partitioned into variable sized blocks of prediction units (PUs). However, the determination of the best CU partition for each coding tree unit (CTU) and the best PU mode for each CU causes a dramatic increase in computational complexity. To reduce such computational complexity, we propose a fast PU decision algorithm that early terminates PU search. The proposed method skips the computation of R-D cost for certain PU modes in the current CU based on the best mode and the rate-distortion (RD) cost of the upper depth CU. Experimental results show that the proposed method reduces the computational complexity of HM12.0 to 18.1% with only 0.2% increases in BD-rate.
HEVCFast encodingCUPU
I. Introduction
The new international video coding standard of HEVC aims to substantially improve coding efficiency compared to the preceding state-of-the-art standard to reduce bit rate by half with comparable quality. Comparing to H.264/AVC, HEVC reduces the bit rate by almost 40% with an equivalent objective quality and around 1.4 times coding complexity [1] .
In HEVC, the newly adopted recursive quadtree-based CU partitioning structure is one of the major tools contributing the coding efficiency improvement. A picture is partitioned into coding tree units (CTU), and a quadtree structure is utilized to partition the CTU into smaller CUs. The depth of the quadtree can be as large as four and a CTU presents a root node (depth 0) of the quadtree. The partitioning can be recursively continued until the maximum depth is reached. To achieve the best coding performance, the best quadtree-based block partitioning structure of CU and the associated PU and transform unit (TU) is searched recursively based on rate-distortion (RD) cost. However, such recursive process determining optimal block structures results in drastic increase of computational complexities. There has been a lot of work to reduce the encoder complexity caused by the exhaustive CU splitting and PU decision process [2] - [6] .
To reflecting the practical importance of reducing encoder complexity, three optional fast mode decision schemes have been adopted in HEVC reference software, called HEVC test model (HM) [7] . For fast PU decision, the early skip detection (ESD) [2] method and the coded block flag (CBF) fast method (CFM) [3] terminate remaining PU decision processes when predefined specific conditions are met. In other words, the ESD checks inter 2N×2N first and terminate if motion vector difference is zero and CBF equal to zero. The CFM terminates if PU has CBF equal to zero. On the other hand, the early CU termination (ECU) [4] terminates CU split if SKIP is determined as the best mode of current CU. In this way, a fast decision of CU depth is achieved.
In this paper, we present an early termination scheme of PU search based on the correlation between the upper depth level and the current depth level. The best PU mode and the RD cost obtained in the upper depth level are used to speed up the PU mode decision of the current depth level under the basic assumption of high correlation between the best modes of both depth levels.
II. Mode Decision Process in HM
HM encoder determines the best quadtree based block structure for each CTU through the mode decision process. For each CU, the best PU mode is chosen by evaluating the RD cost in order of MERGE, SKIP, INTER and INTRA. INTER consists of square motion partition (INTER 2N×2N and INTER N×N), symmetric motion partition (SMP) (INTER N×2N and INTER 2N×N), and asymmetric motion partition (AMP). N×N inter/intra prediction are performed only when current CU size is the smallest size. At the end of prediction process, the mode having the smallest RD cost is chosen as the best mode of the current CU, and then the CU is split into four square sized sub-CUs and the mode decision process is performed in the next depth level.
After exhaustive search of all recursive partitioning blocks at every depth level, a quadtree-based coding structure of CTU and the final prediction mode of each CU are determined based on the best cost of each depth.
III. Proposed Fast PU Decision Scheme
The process of CU partitioning and the associated mode decision is performed recursively at every depth level. It is likely that there exists considerable correlation between consecutive depth levels in terms of coding information such as the best mode and the associated RD cost [5] . However, most of the existing fast mode decision schemes utilize coding information of the current depth level without consideration of the correlation between depth levels. For instance, the ESD checks CBF and MVD after inter and skip modes search in the current depth. In addition, the CFM verifies CBF in inter modes search and the ECU confirms whether the best mode is skip or not in the current depth. The proposed scheme early terminates the mode search in the current depth by using coding information obtained in the upper depth.
The details of the proposed fast mode decision scheme are as follows. In the top depth level, the RD cost for all modes are evaluated as the HM’s decision process since there is no available upper depth. In lower depth levels after the decision process at CTU level, when a CTU or CU is divided into four sub-CUs, the best mode of upper depth (upper BestMode) and RD cost of upper INTRA mode (upper IntraCost) are used in the current depth as shown in Fig. 1 .
PPT Slide
Lager Image
현재 깊이에서 사용되는 상위 깊이의 부호화 정보 Fig. 1. Encoding information of the upper depth to be used in the current depth
In order to speed up the mode search process by skipping some prediction modes, early termination points are selected under three conditions by using those two coding information of the upper depth. The details procedure of the proposed fast mode decision scheme is shown in Fig. 2 .
PPT Slide
Lager Image
제안한 고속 PU 모드 결정 절차 Fig. 2. Proposed fast PU mode decision process
When the SKIP mode is decided as the best mode in the upper depth, it is assumed that the SKIP mode is most likely be the best mode in the current depth. As an additional condition, the RD cost of SKIP mode is compared to that of INTER 2N×2N mode that has usually relatively small RD cost [6] . In order to do this, the search order of SKIP/Merger and INTER 2N×2N specified in HM is swapped as shown in the Fig. 2 and check whether the best RD cost is updated with that of SKIP mode or not after the evaluation of RD cost of SKIP mode. Then if the upper depth’s mode is SKIP and the best RD cost (BestCost) is updated after the SKIP mode search, the search process for remaining modes in the current depth is terminated and the current CU is split into four sub-CU by condition 1 (Cond1).
When the upper best mode is the INTER modes, the second early termination skipping INTRA modes search process can be occurred depending on the condition 2 and the condition 3. In other words, after evaluation of the RD costs for all INTER modes, the condition 2 confirms whether the best RD cost is updated or not, which means if the current best mode is one of the INTER modes.
The RD cost of INTRA mode is predicted from that of upper depth based on the correlation of RD cost of INTRA modes between the consecutive depths. The correlation can assumed high since similar characteristics of reference samples are likely to be used for intra prediction in both depths. Based on such considerations, the specific value of INTRA RD cost can be predicted according to the ratio of block sizes along with the consideration of prediction accuracy. In the experiment, it is noted that 20% of the upper INTRA’s RD cost would be appropriate as the predicted RD cost in terms of trade-off between complex reduction and coding loss.
IV. Experimental Results
The proposed fast mode decision scheme is implemented into HM 12.0 [7] . We used the same test sequence set used for HEVC standard development. For experiments, random access (RA) configuration of Main Profile is used with 64×64 CTU and the maximum partition depth equal to 4. Every test sequence in classes B, C, D, and E is used, and for each sequence 100 frames are encoded per given QP values QP (22, 27, 32, and 37). As a metric for the performance measure, the average time saving given by
  • ΔT(%) =
is used, where and denote encoding time of HM 12.0 and the proposed scheme implemented on HM 12.0. Table 1 summarizes the performance of the proposed scheme.
제안 기법의 부호화 효율 및 복잡도 감소(Anchor: HM 12.0)
PPT Slide
Lager Image
Table 1. Coding efficiency and complexity reduction of the proposed scheme (Anchor: HM 12.0)
As shown in Table 1 , the proposed scheme can save 18.1% of the encoding time, on average, with just 0.2% BD-rate increase. It can also be seen that the benefit of the proposed scheme is more significant for the sequences of the class E, which have sparse content and less motion.
V. Conclusions
In this paper, a fast PU mode decision scheme was presented. The proposed scheme early terminates the PU mode search process by using the best PU mode and the RD cost of INTRA mode in the upper depth level with assumption of high correlation between consecutive depth levels. Experimental results showed that the proposed scheme sped up the encoding process of HM 18.1%, on average, with just 0.2% BD-rate increase for various test sequences. It is expected that further enhancement can be obtained when the proposed scheme is combined with the existing fast mode decision methods since new features of correlation between consecutive depths is utilized.
Vanne J. , Viitanen M. , Hämäläinen T. D. , Hallapuro A. 2012 “Comparative rate-distortion-complexity analysis of HEVC and AVC video codecs” IEEE Trans. Circuits Syst. Video Technol. 22 (12) 1885 - 1898    DOI : 10.1109/TCSVT.2012.2223013
Yang J. , Kim J. , Won K. , Lee H. , Jeon B. 2011 “Early SKIP detection for HEVC” JCT-VC document, JCTVC-G543
R. H. Gweon , Y.-L. Lee 2011 “Early termination of CU encoding to reduce HEVC complexity” JCT-VC document, JCTVC-F045
Choi K. , Park S.-H. , Jang E. S. 2011 “Coding tree pruning base CU early termination” JCT-VC document, JCTVC-F902
Lee J.-H. , Park C.-S. , Kim B.-G. 2012 “Fast coding algorithm based on adaptive coding depth range selection for HEVC” in Proc. IEEE Int. Conf. Consumer Elec. (ICCE)
Park C.-S. 2013 “Early termination algorithm of merge mode search for fast high efficiency video coding encoder” J. Broadcasting Engineering 18 (5)
Kim I. , McCann K. , Sugimoto K. , Bross B. , Han W. 2013 “High efficiency video coding (HEVC) Test Model 12 (HM12) encoder description” JCTVC document, JCTVC-M1002