In this paper, we propose a novel fast and lowcomplexity Motion Estimation (ME) algorithm for Ultra High Definition (UHD) High Efficiency Video Coding (HEVC). Motion estimation occupies 77~81% of the amount of computation in HEVC. After all, the main key of video codec implementation is to find a fast and lowcomplexity motion estimation algorithm and architecture. We analyze the previous motion estimation algorithms and propose three optimal algorithm to reduce the computation proportion for HEVC. The proposed algorithm uses only 0.36% of the amount of operations compared to full search algorithm while maintaining compression performance with slight loss of 1.1%.
Ⅰ. Introduction
High Efficiency Video Coding (HEVC) is a new video compression standard providing about 50% coding gain over H.264/AVC
[1]
. HEVC was finalized by the Joint Collaborative Team on Video Coding (JCTVC) established by ISOIEC/MPEG and ITUT/VCEG in Jan. 2013
[2]
. In order to achieve this coding gain, it includes an enlarged coding unit (called as tree block) of 64x64 and an increased interpolation filter taps of 8 and 4 for luma and chroma data, respectively. The most noticeable differences from the previous codec are coding unit size (e.g., 16x16 to 64x64) and a variety of hierarchical block partitioning. Consequently, the complexity of motion estimation (ME) of HEVC increases more than 34 times than that of H.264/AVC.
Fig. 1
shows the runtime portion of each tool in HEVC. Motion estimation named by inter prediction occupies 7781% of the amount of computation. After all, the main key of video codec implementation into a hardware is to find a fast and lowpower motion estimation algorithm and architecture, which spends the most time in codec.
HEVC 내의 각 툴에 대한 처리시간 양 Fig. 1. The runtime portion of each tool in HEVC
Fast motion estimation algorithms have been studied for a long time from MPEG2 to H.264; Significant reductions in search positions, simplification of matching criterion, bitdepth reduction, predictive search, hierarchical search and fast full search approach. First, significant reduction methods in search positions include Three Step Search (TSS)
[3]
, Twodimensional Logarithmic Search (TLS)
[4]
, orthogonal search and Diamond Search (DS)
[5

8]
. Second, simplification techniques of matching criterion are pixel selection pattern in Chan's scheme, pixel difference classification (PDC)
[9]
and minimax criterion
[10]
. Third, predictive search method is the way to search the surrounding Motion Vector Predictor (MVP). Finally, Hierarchical search uses subsampled domain search. In addition, there are the algorithm combining predictive search and hierarchical search method named by MultiResolution Motion Estimation algorithm with Multiple Candidates (MRMCS)
[11]
. Some of the important algorithms that are implemented in H.264/AVC reference software are UMHexS (Unsymmetricalcross Multihexagongrid Search), SUMHexS (Simple UMHexS) and EPZS (Enhanced Predictive Zonal Search). TZS (Test Zone Search) is also used in encoder of HEVC reference software
[12]
. However, those previous algorithms for a variety of block sizes and partition types to perform ME were not suitable. Thus, we propose three optimized scheme for HEVC.
This paper is organized as follows. In Section II, our proposed algorithm is presented. Experimental results are presented in Section III. and concluding remarks are given in Section IV.
Ⅱ. Proposed Algorithm
Fig. 2
illustrates example of the partition units in HEVC.
HEVC 내의 CU, PU, TU 파티션 분포 예제 Fig. 2. Example of the partition units in HEVC
Each frame is divided into square blocks called Coding Units (CUs) with maximum size of 64x64 and recursively subdivided into square blocks up to 8x8. The CUs are assigned to a quadtree where each CU is subdivided into quadtree based prediction blocks called Prediction Units (PUs) with intra or inter type. Each PU is again partitioned into quadtree based transform blocks called Transform Units (TUs) specifying transform size
[13]
. Because of a variety of block sizes and partition types in HEVC, the complexity of ME can be extremely increased. Therefore, a new fast algorithm is required for HEVC.
To develop a new ME algorithm for HEVC, we analyzed encode bitstream.
Fig. 3
shows the screen shot of the analyzer, which consists of block partition information, motion vector information, prediction mode distribution, the total bit amount of CU, distortion and cost. This tool should be useful to find the optimum algorithm easily.
자체 개발한 HEVC 스트림 분석기의 스크린 캡쳐 화면 Fig. 3. The screen shot of the our analyzer
We propose three fast motion estimation techniques; 1) motion vector estimation method of large block types 2) adaptive multiresolution and multicandidates (AMRMCS) 3) SAD calculation using bitdepth reduction.
Firstly, This method predicts the best motion vectors of large block types; 16x16 to 64x64, as discussed in
Fig. 4
. The topleft best motion vector (MV) of 16x16 block type is used by the best MVs of large block types. The fundamental idea is based on the fact, statistically proven, that the MVs of the small block types are similar to those of the large block types.
Fig. 5
shows the best MV in 16x16 block type.
Fig. 6
also depicts the best MV in 64x64 block type. As can be seen, two MVs may be alike. From this perspective, we find the best motion vector only up to16x16 block type and predict the best MVs of large blocks. This method can reduce 6070% of computation with slight loss.
큰 블럭 타입의 움직임 벡터 예측 방법 Fig. 4. Motion vector estimation method of large block types
16x16 블록 타입의 선택된 움직임 벡터 Fig. 5. Best Motion Vector in 16x16 block type
64x64 블록 타입의 선택된 움직임 벡터 Fig. 6. Best Motion Vector in 64x64 block type
Secondly, we propose the hierarchical search and predictive search based on AMRMCS as shown in
Fig. 7
. AMRMCS consists of two hierarchical levels, i.e., upper and lower levels. For a search range of [p, +p], in the upper level (level 2), images are 4:1 downsampled from the original in the horizontal and vertical direction and then a motion search is performed for a 4x4 subblock with a search range of [p/4, +p/4]. From the upper level search, three search points corresponding to the smallest SADs (Sum of absolute differences) are selected as initial search points for the lower level (level 0). And it is divided into eight areas of 4:1 subsampled domain and search the best candidates at each division first and get some of them whose numbers are flexible by image resolution, bandwidth and configuration to avoid local minimum problem in level 2. After that, we can get the minimum SAD candidates in 4:1 global search range of [±128, ±64]. We find final MV by executing refinement search again in local search range (Xaxis: [8, 7], Yaxis: [7, 7]), which uses the best MVs of subsampled domain and MV of the left neighbor (MVA), MV of the top neighbor (MVB) and motion vector predictor (MVP). This predictor is calculated by using the median value of the MVs of the three adjacent, on the left, top, and topright (or topleft if topright is not available) blocks to the current block.
AMRMCS의 컨셉 그림 Fig. 7. Concept of AMRMCS
Finally, bitdepth reduction is a way of using truncated reference and current pixel data.
Fig. 8
shows comparison of average bjøntegaard delta bit rate (BDBR)
[14]
by bitdepth reduction. The xaxis represents the number of bitdepth and the yaxis shows a set of measures such as BDBR.
BitDepth 수에 따른 평균 BDBR 비교 Fig. 8. Comparison of average BDBR by bitdepth reduction
As this experimental result, even though bitdepth is reduced from 8 to 4bit, the performance degradation is only less than 0.5 % (BDBR). Therefore, our proposed algorithm uses the 4bit pixel data for motion estimation computation. Then, the SAD is calculated by only MSBpart (4bit) as in
Fig. 9
. Using this algorithm, we can reduce the hardware by half and power consumption significantly.
Bitdepth 감소 방식을 사용한 SAD 계산 형태 Fig. 9. SAD calculation using bitdepth reduction
Fig. 10
shows the amount of computational complexity for each algorithm. It means the combined proportion between the number of search points and the quantity of SAD calculation. As a result of this figure, our proposed algorithm uses only 0.36% of the amount of operations compared to the general full search algorithm.
제안한 각 알고리즘에 대한 계산 양 Fig. 10. The amount of computation of each algorithm
Ⅲ. Experimental Result
We used video test sequences, including UHD crop (2560x1600), FHD (1920x1080) and so on.
The test sequence depicted in
Fig. 11
, 1920x1080 size with 30 framepersecond, has various moving objects along with global motion (e.g., camera zoom in/out and object tracking).
Fig. 12
illustrates difference of MVs between full search algorithm and our proposed algorithm. The 64x64 coding unit position of white thick square is (11, 5) and the coding unit size (yellow thick square) is 8x8 type. Then, the best MV of the two algorithm is equal to (7, 13). Therefore, it can be confirmed that our proposed algorithm has a similar performance compared to full search algorithm which cab be the most complex.
테스트 시퀀스: QuarterBackSneak_1920x1080x30p Fig. 11. Test sequence: QuarterBackSneak_1920x1080x30p
LCU(11,5) 에서의 움직임 벡터 비교 Fig. 12. Comparison of Motion Vectors in LCU (11, 5)
Table I
and
II
show comparison with the full search algorithm in HM10.0. The proposed algorithm achieved far superior performance in speed, power and cost than any legacy codecs while maintaining visual quality with loss of a bit coding efficiency.
Comparison with Full Search Algorithm in HM10.0[15](Only FHD Sequence)
(1): motion vector estimation of large block types (2): AMRMCS (3): SAD Calculation using bitdepth reduction
Comparison with Full Search Algorithm in HM10.0 (Total sequence included FHD, UHD, VGA etc)
Comparison with Full Search Algorithm in HM10.0 (Total sequence included FHD, UHD, VGA etc)
Ⅳ. Conclusion
Our proposed motion estimation uses only 0.36% of the amount of operations compared to the general full search algorithm while maintaining compression performance. And we represented that our motion estimation is definitely superior to previous scheme. Finally, the proposed motion estimation algorithm will help you to have the smallest hardware size and the lowest power consumption among existing motion estimation engines for HEVC.
BIO
김 성 오
 2008년 ~ 현재 : 삼성전자 DMC 연구소 Multimedia 연구팀 책임 연구원
 주관심분야 : 비디오 코덱, 신호처리
박 찬 식
 2002년 ~ 현재 : 삼성전자 DMC 연구소 Multimedia 연구팀 책임 연구원
 주관심분야 : 비디오 코덱, 신호처리
전 형 주
 2008년 ~ 현재 : 삼성전자 DMC 연구소 Multimedia 연구팀 선임 연구원
 주관심분야 : 비디오 코덱, 신호처리
김 재 문
 2010년 ~ 현재 : 삼성전자 DMC 연구소 Multimedia 연구팀 책임 연구원
 주관심분야 : 비디오 코덱, 신호처리
Sullivan G. J.
(2012)
Overview of the High Efficiency Video Coding (HEVC) Standard. IEEE TCSVT 22
1649 
1668
Bross B.
(2013)
“High Efficiency Video Coding (HEVC) text specification draft 10 (for FDIS & Last Call)”
ITUT SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG Doc. JCTVCL1003
Koga T.
(1981)
“Motion compensated interframe coding for video conferencing,”
in Proc. Nat. Telecommun. Conf.
C9.6.1 
C9.6.5
Jain J.
,
Jain A.
(1981)
“Displacement Measurement and its Application in Internal Image Coding,”
IEEE Trans. Commun.
COM29
(12)
1799 
1808
DOI : 10.1109/TCOM.1981.1094950
Tham J.Y.
(1998)
“A Novel Unrestricted Centerbiased Diamond Search Algorithm for Block Motion Estimation,”
IEEE Trans. Circuits Syst. Video Technol.
8
(4)
369 
377
DOI : 10.1109/76.709403
Zhu S.
,
Ma K.K.
(2000)
“A New Diamond Search Algorithm for Fast Blockmatching Motion Estimation,”
IEEE Trans. Image Processing
9
(2)
287 
290
DOI : 10.1109/83.821744
Tourapis A.M.
(2000)
“Optimizing the mpeg4 Encoder  advanced Diamond Zonal search,”
in Proc. of IEEE Int. Symp. Circuits Syst. (ISCAS’00)
674 
677
Tourapis A.M.
,
Au O.C.
,
Liu M.L.
(2002)
“Highly Efficient Predictive Zonal Algorithms for Fast Blockmatching Motion Estimation,”
IEEE Trans. Circuits Syst. Video Technol.
12
(10)
934 
347
DOI : 10.1109/TCSVT.2002.804894
Gharavi H.
,
Mills M.
(1990)
“Block Matching Motion Estimation Algorithms  New Results,”
IEEE Trans. Circuits Syst.
37
(5)
649 
651
DOI : 10.1109/31.55010
Chen M.J.
(1995)
“A New Blockmatching Criterion for Motion Estimation and its Implementation,”
IEEE Trans. Circuits Syst. Video Technol.
5
(3)
231 
236
DOI : 10.1109/76.401100
J. H. Lee
(2001)
“A fast multiresolution block matching algorithm and its LSI architecture for low bitrate video coding”
Circuits and Systems for Video Technology, IEEE Transactions on
11
1289 
1301
DOI : 10.1109/76.974683
Purnachand N.
(2012)
“Improvements to TZ search motion estimation algorithm for multiview video coding”
IEEE IWSSIP 2012
Vienna
(2012)
“Fast Motion Estimation Algorithm for HEVC”,
(ICCEBerlin)
34 
37
Bjontegaard G.
(2001)
"Calculation of Average PSNR Differences Between RDcurves,"
document VCEGM33, ITUT Video Coding Experts Group (VCEG) Meeting, Austin, TX
(2013)
JVCVT HEVC reference software HM 11.0
https://hevc.hhi.fraunhofer.de/svn/svn_HEVCSoftware/tags/HM11.0