Advanced
A Fast and Low-complexity Motion Estimation for UHD HEVC
A Fast and Low-complexity Motion Estimation for UHD HEVC
Journal of Broadcast Engineering. 2013. Nov, 18(6): 808-815
Copyright © 2013, The Korean Society of Broadcast Engineers
  • Received : August 28, 2013
  • Accepted : November 14, 2013
  • Published : November 30, 2013
Download
PDF
e-PUB
PubReader
PPT
Export by style
Share
Article
Author
Metrics
Cited by
TagCloud
About the Authors
성오 김
Multimedia Research Team, DMC R&D Center, Samsung Electronics
찬식 박
Multimedia Research Team, DMC R&D Center, Samsung Electronics
형주 전
Multimedia Research Team, DMC R&D Center, Samsung Electronics
재문 김
Multimedia Research Team, DMC R&D Center, Samsung Electronics
jaemoonc.kim@samsung.com

Abstract
In this paper, we propose a novel fast and low-complexity Motion Estimation (ME) algorithm for Ultra High Definition (UHD) High Efficiency Video Coding (HEVC). Motion estimation occupies 77~81% of the amount of computation in HEVC. After all, the main key of video codec implementation is to find a fast and low-complexity motion estimation algorithm and architecture. We analyze the previous motion estimation algorithms and propose three optimal algorithm to reduce the computation proportion for HEVC. The proposed algorithm uses only 0.36% of the amount of operations compared to full search algorithm while maintaining compression performance with slight loss of 1.1%.
Keywords
Ⅰ. Introduction
High Efficiency Video Coding (HEVC) is a new video compression standard providing about 50% coding gain over H.264/AVC [1] . HEVC was finalized by the Joint Collaborative Team on Video Coding (JCT-VC) established by ISO-IEC/MPEG and ITU-T/VCEG in Jan. 2013 [2] . In order to achieve this coding gain, it includes an enlarged coding unit (called as tree block) of 64x64 and an increased interpolation filter taps of 8 and 4 for luma and chroma data, respectively. The most noticeable differences from the previous codec are coding unit size (e.g., 16x16 to 64x64) and a variety of hierarchical block partitioning. Consequently, the complexity of motion estimation (ME) of HEVC increases more than 3-4 times than that of H.264/AVC.
Fig. 1 shows the run-time portion of each tool in HEVC. Motion estimation named by inter prediction occupies 77-81% of the amount of computation. After all, the main key of video codec implementation into a hardware is to find a fast and low-power motion estimation algorithm and architecture, which spends the most time in codec.
PPT Slide
Lager Image
HEVC 내의 각 툴에 대한 처리시간 양 Fig. 1. The run-time portion of each tool in HEVC
Fast motion estimation algorithms have been studied for a long time from MPEG-2 to H.264; Significant reductions in search positions, simplification of matching criterion, bit-depth reduction, predictive search, hierarchical search and fast full search approach. First, significant reduction methods in search positions include Three Step Search (TSS) [3] , Two-dimensional Logarithmic Search (TLS) [4] , orthogonal search and Diamond Search (DS) [5 - 8] . Second, simplification techniques of matching criterion are pixel selection pattern in Chan's scheme, pixel difference classification (PDC) [9] and minimax criterion [10] . Third, predictive search method is the way to search the surrounding Motion Vector Predictor (MVP). Finally, Hierarchical search uses sub-sampled domain search. In addition, there are the algorithm combining predictive search and hierarchical search method named by Multi-Resolution Motion Estimation algorithm with Multiple Candidates (MRMCS) [11] . Some of the important algorithms that are implemented in H.264/AVC reference software are UMHexS (Unsymmetrical-cross Multi-hexagon-grid Search), SUMHexS (Simple UMHexS) and EPZS (Enhanced Predictive Zonal Search). TZS (Test Zone Search) is also used in encoder of HEVC reference software [12] . However, those previous algorithms for a variety of block sizes and partition types to perform ME were not suitable. Thus, we propose three optimized scheme for HEVC.
This paper is organized as follows. In Section II, our proposed algorithm is presented. Experimental results are presented in Section III. and concluding remarks are given in Section IV.
Ⅱ. Proposed Algorithm
Fig. 2 illustrates example of the partition units in HEVC.
PPT Slide
Lager Image
HEVC 내의 CU, PU, TU 파티션 분포 예제 Fig. 2. Example of the partition units in HEVC
Each frame is divided into square blocks called Coding Units (CUs) with maximum size of 64x64 and recursively subdivided into square blocks up to 8x8. The CUs are assigned to a quad-tree where each CU is sub-divided into quad-tree based prediction blocks called Prediction Units (PUs) with intra or inter type. Each PU is again partitioned into quad-tree based transform blocks called Transform Units (TUs) specifying transform size [13] . Because of a variety of block sizes and partition types in HEVC, the complexity of ME can be extremely increased. Therefore, a new fast algorithm is required for HEVC.
To develop a new ME algorithm for HEVC, we analyzed encode bitstream. Fig. 3 shows the screen shot of the analyzer, which consists of block partition information, motion vector information, prediction mode distribution, the total bit amount of CU, distortion and cost. This tool should be useful to find the optimum algorithm easily.
PPT Slide
Lager Image
자체 개발한 HEVC 스트림 분석기의 스크린 캡쳐 화면 Fig. 3. The screen shot of the our analyzer
We propose three fast motion estimation techniques; 1) motion vector estimation method of large block types 2) adaptive multi-resolution and multi-candidates (AMRMCS) 3) SAD calculation using bit-depth reduction.
Firstly, This method predicts the best motion vectors of large block types; 16x16 to 64x64, as discussed in Fig. 4 . The top-left best motion vector (MV) of 16x16 block type is used by the best MVs of large block types. The fundamental idea is based on the fact, statistically proven, that the MVs of the small block types are similar to those of the large block types. Fig. 5 shows the best MV in 16x16 block type. Fig. 6 also depicts the best MV in 64x64 block type. As can be seen, two MVs may be alike. From this perspective, we find the best motion vector only up to16x16 block type and predict the best MVs of large blocks. This method can reduce 60-70% of computation with slight loss.
PPT Slide
Lager Image
큰 블럭 타입의 움직임 벡터 예측 방법 Fig. 4. Motion vector estimation method of large block types
PPT Slide
Lager Image
16x16 블록 타입의 선택된 움직임 벡터 Fig. 5. Best Motion Vector in 16x16 block type
PPT Slide
Lager Image
64x64 블록 타입의 선택된 움직임 벡터 Fig. 6. Best Motion Vector in 64x64 block type
Secondly, we propose the hierarchical search and predictive search based on AMRMCS as shown in Fig. 7 . AMRMCS consists of two hierarchical levels, i.e., upper and lower levels. For a search range of [-p, +p], in the upper level (level 2), images are 4:1 downsampled from the original in the horizontal and vertical direction and then a motion search is performed for a 4x4 sub-block with a search range of [-p/4, +p/4]. From the upper level search, three search points corresponding to the smallest SADs (Sum of absolute differences) are selected as initial search points for the lower level (level 0). And it is divided into eight areas of 4:1 sub-sampled domain and search the best candidates at each division first and get some of them whose numbers are flexible by image resolution, bandwidth and configuration to avoid local minimum problem in level 2. After that, we can get the minimum SAD candidates in 4:1 global search range of [±128, ±64]. We find final MV by executing refinement search again in local search range (X-axis: [-8, 7], Y-axis: [-7, 7]), which uses the best MVs of sub-sampled domain and MV of the left neighbor (MVA), MV of the top neighbor (MVB) and motion vector predictor (MVP). This predictor is calculated by using the median value of the MVs of the three adjacent, on the left, top, and top-right (or top-left if top-right is not available) blocks to the current block.
PPT Slide
Lager Image
AMRMCS의 컨셉 그림 Fig. 7. Concept of AMRMCS
Finally, bit-depth reduction is a way of using truncated reference and current pixel data. Fig. 8 shows comparison of average bjøntegaard delta bit rate (BDBR) [14] by bit-depth reduction. The x-axis represents the number of bit-depth and the y-axis shows a set of measures such as BDBR.
PPT Slide
Lager Image
Bit-Depth 수에 따른 평균 BDBR 비교 Fig. 8. Comparison of average BDBR by bit-depth reduction
As this experimental result, even though bit-depth is reduced from 8 to 4-bit, the performance degradation is only less than 0.5 % (BDBR). Therefore, our proposed algorithm uses the 4-bit pixel data for motion estimation computation. Then, the SAD is calculated by only MSB-part (4-bit) as in Fig. 9 . Using this algorithm, we can reduce the hardware by half and power consumption significantly.
PPT Slide
Lager Image
Bit-depth 감소 방식을 사용한 SAD 계산 형태 Fig. 9. SAD calculation using bit-depth reduction
Fig. 10 shows the amount of computational complexity for each algorithm. It means the combined proportion between the number of search points and the quantity of SAD calculation. As a result of this figure, our proposed algorithm uses only 0.36% of the amount of operations compared to the general full search algorithm.
PPT Slide
Lager Image
제안한 각 알고리즘에 대한 계산 양 Fig. 10. The amount of computation of each algorithm
Ⅲ. Experimental Result
We used video test sequences, including UHD crop (2560x1600), FHD (1920x1080) and so on.
The test sequence depicted in Fig. 11 , 1920x1080 size with 30 frame-per-second, has various moving objects along with global motion (e.g., camera zoom in/out and object tracking). Fig. 12 illustrates difference of MVs between full search algorithm and our proposed algorithm. The 64x64 coding unit position of white thick square is (11, 5) and the coding unit size (yellow thick square) is 8x8 type. Then, the best MV of the two algorithm is equal to (-7, 13). Therefore, it can be confirmed that our proposed algorithm has a similar performance compared to full search algorithm which cab be the most complex.
PPT Slide
Lager Image
테스트 시퀀스: QuarterBackSneak_1920x1080x30p Fig. 11. Test sequence: QuarterBackSneak_1920x1080x30p
PPT Slide
Lager Image
LCU(11,5) 에서의 움직임 벡터 비교 Fig. 12. Comparison of Motion Vectors in LCU (11, 5)
Table I and II show comparison with the full search algorithm in HM10.0. The proposed algorithm achieved far superior performance in speed, power and cost than any legacy codecs while maintaining visual quality with loss of a bit coding efficiency.
Comparison with Full Search Algorithm in HM10.0[15](Only FHD Sequence)
PPT Slide
Lager Image
(1): motion vector estimation of large block types (2): AMRMCS (3): SAD Calculation using bit-depth reduction
Comparison with Full Search Algorithm in HM10.0 (Total sequence included FHD, UHD, VGA etc)
PPT Slide
Lager Image
Comparison with Full Search Algorithm in HM10.0 (Total sequence included FHD, UHD, VGA etc)
Ⅳ. Conclusion
Our proposed motion estimation uses only 0.36% of the amount of operations compared to the general full search algorithm while maintaining compression performance. And we represented that our motion estimation is definitely superior to previous scheme. Finally, the proposed motion estimation algorithm will help you to have the smallest hardware size and the lowest power consumption among existing motion estimation engines for HEVC.
BIO
김 성 오
- 2008년 ~ 현재 : 삼성전자 DMC 연구소 Multimedia 연구팀 책임 연구원
- 주관심분야 : 비디오 코덱, 신호처리
박 찬 식
- 2002년 ~ 현재 : 삼성전자 DMC 연구소 Multimedia 연구팀 책임 연구원
- 주관심분야 : 비디오 코덱, 신호처리
전 형 주
- 2008년 ~ 현재 : 삼성전자 DMC 연구소 Multimedia 연구팀 선임 연구원
- 주관심분야 : 비디오 코덱, 신호처리
김 재 문
- 2010년 ~ 현재 : 삼성전자 DMC 연구소 Multimedia 연구팀 책임 연구원
- 주관심분야 : 비디오 코덱, 신호처리
References
Sullivan G. J. (2012) Overview of the High Efficiency Video Coding (HEVC) Standard. IEEE TCSVT 22 1649 - 1668
Bross B. (2013) “High Efficiency Video Coding (HEVC) text specification draft 10 (for FDIS & Last Call)” ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG Doc. JCTVC-L1003
Koga T. (1981) “Motion compensated inter-frame coding for video conferencing,” in Proc. Nat. Telecommun. Conf. C9.6.1 - C9.6.5
Jain J. , Jain A. (1981) “Displacement Measurement and its Application in Internal Image Coding,” IEEE Trans. Commun. COM-29 (12) 1799 - 1808    DOI : 10.1109/TCOM.1981.1094950
Tham J.Y. (1998) “A Novel Unrestricted Center-biased Diamond Search Algorithm for Block Motion Estimation,” IEEE Trans. Circuits Syst. Video Technol. 8 (4) 369 - 377    DOI : 10.1109/76.709403
Zhu S. , Ma K.K. (2000) “A New Diamond Search Algorithm for Fast Block-matching Motion Estimation,” IEEE Trans. Image Processing 9 (2) 287 - 290    DOI : 10.1109/83.821744
Tourapis A.M. (2000) “Optimizing the mpeg-4 Encoder - advanced Diamond Zonal search,” in Proc. of IEEE Int. Symp. Circuits Syst. (ISCAS’00) 674 - 677
Tourapis A.M. , Au O.C. , Liu M.L. (2002) “Highly Efficient Predictive Zonal Algorithms for Fast Block-matching Motion Estimation,” IEEE Trans. Circuits Syst. Video Technol. 12 (10) 934 - 347    DOI : 10.1109/TCSVT.2002.804894
Gharavi H. , Mills M. (1990) “Block Matching Motion Estimation Algorithms - New Results,” IEEE Trans. Circuits Syst. 37 (5) 649 - 651    DOI : 10.1109/31.55010
Chen M.J. (1995) “A New Block-matching Criterion for Motion Estimation and its Implementation,” IEEE Trans. Circuits Syst. Video Technol. 5 (3) 231 - 236    DOI : 10.1109/76.401100
J. H. Lee (2001) “A fast multi-resolution block matching algorithm and its LSI architecture for low bit-rate video coding” Circuits and Systems for Video Technology, IEEE Transactions on 11 1289 - 1301    DOI : 10.1109/76.974683
Purnachand N. (2012) “Improvements to TZ search motion estimation algorithm for multiview video coding” IEEE IWSSIP 2012 Vienna
(2012) “Fast Motion Estimation Algorithm for HEVC”, (ICCE-Berlin) 34 - 37
Bjontegaard G. (2001) "Calculation of Average PSNR Differences Between RD-curves," document VCEG-M33, ITU-T Video Coding Experts Group (VCEG) Meeting, Austin, TX
(2013) JVC-VT HEVC reference software HM 11.0 https://hevc.hhi.fraunhofer.de/svn/svn_HEVCSoftware/tags/HM-11.0