Motion estimation is a core part of most video compression systems since it directly affects the output video quality and the encoding time. The full search (FS) technique gives the highest visual quality but has the problem of a significant computational load. To solve this problem, we present in this paper a modified median (MMED) operation and advanced search strategies for fast motion estimation. The proposed MMED operation includes a temporally colocated motion vector (MV) to select an appropriate initial candidate. Moreover, we introduce a search procedure that reduces the number of thresholds and simplifies the early termination conditions for the determination of a final MV. The experimental results show that the proposed approach achieves substantial speedup compared with the conventional methods including the motion vector field adaptive search technique (MVFAST) and predictive MVFAST (PMVFAST). The proposed algorithm also improves the PSNR values by increasing the correlation between the MVs, compared with the FS method.
I. INTRODUCTION
The block matching algorithm (BMA) is an essential part of many video coding standards, such as MPEG1, 2, 4, and H.26x, to exploit the temporal correlation and reduce the redundancy that exists between frames of video sequences
[1
,
2]
. In BMA, the current frame is partitioned into nonoverlapped blocks, and the best match for these blocks is found inside the search range of a reference frame with a predefined distortion criterion. The best matching block is then used as a predictor for the block in the current frame, where the displacement between the two blocks defines a motion vector (MV) of the current block. Thus, the encoder sends the MV and a residual block, which is the difference between the current block and the predictor, instead of the current block itself. This can significantly reduce the number of bits involved in video coding procedures. The most common distortion measure is the mean absolute error or mean absolute difference, or equivalently, the sum of the absolute difference (SAD), which requires no multiplication and exhibits a performance similar to that of the mean squared error
[3
,
4]
. Let
f_{t}
and
f_{t1}
be a current frame and a reference frame, respectively. Then, for an M × N block located at (
x, y
) in the current frame, the SAD is given by
where
d_{x}
and
d_{y}
denote the displacements in the
x
 and
y
direction, respectively.
The full search (FS) technique selects the MV, which points to the best matching block that has a minimum SAD, by exhaustive search at all possible points within the search range. Since the calculation of SAD at all possible locations is computationally intensive, it is difficult to directly use FS for various applications, such as digital TV, Internet streaming, and mobile video, in spite of its simple implementation. For this reason, many fast search algorithms have been suggested in an effort to reduce the computational complexity by searching only a subset of the possible search points, such as twodimensional (2D) log search
[5]
, threestep search (TSS)
[6]
, new threestep search (NTSS)
[7]
, and diamond search (DS) strategies
[8
,
9]
. However, these methods usually incur a significant loss in the PSNR, although they can make the search faster by calculating the SAD less frequently. Moreover, these algorithms are not exactly optimal in consideration of the bits required to encode each MV, since they find MVs just by minimizing the SAD irrespective of the distribution of MVs.
In particular, the motion vector field adaptive search technique (MVFAST) and the predictive MVFAST (PMVFAST) have been introduced in order to improve the DS technique without sacrificing PSNR
[10

12]
. The MVFAST considers the MVs of the spatially adjacent blocks and the PMVFAST includes the MVs of the spatially and temporally adjacent blocks as possible candidates. The priority of the initial search point in PMVFAST is given to results of the median operation for the candidates, whereas priority in DS is given to a (0, 0) motion vector, MV
_{0}
. In addition, the early termination by a thresholding technique in the PMVFAST further reduces the computational complexity of the search procedures. These techniques of PMVFAST allow the video encoders to generate fewer bits for MVs with a negligible PSNR loss as well as significantly reduce the computations for search procedures, since the relatively high correlation between the MVs can be used effectively.
In this paper, we propose a modified median (MMED) operation and effective early termination conditions for the fast motion estimation. Since the MV of a temporally colocated block is more correlated with the final MV in many cases, our approach uses the MVs of four adjacent blocks, which include the left, top, and topright blocks of the current block and a temporally colocated block in a reference frame, whereas PMVFAST uses the MVs of three spatially adjacent blocks. To select an appropriate candidate in the initial search procedure, we introduce an MMED operator that can decrease the number of search points, and this reduces the overall computational load. Since the initial search point is chosen more appropriately, thresholds and early termination conditions can be simplified, and consequently, we achieve faster motion estimation without sacrificing PSNR compared with the case of PMVFAST. The rest of this paper is organized as follows: Section II presents a review of MVFAST and PMVFAST. In Section III, the proposed fast motion estimation technique is discussed in detail. Experimental results and conclusions are given in Sections IV and V, respectively.
II. REVIEW OF MVFAST AND PMVFAST
Most of the fast motion estimation algorithms first choose several candidates and then, calculate SAD at the candidate points. After the selection of the candidate with the minimum SAD, more search points around the selected candidate are tested. If the best candidate is not found in this procedure, the current candidate becomes the final MV. Therefore, the efficiency of the fast search algorithms depends on the determination of the candidates. In TSS, the number of search points does not change significantly according to the number of moving objects in the video sequences. On the other hand, in the DS that is biased to MV
_{0}
, better performance can be observed when the video sequences contain a few moving objects. MVFAST and PMVFAST also use DS patterns for fast motion estimation that is adaptive to the characteristics of video sequences. We will now briefly describe MVFAST and PMVFAST, which improve the search speed significantly with a negligible PSNR loss, since the proposed method is based on them.
 A. Search Procedures of MVFAST
MVFAST introduced in [
10
,
12]
improves the speedup of the DS algorithm significantly by initially considering a small set of candidates, which includes MV0 and the MVs of three spatially adjacent blocks (the left, top, and topright blocks). Unlike the DS that uses only the large diamond search pattern (LDSP) shown in
Fig. 1(a)
, MVFAST considers the small diamond search pattern (SDSP) shown in
Fig. 1(b)
.
Diamond search patterns: (a) large diamond search pattern (LDSP) and (b) small diamond search pattern (SDSP).
Both the diamond patterns (LDSP and SDSP) keep on moving until the minimum SAD is found at the center. If the center point is selected using SDSP, the search procedure is terminated at this point. If the center point is selected using LDSP, the search procedure is carried out again using SDSP around the center of LDSP.
In MVFAST, the selection between SDSP and LDSP depends on the motion activity that the current block will most likely have. This motion activity is characterized by the magnitude of the largest MV among the three adjacent MVs, i.e., MV
_{left}
, MV
_{top}
, and MV
_{topright}
, as shown in
Fig. 2
.
Motion vectors of spatially adjacent blocks. MV: motion vector.
The motion activity is described as
low
when it is below a threshold
L
_{1}
, medium when it is between
L
_{1}
and a second threshold
L
_{2}
, and high when it is larger than
L
_{2}
. If the motion activity is considered
low
, SDSP is applied with MV
_{0}
as the center. If the motion activity is
medium
, MV
_{0}
is selected again as the center and LDSP is applied at this time instead of SDSP. This is the only case in which LDSP is selected, whereas the DS algorithm always chooses LDSP. After testing with LDSP, we apply SDSP again at the point with the smallest SAD as the center. Finally, if the motion activity is characterized as
high
, the center is determined with the position of the MV that yields the minimum SAD among MV
_{0}
and the spatially adjacent MVs, and then, SDSP is applied at this point. Moreover, in order to increase the search speed, MVFAST can introduce an early termination step, which is performed initially in the algorithm, with a fixed threshold (e.g.,
T
= 512) at the MV
_{0}
position. When the SAD at the MV
_{0}
position is lower than
T,
MVFAST is stopped without examining any other locations.
 B. Search Procedures of PMVFAST
Unlike MVFAST whose search priority is given to MV
_{0,}
the priority of PMVFAST is given to the median of three spatially adjacent MVs
[11
,
12]
. If SAD at the point obtained by the median operation is sufficiently small (e.g.,
T
= 256), the search is terminated early after examining just one point. Otherwise, another termination condition is considered as follows: If the MV selected by the median operation is equal to the MV of a temporally colocated block (MV
_{t1}
) and SAD at this point is smaller than SAD
_{t1}
, which is SAD between the block in
f
_{t1}
and the block pointed by MV
_{t1}
in
f
_{t2}
, the search is stopped. When the algorithm is not terminated after checking the median operation, the final MV is searched by examining the five candidates, MV
_{0}
, MV
_{left}
, MV
_{top}
, MV
_{topright}
, and MV
_{t1}
. After the selection of the best candidate from among them, a threshold
T
_{1}
is applied; that is, the search is stopped if SAD calculated by the best candidate is smaller than
T
_{1}
, which is selected adaptively as the smallest SAD among the SADs of the left, top, and topright blocks. Otherwise, the best candidate is used as the center of the DS pattern. Furthermore, if the best candidate and MV
_{t1}
are identical and SAD at the best candidate is smaller than SAD
_{t1}
, the search is terminated.
In PMVFAST, LDSP is selected less frequently to reduce the total number of search points. To determine which DS pattern should be used, another threshold
T
_{2}
, defined as
T
_{1}
+ 256 is applied. LDSP is chosen when
T
_{2}
is larger than 1536 and the result of the median operation is MV
_{0}
, whereas SDSP is used in all other cases. When the smallest SAD is obtained at the center of the pattern, the algorithm is terminated. Additionally, PMVFAST has another early termination condition as follows: If all of the MVs of three spatially adjacent blocks are identical and MV
_{t1}
is equal to the result of the median operation, the candidates (three spatially adjacent MVs and MV
_{0}
) are highly correlated with each other. In this case, the selected search pattern is applied just once.
Fig. 3
shows an example of searching the best matching block where the square shapes represent the points that have the smallest SAD in each step.
An example of the search procedure for the best matching block in predictive motion vector field adaptive search technique. MV: motion vector.
III. PROPOSED MOTION ESTIMATION
We introduce a MMED operation to predict the appropriate starting search point, which significantly affects the performance of a fast motion estimation technique. Furthermore, the proposed algorithm reduces the number of thresholds and early termination conditions, and it consequently prevents the degradation in PSNR.
 A. Modified Median Operation
PMVFAST feeds three spatially adjacent MVs to the median operator in order to determine the initial search point. However, our investigation presents that a temporally colocated MV, MV
_{t1}
shown in
Fig. 4
, can be a good candidate for the starting search point, since it is highly correlated with the final MV
Temporally colocated motion vector, where f_{t} and f_{t1} represent the current frame and the reference frame, respectively. MV: motion vector.
Table 1
describes the average distances in pixel values between each candidate MV and the final MV as obtained by using PMVFAST. We can infer from the table that MV
_{t1}
is more highly correlated to the final MV than the others in most of the test sequences. Therefore, we include MV
_{t1}
in the median operation to determine the starting search point. Let (
p_{x}, p_{y}
) be a starting search point; then, the MMED operation is given by
Average distances in pixels between each candidate and the final MV in the case of PMVFAST
MV: motion vector, PMVFAST: predictive motion vector field adaptive search technique.
where
i
∈ {left, top, topright,
t
1}, i.e.,
denotes the
x
component of the starting search point selected among the MVs of the left, top, topright, and temporally colocated blocks.
The proposed MMED operation can deal with evennumbered elements, whereas the conventional median operation used in PMVFAST can process only oddnumbered elements. A median operation eliminates the uncorrelated values more effectively than an average operation. On the other hand, the proposed MMED operation reduces the prediction error by averaging the spatially and temporally adjacent MVs and excluding the maximum and minimum values among them, which are considered to be less correlated to the final MV.
The starting search point for the current block at the typical position is determined as follows: When the current block is located at the origin (i.e., topleftmost position) of the frame, MV
_{t1}
is chosen directly as the predicted MV, since it is the only available candidate MV. When the current block lies in the first row except at the origin of the frame, two candidates, MV
_{left}
and MV
_{t1}
, are available. Since the proposed MMED operator cannot deal with two elements, MV
_{0}
is added in this case. When the current block lies in the first column, except at the origin of the frame, or in the rightmost column, three candidates are available (MV
_{top}
, MV
_{topright}
, and MV
_{t1}
for the first column, or MV
_{left}
, MV
_{top}
, and MV
_{t1}
for the rightmost column) and the starting search point is selected by the median rule instead of (2).
After the starting search point is determined, the early termination condition is applied. That is, if SAD at the point from (2) is less than
T
(e.g., 256), the search procedure is terminated. Furthermore, when the point from (2) is equal to MV
_{t1}
and SAD at this point is smaller than SAD
_{t1}
, the search procedure is also stopped.
Table 2
, which describes the frequency of the final MV determined after only one point is examined, presents that the MV from the proposed MMED operation is selected as the final MV more frequently than in the case of PMVFAST. This implies that the proposed algorithm can select the best matching block faster with a lower PSNR loss than PMVFAST.
The number of final MVs after the examination of only one point
MV: motion vector, PMVFAST: predictive motion vector field adaptive search technique.
 B. Determination of the Final Motion Vector
PMVFAST examines five candidates including MV
_{0}
after the selection of the starting search point by the median operation. In fact, as shown in
Table 3
, MV
_{0}
is rarely selected as the final MV when all the other candidates are not equal to MV
_{0}
. This implies that there is at least one MV
_{0}
among the spatially and temporally adjacent MVs when MV
_{0}
is selected as the final MV in most cases. Thus, MV
_{0}
does not need to be explicitly included in the search procedure, since it is possibly included in the spatially and temporally adjacent MVs.
Frequency and ratio that MV0 is selected when all of the spatially and temporally adjacent MVs are not equal to MV0
Consequently, the proposed algorithm includes just four candidates of MV
_{left}
, MV
_{top}
, MV
_{topright}
, and MV
_{t1 }
in the search procedure for the final MV. Since MV
_{0}
is not tested, the total number of search points is reduced and the correlation between MVs can increase. After an examination of these four candidates, the point that has the smallest SAD is selected as the center of SDSP. This further reduces the number of search points, since the proposed algorithm does not use LDSP. When the MVs are more correlated with each other in a video frame, the number of bits needed to encode each MV is reduced, since the video encoders usually encode the difference between the current MV and the adjacent MVs. We apply an early termination condition to the selected MV in order to further increase the correlation between the MVs as follows: A threshold
T
_{1}
, which is limited between 512 and 1024, is set to the smallest SAD among the three spatially adjacent MVs (MV
_{left}
, MV
_{top}
, and MV
_{topright}
). If SAD at the selected MV is smaller than
T
_{1}
, the search procedure is stopped. When the selected MV is equal to MV
_{t1}
and SAD at this point is smaller than SAD
_{t1}
, the search is also terminated. For the other cases, the proposed search procedures apply SDSP to find the smallest SAD. In this case, the thresholds and early termination conditions are no longer used.
IV. EXPERIMENTAL RESULTS
The performance of the proposed method was evaluated on the basis of
[13]
by using the MPEG4 verification model encoder
[14]
. Test sequences of QCIF (176 × 144), CIF (352 × 288), and SIF (352 × 240) were coded as IPPP; that is, all frames were intercoded and only the first frame was intracoded. TM5 was adopted as the rate control algorithm. Thresholds for motion activity in MVFAST,
L
_{1}
and
L
_{2}
were set to 1 and 2, respectively, and the threshold for early termination in PMVFAST and the proposed method,
T
was set to 256. The maximum displacements of the search range were set to ±16, ±32, and ±48 pixels, respectively, in both the horizontal and the vertical direction.
Table 4
summarizes the performance of the proposed algorithm compared with MVFAST and PMVFAST in terms of both the PSNR and the speedup ratio on average. The speedup ratio is obtained by comparing the number of search points of each method with that of FS. Detailed results of the proposed approach and conventional methods for each experimental condition are described in
Table 5
.
Results of the proposed algorithm in terms of average PSNR and speedup ratio
FS: full search, MVFAST: motion vector field adaptive search technique, PMVFAST: predictive MVFAST, PSNR: peak signaltonoise ratio.
It can be inferred from
Tables 4
and
5
that the proposed algorithm significantly reduces the number of search points compared with those of conventional methods and exhibits similar or higher PSNRs than those of the FS. When the correlation between adjacent MVs increases, the video encoder can reduce the number of bits needed to encode each MV, and thus, the extra bits can be assigned for increasing PSNR.
Fig. 5
shows the distribution of MV fields for the
Foreman
sequence from the FS and the proposed method. As shown in
Fig. 5
, the final MVs of the proposed algorithm are distributed more uniformly than those of the FS.
Distribution of motion vector fields of the Foreman sequence for each search method: (a) full search and (b) proposed algorithm.
The proposed scheme is superior to PMVFAST, which is a widely used technique for fast motion estimation, in terms of both the speedup ratio and the PSNR, as shown in
Table 5
. The performance of PMVFAST can be limited, since many thresholds without considering the characteristics of MVs possibly search for inappropriate points. The proposed algorithm searches faster than conventional algorithms and its average PSNRs are higher than those of the FS.
Experimental results of the proposed algorithm compared with the conventional methods
FS: full search, NTSS: new threestep search, DS: diamond search, MVFAST: motion vector field adaptive search technique, PMVFAST: predictive MVFAST, PSNR: peak signaltonoise ratio.
V. CONCLUSIONS
This paper presents a fast motion estimation method based on a MMED operation and advanced strategies for the selection of the search points. The proposed MMED operation includes the temporally colocated MV for an initial candidate, which is more correlated with the final MV, as observed in our experiments. Moreover, we reduce the number of thresholds and simplify the early termination conditions. With these improvements, the proposed approach achieves a faster search than MVFAST and PMVFAST. The proposed algorithm also improves PSNRs compared with those of the FS, since the MVs obtained from our approach are more correlated with each other and the bitsavings in coding them can be used for improvements in PSNR. Based on the better performance in terms of fast search and PSNR, the proposed algorithm can be employed in many applications, such as digital TV, Internet streaming, and mobile video by simply modifying the existing algorithms.
BIO
Jongho Kim received his B.S., M.S., and Ph.D. in Electronic Communications Engineering from Hanyang University, Seoul, Korea, in 1998, 2000, and 2008, respectively. From 2008 to 2009, he worked at Samsung Electronics Company and was involved in the development of video codecs and an IPbased video telephony system for mobile devices. He is currently an associate professor with the Department of Multimedia Engineering, Sunchon National University, Suncheon, Korea. His research interests include image and video compression, processing, and enhancement for multimedia applications, multimedia communication systems, and digital signal processing.
Ghanbari M.
2003
“Principles of video compression,” in Standard Codecs: Image Compression to Advanced Video Coding
Institution of Electrical Engineers
London, UK
25 
61
Richardson I. E.
2003
H.264 and MPEG4 Video Compression
John Wiley & Sons
Chichester, UK
Kim J. S.
,
Kim J. G.
2011
“A performance comparison of blockbased matching cost evaluation models for FRUC techniques,”
International Journal of Information and Communication Engineering
9
(6)
671 
675
Sorwar G.
,
Murshed M.
,
Dooley L.
2004
“Fast blockbased true motion estimation using distance dependent thresholds,”
Journal of Research and Practice in Information Technology
36
(3)
157 
169
Jain J. R.
,
Jain A. K.
1981
“Displacement measurement and its application in interframe image coding,”
IEEE Transactions on Communications
http://dx.doi.org/10.1109/TCOM.1981.1094950
29
(12)
1799 
1808
Koga T.
,
Hirano K.
,
Iijima Y.
,
Ishiguro T.
1981
“Motioncompensated interframe coding for video conferencing,”
in Proceedings of the National Telecommunications Conference
G5.3.1 
G5.3.5
Li R.
,
Zeng B.
,
Liou M. L.
1994
“A new threestep search algorithm for block motion estimation,”
IEEE Transactions on Circuits and Systems for Video Technology
http://dx.doi.org/10.1109/76.313138
4
(4)
438 
442
Zhu S.
,
Ma K. K.
2000
“A new diamond search algorithm for fast block matching motion estimation,”
IEEE Transactions on Image Processing
http://dx.doi.org/10.1109/83.821744
9
(2)
287 
290
Tham J. Y.
,
Ranganath S.
,
Ranganath M.
,
Kassim A. A.
1998
“A novel unrestricted centerbased diamond search algorithm for block motion estimation,”
IEEE Transactions on Circuits and Systems for Video Technology
http://dx.doi.org/10.1109/76.709403
8
(4)
369 
377
Hosur P. I.
,
Ma K. K.
1999
“Motion vector field adaptive fast motion estimation,”
in Proceedings of the 2nd International Conference on Information, Communications and Signal Processing
7 
10
Tourapis A. M.
,
Au O. C.
,
Liou M. L.
2001
“Predictive motion vector field adaptive search technique (PMVFAST): enhancing blockbased motion estimation,”
in Proceedings of the Visual Communication and Image Processing
883 
892
2001
“Optimized visual reference software,” Text of 144967 PDTR, ISO/IEC N4057
1999
“Experimental conditions for evaluating encoder motion estimation algorithms,” ISO/IEC JTC1/SC29/WG11 N3141
2001
“The MPEG4 video verification model v.18.0,” ISO/IEC JTC1/SC29/WG11 N3908