In many moving-object detection problems in aerial video, accurate and robust stabilization is of critical importance. In this paper, a novel accurate image alignment algorithm for aerial electronic image stabilization (EIS) is described. The feature points are first selected using a Harris detector based on optimal derivative filters, which improves differentiation accuracy and yields precise feature point coordinates. Delaunay triangulation edges are then used to find the matching pairs between feature points in overlapping images. The most “useful” matching points that belong to the background are used to find the global transformation parameters using the projective invariant. Finally, the intentional motion of the camera is accumulated for correction by Sage-Husa adaptive filtering. Experimental results on aerial video sequences with various dynamic scenes demonstrate the performance of the proposed algorithm.
1. Introduction
When cameras are mounted on unstable airplane platforms, it is barely possible to obtain smooth motion because of undesired camera motions. Electronic image stabilization (EIS) is, therefore, becoming an indispensable technique in advanced digital cameras and camcorders. EIS can be defined as the process of removing unwanted video vibrations and obtaining stabilized image sequences [1-3]. It has been widely used in video surveillance, panorama stitching, robot localization and moving object tracking [4-7]. However, making a stable video is very challenging, especially when both camera motion (egomotion and high-frequency motion) and foreground object motion are present. The stabilization accuracy profoundly affects the stabilization quality, and poor accuracy impedes the subsequent processing for various applications [8, 9].
EIS mainly consists of two parts: motion estimation (ME) and motion compensation (MC). ME is responsible for estimating the reliable global camera movement through three processing steps on the acquired image sequences (see [4] for an alternative scheme): feature detection, feature matching, and transformation model estimation [10]. MC preserves the panning motion of the camera while correcting the undesired fluctuations caused by an unsteady, vibrating platform. Compared with MC, ME plays the more important role in EIS, and its estimation precision is decisive for video stabilization [11]. To enable the use of aerial video in stabilization and reconnaissance missions, ME algorithms have to be robust to different conditions. The block matching algorithm (BMA) [12], bit plane matching (BPM) [13] and phase correlation [14] are the most common ways to stabilize translational jitter. In this paper, a special class of ME methods is considered that aligns the frames of an aerial video of a dynamic scene captured by a moving camera.
A number of methods have been proposed for background motion estimation. The common approaches include direct-based methods [15, 16] and feature-based methods [17-20]. Direct-based methods aim to find the unknown transforms using raw pixel intensities. Feature-based methods, on the other hand, first identify the feature correspondences between image pairs and then recover the transform from the correspondence pairs. The method in [17] uses the stable relative distance between point sets to delete local features such as points on local moving objects, covered points and inevitable mismatches. The method in [18] uses scale-invariant feature transform (SIFT) [21] points to obtain a crude estimate of the projective transformation, and identifies the features on moving objects by the difference in velocity between the objects and the background. The method in [19] achieves ME by detecting SIFT points and calculating the parameters of the projective transformation in a RANSAC process. A Gaussian distribution is used to create a background model and detect distant moving objects. However, this method is only applicable to runway scenes. The method in [20] develops a 3D camera motion model, which can be applied to the general case.
We propose a method for estimating the motion of an aerial video of a scene consisting of a planar background, moving foreground objects and static 3D structures. The system, shown in Fig. 1, presents a novel ME algorithm based on optimal derivative filters and the projective invariant. The proposed method 1) selects a set of accurate feature points in each frame using optimal derivative filters, 2) finds the matching points between the feature points in the frames, 3) uses the projective invariant to distinguish more accurate matching points from less accurate ones, so that the most accurate points belonging to the planar background are used to estimate the transformation model, and 4) performs motion compensation with a Sage-Husa Kalman filter [22] to stabilize the sequence.
Flowchart of the proposed aerial video stabilization algorithm.
In the following sections, related work in aerial video stabilization is reviewed, details of an automatic stabilization system are provided, and experimental results of the performance of the algorithm are presented and discussed.
2. Related Work
Image stabilization has been studied extensively over the past decade. This paper considers a special class of video stabilization in which a video of a dynamic scene is captured by a moving camera. The challenge in such video is to track the camera motion accurately without being influenced by the moving objects and static 3D structures in the images. A number of methods have been proposed for background stabilization. The common approaches include direct-based methods [15, 16] and feature-based methods [18, 19, 20]. Direct-based methods aim to find the unknown transforms using raw pixel intensities. Feature-based methods, on the other hand, first identify the feature correspondences between image pairs and then recover the transform from the correspondence pairs. Zhu J. J. et al. [17] use the stable relative distance between point sets to delete local features such as points on local moving objects, covered points and inevitable mismatches. Yang J. et al. [18] use scale-invariant feature transform points to obtain a crude estimate of the projective transformation, and identify the features on moving objects by the difference in velocity between the objects and the background. Cheng H. P. et al. [19] achieve aerial video stabilization by detecting SIFT [22] points and calculating the parameters of the projective transformation in a RANSAC process; a Gaussian distribution is then used to create a background model and detect distant moving objects, but this method is only applicable to runway scenes. Wang J. M. et al. [20] develop a 3D camera motion model, which can be applied to the general case. In this paper, our electronic image stabilization is feature-based, and a method for choosing the most accurate background feature points from a moving camera is proposed to stabilize the frames.
Various methods for detecting control points in an image have been developed. Schmid et al. [23] surveyed and compared various point detectors, finding the Harris detector [24, 25] to be the most repeatable. Mikolajczyk [26] proposed the scale-adapted Harris detector with automatic scale selection. However, this algorithm computes the image gradient from discrete pixel differences, and finite differences can provide a very poor approximation to a derivative. In this work we address these shortcomings by applying optimal derivative filters. The optimal derivative filter method emerged as an optimization of the rotation invariance of the gradient operator [27]. It aims to minimize the error in the estimated gradient direction. We extend the application of optimal derivative filters to obtain accurate corner locations.
Feature matching is another important step in feature-based motion estimation. In recent years, several feature matching methods have been successfully applied to motion estimation in image sequences, such as invariant block matching [28] and feature point matching [29]. However, due to noise and occlusion, some feature points are displaced even when their positions are detected with high accuracy. As a result, some matching points are more accurate than others, which affects the accuracy of video stabilization. The projective invariant [30] is a means of evaluating geometric invariance between images. This paper develops a projective invariant method that distinguishes more accurate matching points from less accurate ones; the most accurate points belonging to the planar background are used to estimate the transformation model.
The transformation model can be used to stabilize the video sequence by repositioning image frames in the inverse direction of the transformation. However, digital image sequences acquired by an airborne video camera are usually affected by unwanted positional fluctuations, which degrade the visual quality and impede subsequent processing. Kalman filters [14] have been used to compensate for unwanted camera shake while leaving the intentional camera motion intact. We adopt a Sage-Husa Kalman filter [31], where the correction vector for each image frame is obtained as the difference between the filtered and original positions. This helps to distinguish and preserve the intended camera motion.
2. Approach
When a video of a dynamic scene is captured by a moving aerial camera, two overlapping images are related by a projective transformation, and a 2D model offers a good trade-off between accuracy and computational complexity for EIS. Assuming $(x, y)$ represents a point in the base image and $(X, Y)$ represents the same point in the image overlapping the base image, the projective transformation between the two points can be written as:

$$X = \frac{h_{11}x + h_{12}y + h_{13}}{h_{31}x + h_{32}y + h_{33}}, \qquad Y = \frac{h_{21}x + h_{22}y + h_{23}}{h_{31}x + h_{32}y + h_{33}} \tag{1}$$

Given the coordinates of 4 corresponding points in the images, the unknown parameters $h_{11}$-$h_{33}$ of the transformation can be determined by substituting the corresponding points into (1) and solving the resulting system of linear equations.
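As a minimal sketch of this step (not the authors' C++ implementation), the Python code below fixes $h_{33} = 1$, which is an assumption made here to remove the scale ambiguity, and solves the resulting 8×8 linear system obtained from four point pairs. The function names are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def homography_from_4_points(src, dst):
    """Return [h11, ..., h33] (h33 fixed to 1) from four (x, y) -> (X, Y) pairs."""
    a, b = [], []
    for (x, y), (X, Y) in zip(src, dst):
        # X * (h31 x + h32 y + 1) = h11 x + h12 y + h13, and likewise for Y
        a.append([x, y, 1.0, 0.0, 0.0, 0.0, -X * x, -X * y])
        b.append(X)
        a.append([0.0, 0.0, 0.0, x, y, 1.0, -Y * x, -Y * y])
        b.append(Y)
    return solve_linear(a, b) + [1.0]


def apply_homography(h, x, y):
    """Map (x, y) to (X, Y) with the nine parameters in row-major order."""
    w = h[6] * x + h[7] * y + h[8]
    return ((h[0] * x + h[1] * y + h[2]) / w,
            (h[3] * x + h[4] * y + h[5]) / w)
```

With more than four correspondences the same rows can be stacked and solved in the least-squares sense; that extension is not shown here.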
 2.1 Feature Points Extraction Based on Optimal Derivative Filters
Feature detection is the first and critical step for image stabilization and has been studied extensively in recent years [32]. We use the Harris corner detector [24] in our stabilization framework because of the abundance of corners in aerial images. The Harris detector, which uses the second-moment matrix as the basis of its corner detection, describes the curvature of the autocorrelation function in the neighborhood of a point. For an image $I(x, y)$, the Harris detector based on the second-moment matrix can be expressed as:
where $h$ is the Gaussian smoothing function and $G$ is the traditional image gradient, which is given as follows:
where $d(n)$ is the general form of a linear-phase finite impulse response (FIR) filter and can be written as:
The Harris detector provides good repeatability under rotation and varying illumination; unfortunately, computing derivatives is sensitive to quantization noise, and the Harris corner detector has poor localization performance, particularly at certain junction types [32]. In this section, instead of using the traditional image gradients, we design a new optimal first-order derivative filter with more accurate localization and better rotation invariance. The optimal derivative filter method emerged as an optimization of the rotation invariance of the gradient operator [27]. It aims to minimize the error in the estimated gradient direction. We extend the application of optimal derivative filters to obtain accurate corner locations.
The Fourier transform of $d(n)$ is:
Ideally, our objective is to obtain the first-order derivative transfer function $D(\omega) = j\omega$. We can design the filter coefficients to meet this function as closely as possible. Because the signal $I(x)$ and its derivative $I_x(x)$ are hard to obtain accurately in the discrete domain, a pair of filters $p$ and $d$ is designed such that $[I * p](x)$ and $[I * d](x)$ serve as accurate representations of the signal and of its derivative, respectively. We denote the filter pair in the frequency domain by $P(\omega)$ and $D(\omega)$. The error $j\omega P(\omega) - D(\omega)$ can then be minimized. The weighted least-squares error criterion for the designed filter is defined as:
This function has the form of a Rayleigh quotient, so the solution can be found using the singular value decomposition (SVD). The resulting $p(n) * d(n)$ is called a rotation-equivariant derivative filter [27], which shows good accuracy and rotation invariance. We choose a 5-tap pair of filters to obtain good precision. The resulting filter pair values are given in Table 1.
Matched pairs of 5-tap prefilter $p$ and derivative $d$ kernels
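To illustrate what "matching" a prefilter and a derivative kernel means, the sketch below compares $\omega P(\omega)$ with the imaginary part of $D(\omega)$ at a given frequency. The kernel values are commonly quoted 5-tap coefficients for this family of filters; they are an assumption here, are not copied from Table 1, and may differ from the values the authors used. The function names are hypothetical.

```python
import math

# Assumed 5-tap kernels, indexed n = -2..2. NOT taken from Table 1.
P_TAPS = [0.037659, 0.249153, 0.426375, 0.249153, 0.037659]
D_TAPS = [-0.109604, -0.276691, 0.0, 0.276691, 0.109604]


def prefilter_response(w):
    """P(w) of the symmetric prefilter (purely real)."""
    return sum(p * math.cos(w * n) for p, n in zip(P_TAPS, range(-2, 3)))


def derivative_response(w):
    """Imaginary part of sum_n d(n) exp(j w n); the real part cancels
    because the derivative kernel is antisymmetric."""
    return sum(d * math.sin(w * n) for d, n in zip(D_TAPS, range(-2, 3)))


def matched_error(w):
    """|w P(w) - Im D(w)|: mismatch between the derivative kernel and the
    ideal derivative of the prefiltered signal."""
    return abs(w * prefilter_response(w) - derivative_response(w))


def central_difference_error(w):
    """Same mismatch for the kernel [-1/2, 0, 1/2] with no prefilter,
    whose response is sin(w) against the ideal value w."""
    return abs(w - math.sin(w))
```

Under these assumed coefficients the mismatch at mid frequencies is much smaller than for the plain central difference, which is the property the corner localization relies on.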
A standard image, shown in Fig. 2(a), is distorted by rotations with angles ranging from 10° to 90° and by zooms with scales ranging from 0.6 to 1.5. Suppose we choose a feature point $(x_0, y_0)$ in the standard image; its actual position then changes to $(x_1, y_1)$ and $(x_2, y_2)$ in the rotated and scaled images, respectively. Given the detected corresponding point $(x'_i, y'_i)$, we compute the Euclidean distance $D$ between $(x'_i, y'_i)$ and the actual point position $(x_i, y_i)$. Simulation results are given in Fig. 2(d) and (e). The results show that the Harris detector with optimal derivative filters consistently produces more accurate results than the Harris detector without them.
Original image and the image after rotation and zooming. (a) Original image (126×126). (b) Image rotation 45° . (c) Image zoom (250×250) . (d) Corner error of the rotation angle. (e) Corner error of the scaling factor.
 2.2 Correspondence between Points
After feature points have been detected, the next step is to find the correspondences between the two point sets. The process involves removing the outliers and estimating the parameters of the transformation. The RANSAC algorithm was introduced by Fischler and Bolles in 1981 [33]; it uses a distance threshold to find the transformation matrix that maps the greatest number of point pairs between two images. Because it tolerates a large fraction of outliers, the algorithm is a popular choice for robust estimation of the transformation matrix. Its lower-bound computational complexity is very low. However, it may not find the correspondences until a large number of iterations have been computed, so the upper-bound computational cost of RANSAC is substantial.
In a comparison study that involved several well-known topological structures, the Delaunay triangulation (DT) [34] was found to have the best structural stability under random positional perturbations.
For a set of points $p_1, p_2, \cdots, p_n$, we obtain the DT by first calculating its Voronoi diagram (VD). The VD of a set of points divides the plane or space into one region per point, where each region contains the part of the plane or space that is closer to that point than to any other. Given a VD, the DT is its straight-line dual. A set of points is shown in Fig. 3(a), their VD in Fig. 3(b), and their DT in Fig. 3(c).
(a) A set of points. (b) Voronoi Diagram. (c) Delaunay Triangulation.
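The DT can equivalently be characterized by the empty-circumcircle property: no input point lies strictly inside the circumcircle of any triangle. This is a standard property of the DT rather than something taken from the paper, and the sketch below only checks it for a given triangulation; it does not construct the DT. The function names are hypothetical.

```python
def orient(a, b, c):
    """Twice the signed area of triangle abc (positive if counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])


def in_circumcircle(a, b, c, d):
    """True if d lies strictly inside the circumcircle of triangle abc."""
    rows = []
    for p in (a, b, c):
        dx, dy = p[0] - d[0], p[1] - d[1]
        rows.append((dx, dy, dx * dx + dy * dy))
    (a1, a2, a3), (b1, b2, b3), (c1, c2, c3) = rows
    det = (a1 * (b2 * c3 - b3 * c2)
           - a2 * (b1 * c3 - b3 * c1)
           + a3 * (b1 * c2 - b2 * c1))
    # The sign of det depends on the vertex order, so normalise by orientation.
    return det * orient(a, b, c) > 0


def is_delaunay(points, triangles):
    """Check the empty-circumcircle property; triangles are index triples."""
    for i, j, k in triangles:
        for m, p in enumerate(points):
            if m in (i, j, k):
                continue
            if in_circumcircle(points[i], points[j], points[k], p):
                return False
    return True
```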
In this research, we use Delaunay triangulation edges to find the matching pairs between feature points in overlapping images. An initial match between two point sets is obtained by selecting disjoint edge pairs in each DT that have the same transformation parameters. Before computing the parameters of the projective transformation, we take three steps to reduce the computation time. (i) Long DT edges are assumed to be more distinctive, and matching long edges is considered more stable than matching short ones. Therefore, we use only the longest 50 to 100 edges for feature matching. (ii) For aerial video images with general perspective changes (the viewpoints of the two images are not significantly different), the orientations of corresponding edges in local areas should be relatively consistent. Therefore, we use only edge pairs with similar orientations (for example, an angle difference of less than 10°). (iii) The strength contrast of the corner response in a local region on the two sides of a line can be used to further remove wrong candidates in the search image. Let the line be $Ax + By + C = 0$. For a local region centered on the line, the average corner response on one side of the line is $l_1$, and the average corner response on the other side is $l_2$. A strength contrast $S$ for each line is assigned from $l_2 - l_1$, and we then have:
If the strength contrasts $S$ of two candidate edges are not equal, the candidate edge is not considered a possible match and is excluded from further matching.
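Steps (i) and (ii) can be sketched as follows. The code assumes each edge is given as a pair of end points and treats edges as undirected, so orientations are compared modulo 180°; the default thresholds follow the values quoted above, and the function names are hypothetical. Step (iii) is not shown because the exact definition of $S$ is not reproduced here.

```python
import math


def edge_length(edge):
    (x1, y1), (x2, y2) = edge
    return math.hypot(x2 - x1, y2 - y1)


def edge_orientation(edge):
    """Undirected orientation of an edge in degrees, folded by modulo 180."""
    (x1, y1), (x2, y2) = edge
    return math.degrees(math.atan2(y2 - y1, x2 - x1)) % 180.0


def longest_edges(edges, keep=50):
    """Step (i): keep only the `keep` longest edges."""
    return sorted(edges, key=edge_length, reverse=True)[:keep]


def candidate_pairs(edges_a, edges_b, max_angle=10.0):
    """Step (ii): keep edge pairs whose orientations differ by < max_angle."""
    pairs = []
    for ea in edges_a:
        for eb in edges_b:
            diff = abs(edge_orientation(ea) - edge_orientation(eb))
            diff = min(diff, 180.0 - diff)  # handle wrap-around near 0/180
            if diff < max_angle:
                pairs.append((ea, eb))
    return pairs
```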
An example of point matching with this method is given in Fig. 4. The Delaunay triangulation edges obtained from the 100 stable corner points are shown in each image. Many of the same DT edges are found in the two images.
Two aerial frames show 100 stable corner points along with the Delaunay Triangulation.
 2.3 Motion Parameters Estimation Based on Projective Invariant
The feature points detected by the optimized Harris detector are determined up to subpixel accuracy. However, due to noise, 3D structures or moving objects in the image sequences, some feature points are displaced even when their positions are detected with optimal derivative filters. As a result, certain matching points remain more invariant than others.
Some image properties remain invariant under projective transformation. The most fundamental projective invariant is the cross-ratio. The cross-ratio can be defined for four collinear points or five coplanar points; the five-point form is most suitable for our problem, as we already have matching points in the image.
The five-point cross-ratio invariance is defined as follows [35]. Given five points $A_i$, $i = 1, \ldots, 5$, in an image, the cross-ratio of the five points is defined as:
where $(\Delta A_1 A_2 A_4)$ is the oriented area of the triangle with vertices $A_1$, $A_2$ and $A_4$. Note that one point is shared by all four triangles; it is called the common point of the cross-ratio. It was shown in [36] that the projective invariants of five points can be written as linear combinations of four expressions:
The nontrivial projective invariants are unbounded functions and can be written as:
If the feature point $(x, y)$ in one frame and the point $(X, Y)$ in the other frame are related by the projective transformation, then by replacing $(x_i, y_i)$ with $(X_i, Y_i)$ in (8)-(10), we expect $J(x, y) = J(X, Y)$. If $J(x, y)$ and $J(X, Y)$ are not the same, the smaller their distance $D$ is, the higher the accuracy of the five matching points will be. We therefore select the combination of five matching points out of $n$ that gives the smallest distance, and use these points to determine the parameters of the projective transformation.
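The selection step can be sketched as an exhaustive search over all five-point subsets. The code below uses a single five-point cross-ratio built from oriented triangle areas, with the first point of each subset as the common point, and the absolute difference of the two cross-ratios as the distance. Both the index convention and the distance are assumptions chosen for illustration and may differ from the paper's definitions in (8)-(11); the function names are hypothetical.

```python
from itertools import combinations


def tri_area(a, b, c):
    """Oriented (signed) area of triangle abc."""
    return 0.5 * ((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1]))


def cross_ratio5(pts):
    """One five-point cross-ratio with pts[0] as the common point.
    Returns None when a denominator triangle is degenerate."""
    a1, a2, a3, a4, a5 = pts
    den = tri_area(a1, a2, a5) * tri_area(a1, a3, a4)
    if abs(den) < 1e-12:
        return None
    return tri_area(a1, a2, a4) * tri_area(a1, a3, a5) / den


def best_five(src, dst):
    """Indices of the five matches whose cross-ratios agree best between the
    two frames, together with the corresponding distance."""
    best, best_d = None, float("inf")
    for idx in combinations(range(len(src)), 5):
        j_src = cross_ratio5([src[i] for i in idx])
        j_dst = cross_ratio5([dst[i] for i in idx])
        if j_src is None or j_dst is None:
            continue
        d = abs(j_src - j_dst)
        if d < best_d:
            best, best_d = idx, d
    return best, best_d
```

Each point appears equally often in the numerator and the denominator, so the homogeneous scale factors introduced by a projective map cancel and the ratio is preserved for correctly matched coplanar points.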
An example using the projective constraint in image alignment is given in Fig. 5. As shown in Fig. 5(a) and (b), the marked points on the local moving objects and on the 3D structures are clearly inaccurate matching points, and these feature points are likely to cause errors in the transformation model estimate. The distance $D$ in (11) is calculated for each combination of five points, and the combination of the five most accurate points, which belong to the background, produces the smallest distance. Fig. 5(c) and (d) show the absolute intensity difference of images registered using all the correspondences and using the best five correspondences, respectively. The difference between the two is significant.
(a), (b) Two images showing the matching feature points. (c) alignment result using all feature points. (d) alignment result using best five matching points.
 2.4 Interframe Motion Compensation
When a video of a dynamic scene is captured by a moving camera, two types of motion are present in the video: one caused by camera jitter and one caused by camera pan. Only the unwanted camera jitter should be removed during motion compensation, which is done by applying a low-pass filter. Like the Kalman filter [1], the Sage-Husa filter [37] is based on the following assumption: the intentional camera scan is a smooth motion in a fixed direction, whereas the variation and direction of unwanted camera jitter are more random. We obtain the smooth motion component $x_{filter}$ using the adaptive filter; the jitter component $x_{jitter}$ is then the difference between the smooth motion component and the original motion vector $x_{raw}$, that is, $x_{jitter} = x_{filter} - x_{raw}$.
The Sage-Husa adaptive filter is designed on the basis of the standard discrete Kalman filter. It uses the measurement data to continually update the system noise and measurement noise statistics. The basic state estimate and update equations of the Sage-Husa adaptive filter are given by:
where $K$ represents the filter gain matrix; $F$ is the state transition matrix; $H$ denotes the measurement matrix; $R$ is the equivalent measurement noise matrix; $Q$ refers to the equivalent state noise variance matrix; $P(k-1)$ is the prior state covariance matrix; and $P(k|k-1)$ is the predicted state covariance matrix.
The corresponding estimating equations are given by:
where $d(k) = (1 - b)/(1 - b^{k})$, $b$ is the fading factor, and $0 < b < 1$.
The Sage-Husa adaptive Kalman filtering algorithm cannot estimate $Q$ and $R$ simultaneously when both are unknown. In addition, the estimated measurement noise covariance matrix can lose its positive definiteness or positive semidefiniteness, which easily causes the filter to diverge, so stability and convergence cannot be fully guaranteed.
In this article, the innovation sequence [31] is chosen to predict the residual error. The residual error criterion is as follows:
where $\gamma$ represents the reserve coefficient. When $\gamma = 1$, the filtering algorithm achieves the optimal estimation result:
When formula (16) is not satisfied, the actual error is more than $\gamma$ times the theoretical value. In this case, the weighting coefficient $C(k)$ is introduced to correct $P(k|k-1)$:
Substituting formula (17) into (16), we obtain:
From formulas (12) to (18), we obtain the Sage-Husa adaptive Kalman filtering algorithm with innovation sequence.
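A much simplified sketch of the idea is given below for one motion parameter. It is not the filter defined by (12)-(18): it assumes a two-state constant-velocity model (position and velocity, $F = [[1, 1], [0, 1]]$, $H = [1, 0]$), keeps $Q$ fixed, adapts only the scalar $R$ with the $d(k)$ weighting, and clamps $R$ to stay positive instead of using the innovation-based correction. All names and default values are assumptions.

```python
def sage_husa_smooth(raw, q_pos=0.01, q_vel=0.001, b=0.95):
    """Constant-velocity Kalman smoother with a Sage-Husa style update of R."""
    pos, vel = raw[0], 0.0
    p11, p12, p22 = 1.0, 0.0, 1.0   # symmetric 2x2 covariance
    r = 1.0                          # initial guess for the measurement noise
    filtered = [pos]
    for k in range(1, len(raw)):
        # Prediction: x(k|k-1) = F x(k-1), P(k|k-1) = F P F^T + Q
        pos_pred = pos + vel
        s11 = p11 + 2.0 * p12 + p22 + q_pos
        s12 = p12 + p22
        s22 = p22 + q_vel
        innov = raw[k] - pos_pred
        # Sage-Husa style noise estimate with fading weight d(k)
        d = (1.0 - b) / (1.0 - b ** k)
        r = (1.0 - d) * r + d * (innov * innov - s11)
        r = max(r, 1e-6)             # assumption: clamp so that R stays positive
        # Measurement update
        k1 = s11 / (s11 + r)
        k2 = s12 / (s11 + r)
        pos = pos_pred + k1 * innov
        vel = vel + k2 * innov
        p11 = (1.0 - k1) * s11
        p12 = (1.0 - k1) * s12
        p22 = s22 - k2 * s12
        filtered.append(pos)
    return filtered


def correction_vectors(raw, filtered):
    """x_jitter = x_filter - x_raw for every frame."""
    return [f - z for f, z in zip(filtered, raw)]
```

The constant-velocity state is used so that a steady pan is tracked as intentional motion while frame-to-frame oscillation is attenuated.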
3. Experimental Results
This section presents some examples and quantitative results of the proposed stabilization algorithm for aerial video sequences. The algorithm was implemented in C++, and all experiments were carried out on a DELL desktop computer with an Intel Xeon E5410 2.33 GHz CPU and 9 GB of RAM, running Windows 7 Enterprise Professional Edition.
Fig. 6 shows 8 sets of unmanned aerial vehicle (UAV) video sequences, which come from the Predator data of VSAM at Carnegie Mellon University (Fig. 6, No. 1-No. 4), a DARPA video sequence (Fig. 6, No. 5) and our own aerial video data (Fig. 6, No. 6-No. 8). The frames are 320×240 in size and include rural roads, fields, urban buildings, etc.
UAV video sequences.
To demonstrate the accuracy of the motion estimation method that uses optimal derivative filters and the projective invariant technique, we applied the proposed method to two different types of video images: a planar background and a complex urban scene, shown in Fig. 7 and Fig. 8.
Alignment of planar background. (a) Two aerial images. (b) The matching point pairs. (c)Alignment result using HDAC. (d) Alignment result using ODFAC. (e) Alignment result using ODFOI.
Alignment of complex urban scene. (a) Two aerial images. (b) The matching point pairs (c) Alignment result using HDAC. (d) Alignment result using ODFAC. (e) Alignment result using ODFOI.
An example using images of a planar background is given in Fig. 7. The feature matching between the two frames is shown in Fig. 7(b). The total number of correspondences is 12. The identified correspondences are marked with the same numbers (drawn in red) in both images. The five most accurate correspondences are shown by yellow lines. Fig. 7(c) shows the absolute intensity differences of the images aligned using the Harris detector and all correspondence points (HDAC), Fig. 7(d) shows the alignment result using the Harris detector based on optimal derivative filters and all correspondence points (ODFAC), and Fig. 7(e) shows the alignment result using optimal derivative filters and the best five correspondences obtained by the projective invariant technique (ODFOI). The root-mean-squared (RMS) difference between the aligned images is 12.856 for HDAC, 11.964 for ODFAC, and 11.018 for ODFOI. High values correspond to moving cars in all three alignment results, and the differences among the three results are small because the scene is mainly composed of a planar background and moving objects; nevertheless, the ODFOI result is better than the other two.
The second example, using images of a complex urban scene, is shown in Fig. 8. Most of the images have local distortion. The matching points are shown in Fig. 8(b). The total number of correspondences is 40. The difference images after alignment using HDAC, ODFAC and ODFOI are shown in Fig. 8(c), (d) and (e), respectively. The RMS differences when using HDAC, ODFAC and ODFOI are 15.031, 13.426 and 10.995, respectively, which indicates that our algorithm produces more accurate transformation model parameters. The highest values in Fig. 8(e) correspond to moving cars. High values are also found at some rectangular areas and parked cars in Fig. 8(d), and such errors can confuse motion detection and tracking.
To examine the geometric fidelity of the motion estimation method, the cross-ratio invariance of four collinear points was used. The cross-ratio of four collinear points is a projective invariant of a quadruple of points. Given four collinear points $p_1$, $p_2$, $p_3$ and $p_4$ in one image, the cross-ratio is calculated as follows:
where $\Delta_{ij}$ denotes the Euclidean distance between the two points $p_i$ and $p_j$.
Suppose a line is drawn in the image and four points lie on the line. We first calculate the cross-ratio $C_r$ using the four collinear points; then, from three of the points in the aligned image, we calculate the location of the fourth point in the aligned image using the cross-ratio invariance of four collinear points. The distance between the calculated fourth point and the actual fourth point in the aligned image is used as the error between the original image and the aligned image.
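The error measure can be sketched in one dimension by describing each point through its signed position $t$ along the line. The cross-ratio convention used below, $((t_3 - t_1)(t_4 - t_2)) / ((t_4 - t_1)(t_3 - t_2))$, is one common choice and an assumption here; the paper's formula uses Euclidean distances and may pair the points differently. The function names are hypothetical.

```python
def cross_ratio4(t1, t2, t3, t4):
    """Cross-ratio of four collinear points given by signed positions."""
    return ((t3 - t1) * (t4 - t2)) / ((t4 - t1) * (t3 - t2))


def predict_fourth(t1, t2, t3, cr):
    """Solve cross_ratio4(t1, t2, t3, t4) = cr for t4."""
    num = cr * t1 * (t3 - t2) - t2 * (t3 - t1)
    den = cr * (t3 - t2) - (t3 - t1)
    return num / den


def fidelity_error(orig, aligned):
    """Distance between the predicted and the actual fourth point in the
    aligned image; orig and aligned are lists of four positions."""
    cr = cross_ratio4(*orig)
    predicted = predict_fourth(aligned[0], aligned[1], aligned[2], cr)
    return abs(predicted - aligned[3])
```

If the alignment is an exact projective map, the error is zero up to rounding; any deviation of the fourth point shows up directly as a distance along the line.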
To evaluate the geometric fidelity of the motion estimation method, the 8 sets of aerial video sequences shown in Fig. 6 were used, each containing 50 frames. Table 2 shows detailed results for the number of feature points and the geometric fidelity errors using HDAC, ODFAC and ODFOI after registering the frames. The average error using ODFOI is smaller than that using HDAC and ODFAC, so ODFOI yields the more accurate alignment result.
Comparison of the HDAC, ODFAC and ODFOI algorithms
To illustrate the stabilization results of the proposed algorithm, a comparison of video stabilization based on HDAC and ODFOI is shown in Fig. 9 and Fig. 10. Snapshots of the planar background video sequence corresponding to Frames 30, 60, 90 and 120 are shown in Fig. 9(a). Frames 90, 118, 160 and 189 of the complex urban scene video sequence are shown in Fig. 10(a). The proposed algorithm corrects the rotational and translational motions well while preserving the intentional motion in the horizontal direction.
Comparison of video stabilization for the planar background video sequence: (a) Original image. (b) Stabilization result using HDAC. (c) Stabilization result using ODFOI.
Comparison of video stabilization for complex urban video sequence: (a) Original image. (b) Stabilization result using HDAC. (c) Stabilization result using ODFOI.
As discussed in Section 2.4, the intended motion of the aerial video is estimated using the Sage-Husa adaptive filter so that only unwanted jitter is removed during the stabilization process. Fig. 11 shows how the method of estimating intended video motion presented in this paper performs on the UAV video of the complex urban scene. The ODFOI method changes more steadily than the HDAC method in the vertical direction, successfully removes high-frequency jitter and smoothly follows the global motion trajectory.
The result of filtered y translation motion vectors
To evaluate objectively how well the stabilized image matches the reference image, the peak signal-to-noise ratio (PSNR) can be used as a measure. The larger the PSNR, the smaller the interframe error. The PSNR between consecutive images $I_t$ and $I_{t+1}$ of size $M \times N$, called the global transformation fidelity (GTF), is defined as
where $M$ and $N$ are the width and height of the images, respectively, and MSE denotes the mean square error calculated for the considered images.
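As a hedged illustration, the sketch below computes the interframe PSNR with the usual definition $10 \log_{10}(\text{peak}^2 / \text{MSE})$. The peak value of 255 assumes 8-bit grayscale frames, which the text does not state explicitly, and frames are represented as nested lists of intensities. The function names are hypothetical.

```python
import math


def mse(img_a, img_b):
    """Mean square error between two equally sized grayscale images."""
    total, count = 0.0, 0
    for row_a, row_b in zip(img_a, img_b):
        for a, b in zip(row_a, row_b):
            total += (a - b) ** 2
            count += 1
    return total / count


def psnr(img_a, img_b, peak=255.0):
    """PSNR in dB; infinite when the two images are identical."""
    err = mse(img_a, img_b)
    if err == 0:
        return float("inf")
    return 10.0 * math.log10(peak * peak / err)


def gtf_curve(frames, peak=255.0):
    """PSNR between each pair of consecutive frames I_t and I_{t+1}."""
    return [psnr(frames[t], frames[t + 1], peak) for t in range(len(frames) - 1)]
```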
The GTF index was used to evaluate motion compensation with respect to an initial reference image. Fig. 12 displays the PSNR curves of sequence No. 6 in Fig. 6 for the considered system. Each point of the curves was calculated by varying the motion range of the image to be stabilized. The lower and upper PSNR curves are the GTF of the original and stabilized video sequences, respectively. As can be seen, the curve that represents the uncompensated sequence is always below the compensated one. This means that the proposed method is able to compensate for unwanted motion. The PSNR values for the planar background video sequence and the complex urban video sequence are listed in Table 3
, which are computed over 100 frames. The PSNR of the ODFOI method is higher than that of HDAC, which means that the proposed method is more robust to irregular conditions than HDAC.
Comparison of interframe PSNR curve
PSNR of test video sequence
The computational complexity of the proposed stabilization method is a function of the image size, the number of video frames, the number of feature points detected in each image, and the number of correspondences obtained. For the same number of feature points, the larger the image, the more computation time is needed to calculate the corners. Given an image of $M \times N$ pixels, the computational complexity of the corner detector is on the order of $O(MN)$. If $A$ and $B$ feature points are obtained in two frames, the computational complexity of the matching algorithm to find the correspondences is on the order of $O(A^2, B^2)$. If $p$ correspondences are found, the computational complexity of the projective constraint that finds the best 5 correspondences is on the order of $O(p^5)$. The best 5 correspondences are then used to calculate the projective transformation parameters to register the images.
The images used in this study had a size of $M \times N = 320 \times 240$ pixels. One hundred feature points were selected in each image. With these parameters, the matching algorithm produced about a dozen correspondences. The processing time of the proposed method is less than 50 ms per frame on a 3.2 GHz computer.
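To give a sense of the scale of the $O(p^5)$ term, the short Python sketch below counts the 5-point subsets that an exhaustive search over $p$ correspondences would have to examine. Exhaustive enumeration is an assumption made here for illustration only, since the text does not state how the search is organized, and the function name `num_five_point_subsets` is hypothetical.

```python
from math import comb


def num_five_point_subsets(p):
    """Number of distinct 5-element subsets of p correspondences, C(p, 5)."""
    return comb(p, 5)


if __name__ == "__main__":
    # C(p, 5) grows like p**5 / 120, so doubling p multiplies the count
    # by roughly 2**5 = 32 once p is large.
    for p in (12, 24, 48):
        print(p, num_five_point_subsets(p))
```

For the roughly one dozen correspondences reported above, $p = 12$ gives 792 candidate subsets, which is small enough to be compatible with a per-frame time budget of tens of milliseconds.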
4. Conclusion
A new alignment method for stabilizing video frames captured by a moving aerial camera was described. Its effectiveness has been demonstrated through a series of experiments under critical conditions. The experimental results show that the proposed scheme performs real-time aerial video stabilization in complex environments with changing scenes and achieves precise stabilization. Future work will be devoted to extending the proposed method to stabilize very large dynamic scenes with nonplanar backgrounds, for example, by applying the algorithm to stabilize subimages within the frames. Moreover, further investigations will incorporate the proposed method into various applications such as textured image classification and moving object tracking.
BIO
Meng Yi (S’12) received the M.S. degree in Electrical Engineering from Northwestern Polytechnical University, Xi’an, China, in March 2008. Since 2009, he has been a Ph.D. candidate in Circuits and Systems at Xidian University. Currently, he is a visiting doctoral candidate in the Department of Electrical and Computer Engineering and the Center for Automation Research, University of Maryland, College Park, Maryland, USA. His research interests include computer vision, pattern recognition, signal processing, and biometrics.
Baolong Guo received the M.S. and Ph.D. degrees from Xidian University in 1988 and 1995, respectively, both in communication and electronic systems. From 1998 to 1999, he was a visiting scientist at Doshisha University, Japan. He is currently a full professor with the Institute of Intelligent Control and Image Engineering (ICIE) at Xidian University. His research interests include neural networks, pattern recognition, and image processing.