Improved image alignment algorithm based on projective invariant for aerial video stabilization
KSII Transactions on Internet and Information Systems (TIIS). 2014. Sep, 8(9): 3177-3195
Copyright © 2014, Korean Society For Internet Information
  • Received : July 16, 2013
  • Accepted : November 06, 2013
  • Published : September 28, 2014
About the Authors
Meng Yi
School of Electronic and Control Engineering, Chang’an University, Xi’an 710064, China
Bao-long Guo
Institute of Intelligent Control and Image Engineering, Xidian University, Xi’an 710071, China
Chun-man Yan
College of Physics and Electronic Engineering, Northwest Normal University, Lanzhou, Gansu 730070, P. R. China

Abstract
In many moving object detection problems in aerial video, accurate and robust stabilization is of critical importance. In this paper, a novel accurate image alignment algorithm for aerial electronic image stabilization (EIS) is described. The feature points are first selected using a Harris detector based on optimal derivative filters, which improves differentiation accuracy and yields precise coordinates of the feature points. We then use Delaunay Triangulation edges to find the matching pairs between feature points in overlapping images. The most “useful” matching points, those belonging to the background, are used to find the global transformation parameters using the projective invariant. Finally, the intentional motion of the camera is accumulated for correction by Sage-Husa adaptive filtering. Experimental results on aerial video sequences with various dynamic scenes demonstrate the performance of the proposed algorithm.
1. Introduction
When cameras are mounted on unstable airborne platforms, it is barely possible to obtain smooth motion because of undesired camera movements. Electronic image stabilization (EIS) is, therefore, becoming an indispensable technique in advanced digital cameras and camcorders. EIS can be defined as the process of removing unwanted video vibrations and obtaining stabilized image sequences [1-3]. It has been widely used in the areas of video surveillance, panorama stitching, robot localization and moving object tracking [4-7]. However, making a stable video is a very challenging task, especially when motion of both the camera (ego-motion and high-frequency motion) and foreground objects is present. Poor stabilization accuracy degrades the stabilization quality and impedes subsequent processing for various applications [8, 9].
EIS mainly consists of two parts: motion estimation (ME) and motion compensation (MC). The ME is responsible for estimating the reliable global camera movement through three processing steps on the acquired image sequences (see [4] for an alternative scheme): feature detection, feature matching, and transformation model estimation [10]; the MC preserves the panning motion of the camera while correcting the undesired fluctuations caused by an unsteady, vibrating platform. Compared with MC, ME plays the most important role in EIS, and its estimation precision is decisive for video stabilization [11]. In order to enable the use of aerial video in stabilization and reconnaissance missions, ME algorithms have to be robust with respect to different conditions. The block matching algorithm (BMA) [12], bit plane matching (BPM) [13] and phase correlation [14] are the most common ways to stabilize translational jitter. In this paper, a special class of ME methods is considered that aligns the frames in an aerial video of a dynamic scene captured by a moving camera.
A number of approaches have been proposed to realize background motion estimation (ME). The common ones include direct-based methods [15, 16] and feature-based methods [17-20]. Direct-based methods aim to find the unknown transforms using raw pixel intensities. Feature-based methods, on the other hand, first identify the feature correspondences between image pairs and then recover the transform from the correspondence pairs. A method developed in [17] uses the stable relative distance between point sets to delete local features such as local moving objects, covered points or the inevitable mismatches. [18] uses scale-invariant feature transform (SIFT) [21] points to obtain a crude estimate of the projective transformation, and identifies the features on the moving objects by the difference in velocity between the objects and the background. [19] achieves ME by detecting SIFT points and calculating the parameters of the projective transformation in a RANSAC process; a Gaussian distribution is used to create a background model and detect distant moving objects, but this method is only applicable to runway scenes. [20] develops a 3D camera motion model, which can be applied to the general case.
We propose a method for estimating aerial video motion of a scene consisting of a planar background, foreground moving objects and static 3-D structures. The system, shown in Fig. 1, presents a novel ME algorithm based on optimal derivative filters and the projective invariant, which generates accurate corner-point locations and distinguishes the more accurate matching points from the less accurate ones. The proposed method 1) selects a set of accurate feature points in each frame using the optimal derivative filters method, 2) finds the matching points between the feature points in the frames, 3) uses the projective invariant to distinguish the more accurate matching points from the less accurate ones, so that the most accurate points belonging to the planar background are used to estimate the transformation model, and 4) performs motion compensation with a Sage-Husa Kalman filter [22] to stabilize the sequence.
Fig. 1. Flowchart of the proposed aerial video stabilization algorithm.
In the following sections, related work in aerial video stabilization is reviewed, details of an automatic stabilization system are provided, and experimental results of the performance of the algorithm are presented and discussed.
2. Related Work
Image stabilization has been studied extensively over the past decade. In this paper, a special class of video stabilization is considered, in which a video of a dynamic scene is captured by a moving camera. The challenge of image stabilization in such video is how to track the camera motion accurately without being influenced by the moving objects and static 3-D structures in the images. A number of approaches have been proposed to realize background stabilization. The common ones include direct-based methods [15, 16] and feature-based methods [18, 19, 20]. Direct-based methods aim to find the unknown transforms using raw pixel intensities. Feature-based methods, on the other hand, first identify the feature correspondences between image pairs and then recover the transform from the correspondence pairs. A method developed by Zhu J.J. et al. [17] uses the stable relative distance between point sets to delete local features such as local moving objects, covered points or the inevitable mismatches. Yang J. et al. [18] use scale-invariant feature transform points to obtain a crude estimate of the projective transformation, and identify the features on the moving objects by the difference in velocity between the objects and the background. Cheng H.P. et al. [19] achieve aerial video stabilization by detecting SIFT [21] points and calculating the parameters of the projective transformation in a RANSAC process; a Gaussian distribution is then used to create a background model and detect distant moving objects, but this method is only applicable to runway scenes. Wang J.M. et al. [20] develop a 3D camera motion model, which can be applied to the general case. In this paper, our electronic image stabilization is feature-based, and a method for choosing the most accurate background feature points from a moving camera is proposed to stabilize the frames.
Various methods for detecting control points in an image have been developed. Schmid et al. [23] surveyed and compared various point detectors, finding the Harris detector [24, 25] to be the most repeatable. Mikolajczyk [26] proposed the scale-adapted Harris detector with automatic scale selection. However, these algorithms compute the image gradient using discrete pixel differences, and finite differences can provide a very poor approximation to a derivative. In this work we resolve these shortcomings by applying optimal derivative filters. The optimal derivative filters method emerged as an optimization of the rotation-invariance of the gradient operator [27]; it aims to minimize the error in the estimated gradient direction. We extend the application of optimal derivative filters to obtain accurate corner locations.
Feature matching is another important step in feature-based motion estimation. In recent years, several feature matching methods have been successfully applied to image-sequence motion estimation, such as invariant block matching [28] and feature point matching [29]. However, due to noise and occlusion, some feature points are displaced even when their positions are detected with high accuracy. As a result, some matching points will be more accurate than others, which affects the accuracy of video stabilization. The projective invariant [30] is a means to evaluate the geometrical invariability between images. This paper develops a projective invariant method that distinguishes the more accurate matching points from the less accurate ones; the most accurate points, those belonging to the planar background, are used to estimate the transformation model.
The transformation model can be used to stabilize the video sequence by repositioning image frames in the inverse direction of the transformation. However, digital image sequences acquired by an airborne video camera are usually affected by unwanted positional fluctuations, which degrade the visual quality and impede subsequent processing for various applications. Kalman filters [14] have been used to compensate for the unwanted shaking of the camera while leaving the intentional camera motion intact. We adopt a Sage-Husa Kalman filter [31] in which the correction vector for each image frame is obtained as the difference between the filtered and original positions. This assumption helps to distinguish and preserve the intended camera motion.
2. Approach
When a video of a dynamic scene is captured by a moving aerial camera, knowing that two overlapping images are related by the projective transformation, a 2-D model can well trade off the accuracy and computational complexity for EIS. Assuming ( x , y ) represents a point in the base image and ( X , Y ) represents the same point in the image overlapping the base image, the projective transformation between the two points in the images can be written as:
$$X = \frac{h_{11}x + h_{12}y + h_{13}}{h_{31}x + h_{32}y + h_{33}},\qquad Y = \frac{h_{21}x + h_{22}y + h_{23}}{h_{31}x + h_{32}y + h_{33}} \tag{1}$$
Having the coordinates of 4 corresponding points in the images, the unknown parameters h 11 - h 33 of the transformation can be determined by substituting the corresponding points into (1) and solving the obtained system of the linear equations.
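As a concrete sketch, this linear system can be solved with the standard direct linear transform (DLT); the function below (its name and the use of NumPy's SVD are ours, not the paper's) recovers the 3×3 matrix H of (1) from four or more correspondences:

```python
import numpy as np

def homography_from_points(src, dst):
    """Solve for the 3x3 projective transform H mapping src -> dst
    from >= 4 point correspondences, via SVD (direct linear transform)."""
    A = []
    for (x, y), (X, Y) in zip(src, dst):
        # each correspondence contributes two homogeneous linear equations
        A.append([x, y, 1, 0, 0, 0, -X * x, -X * y, -X])
        A.append([0, 0, 0, x, y, 1, -Y * x, -Y * y, -Y])
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)      # null-space vector = solution up to scale
    return H / H[2, 2]            # normalize so that h33 = 1
```

With exactly four correspondences in general position the system has an exact one-dimensional null space, so the transform is recovered exactly.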
- 2.1 Feature Points Extraction Based on Optimal Derivative Filters
Feature detection is the first and critical step for image stabilization, and has been studied extensively in recent years [32]. We use the Harris corner detector [24] in our stabilization framework because of the abundance of corners in aerial images. The Harris detector, which uses the second-moment matrix as the basis of its corner detection, describes the curvature of the autocorrelation function in the neighborhood. For an image I(x, y), the Harris detector based on the second-moment matrix can be expressed as:
$$M(x,y) = h(x,y) \ast \begin{bmatrix} G_x^{2} & G_xG_y \\ G_xG_y & G_y^{2} \end{bmatrix} \tag{2}$$
where h(x, y) is the Gaussian smoothing function and G = (G_x, G_y) is the traditional image gradient, given as follows:
$$G_x = d(n) \ast I(x, y)\ \ \text{(along } x\text{)},\qquad G_y = d(n) \ast I(x, y)\ \ \text{(along } y\text{)} \tag{3}$$
where d(n) is the general form of a linear-phase Finite Impulse Response (FIR) filter and can be written as:
$$d(n) = -d(-n),\qquad n = 0, \pm 1, \dots, \pm N \tag{4}$$
The Harris detector provides good repeatability under rotations and various illuminations; unfortunately, computing derivatives is sensitive to quantization noise, and the Harris corner detector has poor localization performance, particularly at certain junction types [32] . In this section, instead of using the traditional image gradients, we designed a new optimally first-order derivative filter with more accurate location and rotation-invariance. Optimal derivative filters method in general have emerged as an optimization of the rotation-invariance of the gradient operator [27] . It aims to minimize the errors in the estimated direction. We extended the application of optimal derivative filters to realize the accurate locations of the corners.
The Fourier transform of d ( n ) is:
$$D(\omega) = \sum_{n=-N}^{N} d(n)\,e^{-j\omega n} \tag{5}$$
Ideally, our objective is to obtain the first-order derivative transfer function D(ω) = jω. We can design the coefficients d(n) to meet this function as closely as possible. Because the signal I(x) and its derivative I_x(x) are hard to obtain accurately in the discrete domain, a pair of filters p and d is designed, and [I ∗ p](x) and [I ∗ d](x) are taken as an accurate form of the original signal and its derivative, respectively. We denote the filter pair by P(ω) and D(ω) in the frequency domain; the error jωP(ω) − D(ω) can then be minimized. The weighted least-squares error criterion for the designed filter pair is defined as:
$$E = \frac{\int_{-\pi}^{\pi} W(\omega)\,\lvert j\omega P(\omega) - D(\omega)\rvert^{2}\,d\omega}{\int_{-\pi}^{\pi} W(\omega)\,\lvert P(\omega)\rvert^{2}\,d\omega} \tag{6}$$

where W(ω) is a frequency weighting function.
This function has the form of a Rayleigh quotient, so the minimizer can be found using the Singular Value Decomposition (SVD). The resulting pair p(n), d(n) is called a rotation-equivariant derivative filter [27], which shows good accuracy and rotation invariance. We choose a 5-tap pair of filters to obtain good precision. The resulting filter pair values are given in Table 1.
Table 1. Matched pairs of prefilter p and derivative d kernels for a 5-tap filter. [Values shown as an image in the original.]
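The design idea can be sketched in code. The paper minimizes the Rayleigh-quotient criterion via the SVD; the illustrative variant below instead fixes the prefilter's DC gain to 1 and solves the resulting unconstrained least-squares problem on a frequency grid, which captures the same matched-pair idea (the function name, the unweighted grid, and the DC-gain constraint are our assumptions, not the paper's formulation):

```python
import numpy as np

def design_matched_pair(taps=5, n_freq=512):
    """Design a matched prefilter p (symmetric) / derivative d (antisymmetric)
    pair by minimizing |jw*P(w) - D(w)|^2 on a frequency grid, subject to
    P(0) = 1.  A least-squares sketch of the matched-pair design idea."""
    r = taps // 2                           # tap radius (2 for a 5-tap filter)
    w = np.linspace(-np.pi, np.pi, n_freq)  # frequency grid
    # Unknowns: p half-coefficients a_1..a_r (a_0 is eliminated by the DC
    # constraint a_0 = 1 - 2*sum a_n) and d half-coefficients f_1..f_r.
    # Imag part of jw*P(w):  w*(a_0 + 2*sum a_n cos(nw))
    # Imag part of D(w):    -2*sum f_n sin(nw)
    # Residual(w) = w*P(w) + 2*sum f_n sin(nw)  ->  linear in the unknowns.
    A = np.zeros((n_freq, 2 * r))
    b = -w.copy()                           # constant term w*a_0 -> right side
    for n in range(1, r + 1):
        A[:, n - 1] = w * (2 * np.cos(n * w) - 2)  # a_n terms (a_0 substituted)
        A[:, r + n - 1] = 2 * np.sin(n * w)        # f_n terms
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    a, f = x[:r], x[r:]
    a0 = 1 - 2 * a.sum()
    p = np.concatenate([a[::-1], [a0], a])   # symmetric prefilter
    d = np.concatenate([-f[::-1], [0.0], f]) # antisymmetric derivative
    return p, d
```

By construction the returned kernels satisfy the symmetry of (4) and a unit-sum prefilter; the actual published 5-tap values come from the Rayleigh-quotient/SVD solution and will differ from this sketch.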
A standard image, as shown in Fig. 2(a), is distorted by rotations with angles ranging from 10° to 90° and by zooms with scales ranging from 0.6 to 1.5. Assume that we choose a feature point (x0, y0) in the standard image; the actual position of the feature point changes to (x1, y1) and (x2, y2) in the rotated and scaled images, respectively. Knowing the corresponding point (x'i, y'i), we can compute the Euclidean distance D between the point (x'i, y'i) and the actual point position (xi, yi). Simulation results are given in Fig. 2(d) and (e). It can be seen that the Harris detector with optimal derivative filters consistently produces more accurate results than the Harris detector without them.
Fig. 2. Original image and the image after rotation and zooming. (a) Original image (126×126). (b) Image rotated 45°. (c) Zoomed image (250×250). (d) Corner error vs. rotation angle. (e) Corner error vs. scaling factor.
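A corner response built on such a matched filter pair might look as follows. The 3-tap kernels here are illustrative placeholders (not the paper's 5-tap pair from Table 1), and SciPy's separable 1-D convolution is used for the gradients:

```python
import numpy as np
from scipy.ndimage import convolve1d, gaussian_filter

# Stand-in matched pair; the paper uses the 5-tap kernels of Table 1.
P_KERNEL = np.array([0.25, 0.5, 0.25])   # smoothing prefilter p
D_KERNEL = np.array([0.5, 0.0, -0.5])    # antisymmetric derivative d

def harris_response(img, p=P_KERNEL, d=D_KERNEL, sigma=1.5, k=0.04):
    """Harris corner response using a matched prefilter/derivative pair:
    differentiate along one axis, prefilter along the other."""
    img = img.astype(float)
    gx = convolve1d(convolve1d(img, d, axis=1), p, axis=0)
    gy = convolve1d(convolve1d(img, d, axis=0), p, axis=1)
    # Gaussian-weighted entries of the second-moment matrix
    sxx = gaussian_filter(gx * gx, sigma)
    syy = gaussian_filter(gy * gy, sigma)
    sxy = gaussian_filter(gx * gy, sigma)
    return sxx * syy - sxy ** 2 - k * (sxx + syy) ** 2
```

Local maxima of the returned response map above a threshold give the corner candidates.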
- 2.2 Correspondence between Points
After feature points have been detected, the next step is to find the correspondences between the two point sets. The process involves removing the outliers and estimating the parameters of the transformation. The RANSAC algorithm was introduced by Fischler and Bolles in 1981 [33]; it uses a distance threshold to find the transformation matrix that maps the greatest number of point pairs between two images. Due to its ability to tolerate a large fraction of outliers, the algorithm is a popular choice for robust estimation of the transformation matrix. Its lower-bound computational complexity is very low; however, it may not find the correspondences until a large number of iterations has been computed, so the upper-bound computational cost of RANSAC is substantial.
In a comparison study that involved several well known topological structures, the Delaunay Triangulation (DT) [34] was found to have the best structural stability under random positional perturbations.
For a set of points p1, p2, ···, pn, we obtain the DT by first calculating its Voronoi Diagram (VD). The VD of a point set divides the plane (or space) into one region per point, each containing the part of the plane that is closer to that point than to any other. Given the VD, the DT is its straight-line dual. A set of points is shown in Fig. 3(a), their VD in Fig. 3(b), and their DT in Fig. 3(c).
Fig. 3. (a) A set of points. (b) Voronoi Diagram. (c) Delaunay Triangulation.
In this research, we choose Delaunay Triangulation edges to find the matching pairs between feature points in overlapping images. An initial match between two point sets is obtained by selecting disjoint edge pairs in each DT that have the same transformation parameters. Before computing the parameters of the projective transformation, we do some work to reduce the computation time. (i) Long DT edges are supposed to be more distinctive, and matching on long edges is considered more stable than matching on short ones; therefore, we use only the longest 50 to 100 edges for feature matching. (ii) For aerial video images with general perspective changes (the viewpoints for the two images are not significantly different), the orientations of corresponding edges in local areas should be relatively consistent; therefore, we use only edges with similar orientations (for example, an angle difference between the edge pairs of less than 10°). (iii) The strength contrast of the corner response in a local region along both sides of the line can be used to further remove wrong candidates in the searched image. Assuming the equation of a line is Ax + By + C = 0, for a local region centered on the line, let the average corner response on one side of the line be l1 and on the other side be l2. A strength contrast S for each line is assigned based on l2 − l1, and then we have:
$$S = \operatorname{sgn}(l_2 - l_1) \tag{7}$$
If the strength contrasts S of two matching edges are not equal, the candidate edge is not considered a possible match and is excluded from the further matching process.
An example of point matching in this method is given in Fig. 4 . The Delaunay Triangulation edges obtained from the 100 stable corner points are shown in each image. We can see that many of the same DT edges are found in two images.
Fig. 4. Two aerial frames showing 100 stable corner points along with the Delaunay Triangulation.
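Extracting the DT edges and keeping the longest ones for matching, as in step (i) above, can be sketched with SciPy (the function and parameter names are ours):

```python
import numpy as np
from scipy.spatial import Delaunay

def longest_dt_edges(points, keep=100):
    """Delaunay-triangulate `points` (an (n, 2) array) and return the `keep`
    longest unique edges as index pairs, with their lengths, longest first."""
    tri = Delaunay(points)
    edges = set()
    for a, b, c in tri.simplices:            # each simplex is a triangle
        for i, j in ((a, b), (b, c), (a, c)):
            edges.add((min(i, j), max(i, j)))  # undirected, deduplicated
    edges = np.array(sorted(edges))
    lengths = np.linalg.norm(points[edges[:, 0]] - points[edges[:, 1]], axis=1)
    order = np.argsort(lengths)[::-1][:keep]
    return edges[order], lengths[order]
```

The orientation and strength-contrast filters of steps (ii) and (iii) would then prune this candidate list further.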
- 2.3 Motion Parameters Estimation Based on Projective Invariant
The feature points detected by the optimized Harris detector are determined to sub-pixel accuracy. Due to noise, 3-D structures or moving objects in the image sequence, some feature points are displaced even when their positions are detected by the optimal derivative filters. As a result, certain matching points remain more invariant than others.
There exist image properties that remain invariant under projective transformation. The most fundamental projective invariant is the cross-ratio. The cross-ratio can be defined for four collinear points or for five coplanar points; the five-coplanar-point form is the most suitable for our problem, as we already have matching points in the image.
The five-point cross-ratio invariant is defined as follows [35]. Given five points A_i, i = 1, ..., 5 in an image, the cross-ratio of the five points is defined as:
$$I(A_1,\dots,A_5) = \frac{\Delta(A_1A_2A_4)\,\Delta(A_1A_3A_5)}{\Delta(A_1A_2A_5)\,\Delta(A_1A_3A_4)} \tag{8}$$
where Δ(A1A2A4) is the oriented area of the triangle with vertices A1, A2 and A4. Note that one point is shared by all four triangles; it is called the common point of the cross-ratio. It was shown in [36] that the projective invariant of five points can be written as a linear combination of four expressions:
(9) [equation shown as an image in the original]
The nontrivial projective invariants are unbounded functions and can be written as:
(10) [equation shown as an image in the original]
If the feature point (x, y) in one frame and the point (X, Y) in the other are related by the projective transformation, then by replacing (x_i, y_j) with (X_i, Y_j) in (8)-(10) we expect J(x, y) = J(X, Y). If J(x, y) and J(X, Y) are not the same, then the smaller their distance

$$D = \lvert J(x, y) - J(X, Y) \rvert \tag{11}$$

is, the higher the accuracy of the five matching points will be. We then select the combination that gives the smallest distance, and the best 5 matching points out of n are used to determine the parameters of the projective transformation.
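This selection step can be sketched as follows, assuming the number of correspondences is small (the paper reports about a dozen, so the O(p⁵) exhaustive search over 5-point combinations is feasible). The signed-area helper and the search code are an illustration, not the paper's implementation:

```python
import numpy as np
from itertools import combinations

def tri_area(p, q, r):
    """Oriented (signed) area of the triangle pqr."""
    return 0.5 * ((q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1]))

def cross_ratio5(A):
    """Five-point cross-ratio of five coplanar points, A[0] being the common point."""
    num = tri_area(A[0], A[1], A[3]) * tri_area(A[0], A[2], A[4])
    den = tri_area(A[0], A[1], A[4]) * tri_area(A[0], A[2], A[3])
    return num / den

def best_five(src, dst):
    """Exhaustively pick the 5 correspondences whose invariants agree best
    between the two frames (smallest distance, cf. (11))."""
    best, best_err = None, np.inf
    with np.errstate(divide='ignore', invalid='ignore'):
        for idx in combinations(range(len(src)), 5):
            sel = list(idx)
            err = abs(cross_ratio5(src[sel]) - cross_ratio5(dst[sel]))
            if np.isfinite(err) and err < best_err:  # skip degenerate (collinear) sets
                best, best_err = idx, err
    return best, best_err
```

Because the ratio of oriented areas in the invariant is preserved by any projective map, correspondences lying on the planar background produce near-zero distance, while points on moving objects or 3-D structures do not.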
An example using the projective constraint in image alignment is given in Fig. 5. As shown in Fig. 5(a)-(b), the marked points on the local moving objects and on the 3-D structures are obviously inaccurate matching points, and these feature points would likely introduce error into the transformation model estimate. The distance D in (11) is calculated for combinations of five points; the combination of the five most accurate points, which belong to the background, produces the smallest distance. Fig. 5(c) and (d) show the absolute intensity difference of images registered using all the correspondences and using the best five correspondences, respectively. The difference between the two is significant.
Fig. 5. (a), (b) Two images showing the matching feature points. (c) Alignment result using all feature points. (d) Alignment result using the best five matching points.
- 2.4 Inter-frame Motion Compensation
When a video of a dynamic scene is captured by a moving camera, two types of motion are present in the video: one caused by camera jitter and the other caused by camera pan. Only the unwanted camera jitter should be removed, by applying a low-pass filter. Similar to the Kalman filter [1], the Sage-Husa filter [37] is based on the following assumption: the intentional camera scan is a smooth motion in a fixed direction, whereas the unwanted camera jitter varies randomly in magnitude and direction. We obtain the smooth motion component x_filter using the adaptive filter; the jitter component x_jitter is then the difference between the original motion vector x_raw and the smooth component, that is, x_jitter = x_raw − x_filter.
The Sage-Husa adaptive filter is designed on the basis of the typical discrete Kalman filter. It takes advantage of the measurement data to constantly update the system noise and measurement noise statistics. The basic state estimate and update equations of the Sage-Husa adaptive filter are given by:
$$\begin{aligned}
\hat{x}(k|k-1) &= F\,\hat{x}(k-1)\\
P(k|k-1) &= F\,P(k-1)F^{T} + Q\\
K(k) &= P(k|k-1)H^{T}\left[H\,P(k|k-1)H^{T} + R\right]^{-1}\\
\hat{x}(k) &= \hat{x}(k|k-1) + K(k)\left[z(k) - H\,\hat{x}(k|k-1)\right]\\
P(k) &= \left[I - K(k)H\right]P(k|k-1)
\end{aligned} \tag{12}$$
where K is the filter gain matrix; F is the state transition matrix; H denotes the measurement matrix; R is the equivalent measurement noise variance matrix; Q is the equivalent state noise variance matrix; P(k−1) is the prior state covariance matrix; and P(k|k−1) is the predicted state covariance matrix.
The estimating equations of the noise statistics R̂(k) and Q̂(k) are given by:
$$\hat{R}(k) = \left[1 - d(k)\right]\hat{R}(k-1) + d(k)\left[\varepsilon(k)\varepsilon^{T}(k) - H\,P(k|k-1)H^{T}\right] \tag{13}$$
$$\hat{Q}(k) = \left[1 - d(k)\right]\hat{Q}(k-1) + d(k)\left[K(k)\varepsilon(k)\varepsilon^{T}(k)K^{T}(k) + P(k) - F\,P(k-1)F^{T}\right] \tag{14}$$
where d(k) = (1 − b)/(1 − b^k), b is the fading factor, and 0 < b < 1.
The Sage-Husa adaptive Kalman filtering algorithm cannot estimate Q and R simultaneously when both are unknown. Moreover, the measurement noise covariance matrix can easily cause filtering divergence when it loses positive definiteness (or semi-definiteness), so stability and convergence cannot be fully guaranteed.
In this article, the innovation sequence [31], ε(k) = z(k) − H x̂(k|k−1), is chosen to predict the residual error. The measurement residual criterion is as follows:
$$\varepsilon^{T}(k)\varepsilon(k) \le \gamma\,\mathrm{tr}\left\{E\left[\varepsilon(k)\varepsilon^{T}(k)\right]\right\} \tag{15}$$
where γ is a reserve coefficient. When γ = 1, the filtering algorithm achieves the optimal estimation result:
$$E\left[\varepsilon(k)\varepsilon^{T}(k)\right] = H\,P(k|k-1)H^{T} + R \tag{16}$$
When formula (16) is not satisfied, the actual error exceeds the theoretical value by a factor of γ. In that case, a weighting coefficient C(k) is introduced to correct P(k|k−1):
$$P(k|k-1) = C(k)\,F\,P(k-1)F^{T} + Q \tag{17}$$
Substituting the formula (17) into (16), we obtain:
$$C(k) = \frac{\varepsilon^{T}(k)\varepsilon(k) - \mathrm{tr}\left[HQH^{T}\right] - \mathrm{tr}\left[R\right]}{\mathrm{tr}\left[H\,F\,P(k-1)F^{T}H^{T}\right]} \tag{18}$$
From formulas (12) to (18), we obtain the Sage-Husa adaptive Kalman filtering algorithm with the innovation sequence.
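A scalar sketch of this filter applied to one component of the motion trajectory is given below. The random-walk motion model, the clamping of the R update (to keep it positive, addressing the divergence concern above), and all parameter values are our assumptions for illustration:

```python
import numpy as np

def sage_husa_smooth(z, q0=1e-3, r0=1.0, b=0.96):
    """Scalar Sage-Husa adaptive Kalman filter (random-walk motion model,
    F = H = 1).  Tracks the intentional, low-frequency component of a motion
    trajectory while adapting the measurement-noise estimate R from the
    innovation sequence, in the spirit of eqs. (12)-(15)."""
    n = len(z)
    x = np.zeros(n)
    x[0], P, Q, R = z[0], 1.0, q0, r0
    for k in range(1, n):
        d = (1 - b) / (1 - b ** (k + 1))   # fading weight d(k)
        x_pred = x[k - 1]                  # state prediction (random walk)
        P_pred = P + Q
        eps = z[k] - x_pred                # innovation
        # adapt R from the innovation; clamp to keep it positive
        R = (1 - d) * R + d * max(eps ** 2 - P_pred, 1e-8)
        K = P_pred / (P_pred + R)          # filter gain
        x[k] = x_pred + K * eps
        P = (1 - K) * P_pred
    return x

# The jitter to be removed is the raw trajectory minus the smoothed one:
# x_jitter = z - sage_husa_smooth(z)
```

Applied to the per-frame translation (and rotation) parameters, the residual between raw and filtered trajectories gives the correction vector for each frame.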
3. Experimental Results
This section presents examples and quantitative results of the proposed stabilization algorithm on aerial video sequences. The algorithm was implemented in C++ and all experiments were carried out on a DELL Intel Xeon E5410 2.33-GHz desktop computer with 9 GB of RAM running Windows 7 Enterprise Professional Edition. Fig. 6 shows 8 sets of unmanned aerial vehicle (UAV) video sequences drawn from the predator data of VSAM at Carnegie Mellon University (Fig. 6, No.1-No.4), a DARPA video sequence (Fig. 6, No.5) and our own aerial video data (Fig. 6, No.6-No.8), each of size 320×240, covering rural roads, fields, urban buildings, etc.
Fig. 6. UAV video sequences.
In order to demonstrate the accuracy of the motion estimation method that uses optimal derivative filters and the projective invariant technique, we applied the proposed method to two different types of video images: a planar background and a complex urban scene, shown in Fig. 7 and Fig. 8.
Fig. 7. Alignment of planar background. (a) Two aerial images. (b) The matching point pairs. (c) Alignment result using HDAC. (d) Alignment result using ODFAC. (e) Alignment result using ODFOI.
Fig. 8. Alignment of complex urban scene. (a) Two aerial images. (b) The matching point pairs. (c) Alignment result using HDAC. (d) Alignment result using ODFAC. (e) Alignment result using ODFOI.
An example using images of a planar background is given in Fig. 7. The feature matching between the two frames is shown in Fig. 7(b). The total number of correspondences is 12. The identified correspondences are marked with the same numbers (drawn in red) in both images, and the five most accurate correspondence points are shown by yellow lines. Fig. 7(c) shows the absolute intensity differences of images aligned using the Harris detector and all correspondence points (HDAC); Fig. 7(d) shows the alignment result using the optimal derivative filters based Harris detector and all correspondence points (ODFAC); Fig. 7(e) shows the alignment result using optimal derivative filters and the best five correspondences obtained by the projective invariant technique (ODFOI). The root-mean-squared (RMS) difference between aligned images without optimal derivative filters and the projective invariant is 12.856. When using the optimal derivative filters method, the RMS difference is 11.964, while the RMS difference using optimal derivative filters and the five matching points best satisfying the projective invariant is 11.018. High values appear at the moving cars in all three alignment results, and the difference among the three results is small because the scene is mainly composed of planar background and moving objects; nevertheless, the result using optimal derivative filters and the projective invariant is better than the other two.
The second example, using images of a complex urban scene, is shown in Fig. 8. Most of the images have local distortion. The matching points are shown in Fig. 8(b); the total number of correspondences was 40. The difference images after alignment using HDAC, ODFAC and ODFOI are shown in Fig. 8(c), (d) and (e), respectively. The RMS differences using HDAC, ODFAC and ODFOI are 15.031, 13.426 and 10.995, respectively. We can see that our algorithm produced more accurate transformation model parameters. The highest values correspond to moving cars in Fig. 8(e). High values are also found at some rectangular areas and parked cars in Fig. 8(d), and such errors can confuse motion detection and tracking.
To examine the geometric fidelity of the motion estimation method, the cross-ratio invariance of four collinear points was used to determine the accuracy of the motion estimation. The cross-ratio of four collinear points is a projective invariant of the quadruple. Given four collinear points p1, p2, p3 and p4 in one image, the cross-ratio is calculated as follows:
$$Cr(p_1, p_2; p_3, p_4) = \frac{\Delta_{13}\,\Delta_{24}}{\Delta_{14}\,\Delta_{23}} \tag{19}$$
where Δ ij denotes the Euclidean distance between two points pi and pj .
Suppose a line is drawn in the image with four points lying on it. First we calculate the cross-ratio Cr using the four collinear points; then, from three of the points in the aligned image, we calculate the location of the fourth point in the aligned image using the cross-ratio invariance. The distance between the calculated fourth point and the actual fourth point in the aligned image is used as the error between the original image and the aligned image.
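This fidelity check can be sketched as follows. `fourth_point_on_line` works with 1-D positions along the line, a parametrization we introduce for illustration (the cross-ratio of collinear points is unchanged by projecting them onto the line's parameter):

```python
import numpy as np

def cross_ratio4(p1, p2, p3, p4):
    """Cross-ratio of four collinear points from pairwise Euclidean distances."""
    dist = lambda a, b: float(np.linalg.norm(np.subtract(a, b)))
    return (dist(p1, p3) * dist(p2, p4)) / (dist(p1, p4) * dist(p2, p3))

def fourth_point_on_line(s1, s2, s3, cr):
    """Recover the position s4 of the fourth point along the line, given the
    1-D positions s1 < s2 < s3 of the first three and the cross-ratio cr.
    cr = ((s3-s1)(s4-s2)) / ((s4-s1)(s3-s2)) is linear in s4, so solve directly."""
    return (cr * s1 * (s3 - s2) - s2 * (s3 - s1)) / (cr * (s3 - s2) - (s3 - s1))
```

The error measure is then the distance between the recovered fourth position and the actual one in the aligned image.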
In order to evaluate the geometric fidelity of the motion estimation method, 8 sets of aerial video sequences were selected, as shown in Fig. 6, each containing 50 frames. Table 2 shows detailed test results for the number of feature points and the geometric fidelity errors using HDAC, ODFAC and ODFOI after registering the frames. The average error using ODFOI is clearly smaller than that using HDAC or ODFAC, so the more accurate alignment result is obtained.
Table 2. Comparison of the HDAC, ODFAC and ODFOI algorithms. [Values shown as an image in the original.]
To illustrate the stabilization results of the proposed algorithm, a comparison of video stabilization based on HDAC and ODFOI is depicted in Fig. 9 and Fig. 10. Snapshot images of the planar background video sequence corresponding to frames 30, 60, 90 and 120 are illustrated in Fig. 9(a). Frames 90, 118, 160 and 189 of the complex urban scene video sequence are shown in Fig. 10(a). It is observed that the proposed algorithm corrects the rotational and translational motions well, considering that there is intentional motion in the horizontal direction.
Fig. 9. Comparison of video stabilization for the planar background video sequence: (a) Original image. (b) Stabilization result using HDAC. (c) Stabilization result using ODFOI.
Fig. 10. Comparison of video stabilization for the complex urban video sequence: (a) Original image. (b) Stabilization result using HDAC. (c) Stabilization result using ODFOI.
As discussed in section 2.4, the intended motion of the aerial video is preserved by the Sage-Husa adaptive filter, so that only the unwanted jitter is removed during the stabilization process. Fig. 11 shows how the method of estimating intended video motion presented in this paper performs on the UAV video of the complex urban scene. The ODFOI method gives a steadier change than the HDAC method in the vertical direction, successfully removing high-frequency jitter while smoothly following the global motion trajectory.
Fig. 11. The result of filtered y-translation motion vectors.
To make an objective evaluation of the image stabilization method between the stabilized image and the reference image, the peak signal-to-noise ratio (PSNR) can be used as a measure. The larger the PSNR, the smaller the inter-frame error. The PSNR between consecutive M × N images I_t and I_{t+1}, called the global transformation fidelity (GTF), is defined as:
$$PSNR(t) = 10\log_{10}\frac{255^{2}}{MSE(I_t, I_{t+1})} \tag{20}$$

$$MSE(I_t, I_{t+1}) = \frac{1}{MN}\sum_{x=1}^{M}\sum_{y=1}^{N}\left[I_{t+1}(x,y) - I_t(x,y)\right]^{2} \tag{21}$$
where M and N are the width and height of the images, respectively; MSE denotes the mean square error calculated for the considered images.
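The PSNR/MSE definitions above translate directly to code for 8-bit images (the function name is ours):

```python
import numpy as np

def gtf_psnr(a, b):
    """Inter-frame PSNR (global transformation fidelity) between two
    equally sized 8-bit frames; returns +inf for identical frames."""
    mse = np.mean((a.astype(float) - b.astype(float)) ** 2)
    return float('inf') if mse == 0 else 10.0 * np.log10(255.0 ** 2 / mse)
```

Running this over consecutive frames of the original and stabilized sequences produces the two GTF curves compared below.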
The GTF index was used to evaluate motion compensation with respect to an initial reference image. Fig. 12 displays the PSNR curves for the sequence of Fig. 6. Each point of the curves was calculated by varying the motion range of the image to be stabilized. The lower and upper PSNR curves are the GTF of the original and stabilized video sequences, respectively. As can be seen, the curve of the uncompensated sequence always lies below that of the compensated one, which means that the proposed method is able to compensate for unwanted motion. The PSNR values for the planar background and complex urban video sequences, computed over 100 frames, are listed in Table 3. It is observed that the PSNR of the ODFOI method is larger than that of the HDAC, which means that the proposed method is more robust to irregular conditions than the HDAC.
Fig. 12. Comparison of inter-frame PSNR curves.
Table 3. PSNR of the test video sequences.
The computational complexity of the proposed stabilization method is a function of the image size, the number of video frames, the number of feature points detected in each image, and the number of obtained correspondences. For the same number of feature points, the larger the image size, the more computation time is needed to detect the corners. Given an image of size M × N pixels, the computational complexity of the corner detector is on the order of O(MN). If A and B feature points are obtained in two frames, the computational complexity of the matching algorithm that finds the correspondences is on the order of O(A², B²). If p correspondences are found, the computational complexity of the projective constraint that selects the best 5 correspondences is on the order of O(p⁵). The best 5 correspondences are then used to calculate the projective transformation parameters to register the images.
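Once the best correspondences are selected, the final registration step fits a projective transformation to them. The sketch below shows a generic way to recover a 3×3 homography from n ≥ 4 point correspondences via the standard direct linear transform (DLT); it is an illustrative stand-in, not the paper's projective-invariant-based implementation, and the function name fit_homography is an assumption.

```python
import numpy as np

def fit_homography(src, dst):
    """Estimate the 3x3 projective (homography) matrix H mapping
    src -> dst from n >= 4 point correspondences via the DLT:
    stack two linear equations per correspondence and take the
    null-space vector from the SVD of the coefficient matrix."""
    src = np.asarray(src, dtype=np.float64)
    dst = np.asarray(dst, dtype=np.float64)
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # each correspondence contributes two rows of A h = 0
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.array(rows)
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)       # right singular vector of the
    return H / H[2, 2]             # smallest singular value, rescaled
```

With exact (noise-free) correspondences the null-space vector recovers the true transformation up to scale; with noisy correspondences the same SVD yields the least-squares fit.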
The images used in this study were of size M = 320 and N = 240 pixels, and 100 feature points were selected in each image. These parameters produced about a dozen correspondences from the matching algorithm. The processing time of the proposed method is less than 50 ms per frame on a 3.2-GHz computer.
4. Conclusion
A new alignment method for stabilizing video frames captured by a moving aerial camera was described. The effectiveness of the approach has been demonstrated through a series of experiments in critical conditions; the experimental results show that the proposed scheme carries out real-time aerial video stabilization in complex environments with changing scenes and achieves precise stabilization. Future work will be devoted to extending the proposed method to stabilize very large dynamic scenes with non-planar backgrounds, for example by applying the algorithm to stabilize sub-images within the frames. Moreover, further investigations will incorporate the proposed method into various applications such as textured image classification and moving object tracking.
BIO
Meng Yi (S’12) received the M.S. degree in Electrical Engineering from Northwestern Polytechnical University, Xi’an, China, in March 2008. Since 2009, he has been a Ph.D. candidate in Circuits and Systems at Xidian University. Currently, he is a visiting doctoral candidate in the Department of Electrical and Computer Engineering and the Center for Automation Research, University of Maryland, College Park, Maryland, USA. His research interests include computer vision, pattern recognition, signal processing, and biometrics.
Baolong Guo received the M.S. and Ph.D. degrees from Xidian University in 1988 and 1995, respectively, both in communication and electronic systems. From 1998 to 1999, he was a visiting scientist at Doshisha University, Japan. He is currently a full professor with the Institute of Intelligent Control and Image Engineering (ICIE) at Xidian University. His research interests include neural networks, pattern recognition, and image processing.
References
[1] A. Litvin, J. Konrad, W. C. Karl, “Probabilistic video stabilization using Kalman filtering and mosaicking,” in Proc. IS&T/SPIE Symposium on Electronic Imaging, Image and Video Communications and Processing, 2003.
[2] Y. Matsushita, E. Ofek, X. Tang, H.-Y. Shum, “Full-frame video stabilization,” in Proc. CVPR, 2005.
[3] F. Liu, H. Jin, “Content-preserving warps for 3D video stabilization,” in Proc. ACM SIGGRAPH, vol. 28, no. 3, 2009.
[4] L. Mercenaro, G. Vernazza, C. S. Regazzoni, “Image stabilization algorithms for video-surveillance applications,” in Proc. ICIP, pp. 349-352, 2001.
[5] T. Zhao, R. Nevatia, “Car detection in low resolution aerial images,” Image and Vision Computing, vol. 21, no. 8, pp. 693-703, 2003. DOI: 10.1016/S0262-8856(03)00064-7
[6] G. F. Templeton, “Video image stabilization and registration technology,” Communications of the ACM, vol. 49, no. 2, pp. 15-18, 2006. DOI: 10.1145/1113034.1113053
[7] C.-H. Chuang, C.-C. Chang, “Multiple object motion detection for robust image stabilization using block-based Hough transform,” in Proc. IIH-MSP, pp. 623-625, 2010.
[8] S.-C. Hsu, S.-F. Liang, C.-T. Lin, “A robust digital image stabilization technique based on inverse triangle method and background detection,” IEEE Transactions on Consumer Electronics, vol. 51, no. 2, pp. 335-345, 2005. DOI: 10.1109/TCE.2005.1467968
[9] G. Puglisi, S. Battiato, “A robust image alignment algorithm for video stabilization purposes,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 21, no. 10, pp. 1390-1401, 2011. DOI: 10.1109/TCSVT.2011.2162689
[10] S. Battiato, R. Lukac, “Video stabilization techniques,” in Encyclopedia of Multimedia, Springer-Verlag, New York, pp. 941-945, 2008.
[11] B. Tordoff, D. W. Murray, “Guided sampling and consensus for motion estimation,” in Proc. ECCV, 2002.
[12] F. Vella, A. Castorina, M. Mancuso, G. Messina, “Digital image stabilization by adaptive block motion vectors filtering,” IEEE Transactions on Consumer Electronics, vol. 48, no. 3, pp. 796-801, 2002. DOI: 10.1109/TCE.2002.1037077
[13] S.-J. Ko, S.-H. Lee, K.-H. Lee, “Digital image stabilizing algorithms based on bit-plane matching,” IEEE Transactions on Consumer Electronics, vol. 44, no. 3, pp. 617-622, 1998. DOI: 10.1109/30.713172
[14] S. Erturk, “Digital image stabilization with sub-image phase correlation based global motion estimation,” IEEE Transactions on Consumer Electronics, vol. 49, no. 4, pp. 1320-1325, 2003. DOI: 10.1109/TCE.2003.1261235
[15] H. Chen, C. K. Liang, Y. C. Peng, H. A. Chang, “Integration of digital stabilizer with video codec for digital video cameras,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 17, no. 7, pp. 801-813, 2007. DOI: 10.1109/TCSVT.2007.897113
[16] S. Battiato, A. R. Bruna, G. Puglisi, “A robust block based image/video registration approach for mobile imaging devices,” IEEE Transactions on Multimedia, vol. 12, no. 7, pp. 622-635, 2010. DOI: 10.1109/TMM.2010.2060474
[17] J.-J. Zhu, B.-L. Guo, “Global point tracking based panoramic image stabilization system,” Optoelectronics Letters, vol. 5, no. 1, pp. 61-63, 2009. DOI: 10.1007/s11801-009-8082-2
[18] J. Yang, D. Schonfeld, M. Mohamed, “Robust video stabilization based on particle filter tracking of projected camera motion,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 19, no. 7, pp. 945-954, 2009. DOI: 10.1109/TCSVT.2009.2020252
[19] C.-H. Pai, Y.-P. Lin, G. G. Medioni, “Moving object detection on a runway prior to landing using an onboard infrared camera,” in Proc. CVPR, pp. 17-22, 2007.
[20] J. M. Wang, H. P. Chou, S. W. Chen, C. S. Fuh, “Video stabilization for a hand-held camera based on 3D motion model,” in Proc. ICIP, pp. 7-10, 2009.
[21] D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision, vol. 60, no. 2, pp. 91-110, 2004. DOI: 10.1023/B:VISI.0000029664.99615.94
[22] L. Xu, Z. Deng, L. Fang, “Research on an anti-perturbation Kalman filter algorithm,” Journal of Networks, vol. 6, no. 10, pp. 1430-1436, 2011. DOI: 10.4304/jnw.6.10.1430-1436
[23] C. Schmid, R. Mohr, C. Bauckhage, “Evaluation of interest point detectors,” International Journal of Computer Vision, vol. 37, no. 2, pp. 151-172, 2000. DOI: 10.1023/A:1008199403446
[24] C. Harris, M. Stephens, “A combined corner and edge detector,” in Proc. Alvey Vision Conference, pp. 147-152, 1988.
[25] Q. Zhu, B. Wu, N. Wan, “A subpixel location method for interest points by means of the Harris interest strength,” Photogrammetric Record, vol. 22, no. 120, pp. 321-335, 2007. DOI: 10.1111/j.1477-9730.2007.00450.x
[26] K. Mikolajczyk, C. Schmid, “An affine invariant interest point detector,” in Proc. ECCV, pp. 128-142, 2002.
[27] H. Farid, E. P. Simoncelli, “Differentiation of discrete multi-dimensional signals,” IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 496-508, 2004. DOI: 10.1109/TIP.2004.823819
[28] L. Xu, X. Lin, “Digital image stabilization based on circular block matching,” IEEE Transactions on Consumer Electronics, vol. 52, no. 2, pp. 566-574, 2006. DOI: 10.1109/TCE.2006.1649681
[29] Q. Zhu, Y. Zhang, B. Wu, Y. Zhang, “Multiple close-range image matching based on a self-adaptive triangle constraint,” Photogrammetric Record, vol. 25, no. 132, pp. 437-453, 2010. DOI: 10.1111/j.1477-9730.2010.00603.x
[30] E. Trucco, “Geometric Invariance in Computer Vision,” MIT Press, Cambridge, MA, 1992.
[31] L. Xu, Z. Deng, L. Fang, “Research on an anti-perturbation Kalman filter algorithm,” Journal of Networks, vol. 6, no. 10, pp. 1430-1436, 2011. DOI: 10.4304/jnw.6.10.1430-1436
[32] T. Tuytelaars, K. Mikolajczyk, “Local invariant feature detectors: a survey,” Foundations and Trends in Computer Graphics and Vision, vol. 3, no. 3, pp. 177-280, 2008. DOI: 10.1561/0600000017
[33] M. A. Fischler, R. C. Bolles, “Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography,” Communications of the ACM, pp. 381-395, 1981. DOI: 10.1145/358669.358692
[34] F. P. Preparata, M. I. Shamos, “Computational Geometry: An Introduction,” Springer-Verlag, New York, pp. 95-226, 1985.
[35] P. Meer, S. Ramakrishna, R. Lenz, “Correspondence of coplanar features through P2-invariant representations,” in Applications of Invariance in Computer Vision, vol. 825, pp. 473-492, 1994.
[36] C.-M. Xie, Y. Zhao, J.-N. Wang, “Application of an improved adaptive Kalman filter to transfer alignment of airborne missile INS,” Proc. SPIE, vol. 7129, pp. 260-266, 2008.