Autonomous vehicle technology based on information technology and software will lead the automotive industry in the near future. Vehicle localization is a core technology for developing autonomous vehicles, providing the location information needed for control and decision-making. This paper proposes an effective vision-based localization technology for autonomous vehicles. In particular, the proposed technology makes use of numerical maps, which are widely used in the field of geographic information systems and have already been built in advance. Optimal vehicle ego-motion estimation and road-marking feature extraction techniques are adopted and then combined by an extended Kalman filter and a particle filter to make up the localization technology. The implementation shows remarkable results: an 18 ms mean processing time and a 10 cm location error. In addition, autonomous driving and parking were successfully completed with an unmanned vehicle within a 300 m × 500 m space.
Research into autonomous vehicles by numerous companies and research institutions is on the rise. Owing to these achievements, it is clear that most vehicles will become either partially or fully unmanned in the near future. Localization is a core technology for an autonomous vehicle and belongs to the recognition part of the recognition, decision, and control technologies. Localization is particularly needed for the completion of the other technologies [1]. This paper proposes an effective localization technology using only numerical maps and image information.

Research on vehicle localization conducted thus far has focused on general roads [2]–[10]. This paper, in contrast, aims at providing an autonomous valet parking (AVP) service. AVP extends the already commercialized parking assistance system and parks the vehicle in a nearby parking space on behalf of a driver who has just finished driving. AVP has been forecast to reach a commercial model at the initial stage of autonomous vehicles.

For AVP, a precise location and quick response times are required to safely park a vehicle in a narrow parking space. Because vehicles frequently move between buildings or in underground spaces and park in a single spot within a wide parking lot, an existing global positioning system (GPS) may be unusable. For commercialization, a low cost is also required.

The purpose of this research is to develop a fast and precise localization technology that does not require the aid of a GPS, while keeping its cost to an absolute minimum so that it can be commercialized in the near future.
Toward this end, this study developed an efficient localization technology that makes use of only numerical maps, uses low-cost image sensors (cameras), and can accommodate large environments with little storage.

Numerical maps are widely used in geographic information systems (GIS) and navigation systems and contain vector-type data reflecting the real world through measurements; therefore, they can be expressed with a small amount of data. With standardization, numerical maps are compatible with most systems. Moreover, they can be updated immediately through wireless communications [11]; thus, their utilization value is significantly high.

This paper proposes both a visual odometry (VO)–based vehicle ego-motion estimation (EE) technology and a road-marking extraction technology for the area surrounding a vehicle. In addition, this study proposes a method for finding and optimizing the location of a vehicle using numerical maps. By applying the localization technology to the autonomous vehicles shown in Fig. 1, a mean processing time of 18 ms and a 10 cm location error were achieved, which are remarkable performance metrics. This study verified the usefulness of the localization technology by carrying out an unmanned autonomous driving and parking experiment.
Our autonomous vehicles: the front one is a sport utility vehicle, and the other is an electric vehicle. Each vehicle is equipped with six cameras (front stereo, rear stereo, and one on each side), two LIDAR sensors, and a high-performance GPS/INS system.
The main contribution of this paper is that it introduces a precision localization method using only image sensors and numerical maps. In addition, it proposes a new method for combining an extended Kalman filter (EKF) and a particle filter (PF), fast vehicle ego-motion estimation, and road feature extraction.

The technology for measuring the location of a vehicle or mobile robot can be divided into map-based localization, which estimates the location from a pre-constructed map or landmarks [12], and simultaneous localization and mapping (SLAM), which automatically produces a map while simultaneously estimating the location [13].

In this paper, relevant studies are examined, focusing on the notable research that has a direct bearing on this paper. Because localization is a foundational technology for mobile robots and autonomous vehicles, it has been widely researched [12].

Levinson described representative research on vehicle localization covering the precise location estimation applied to a Google driverless car. The results were acquired using a PF [2] and histogram filters [3], with the reflection intensity of the light detection and ranging (LIDAR) sensor and an inertial measurement unit (IMU) as input, on a probability map produced using a LIDAR sensor.

Pink developed the first localization technology using a precise map generated from aerial images and road markings [4]. Since then, localization that recognizes road markings and uses precise image maps has been partially attempted. These studies were based on precise maps made from front-camera images and the road shape including road markers [5] or road markers only [6], as well as maps produced by LIDAR and road markers [7].
Some studies have shown that position accuracy is improved by combining a GPS [8]. However, producing precise rasterized maps for all spaces is not only difficult but may also cause problems in terms of physical storage space.

Laneurit first proposed a localization method that made use of numerical maps. He demonstrated a technique to improve the accuracy of GPS on a numerical map through multi-sensor fusion with a Kalman filter [10].

Brubaker recently attempted, for the first time, to locate the global position of a vehicle using only numerical maps and image information [9]. He acquired about a three-meter precision by probabilistically matching the vehicle's driving path, acquired from VO, against the road shape. This system performs similarly to a commercial GPS and, although its precision is insufficient for vehicle control, it is highly meaningful in that it identified a vehicle's global position with only image information and numerical map information over a vastly wide area.

This paper consists of the following sections. Section II describes the localization framework, and Sections III, IV, and V describe each technology in detail. Finally, Sections VI and VII conclude this paper with our experimental results and findings.
II. Framework for Vehicle Localization Using Numerical Maps
The purpose of this study is to develop a low-cost, high-speed, and high-performance localization technology for unmanned vehicle driving and parking. Even when precise VO or SLAM is implemented, the location acquired from such a technology is relative to a reference position [13]–[14], and errors accumulate over time; therefore, it is difficult to fully trust the results.

This paper proposes a map-based localization technique using the numerical maps widely used in a GIS as a reference. The location of a vehicle within a numerical map is predicted using the vehicle's ego-motion and is verified using road-marking (RM) features and a PF. Both the ego-motion and the map-based vehicle location contain many estimation errors (noise) that make it impossible to detect the vehicle pose stably; this is addressed by ultimately estimating a robust vehicle state using an EKF. In essence, the localization technology proposed in this paper is based on simple EKF-based vehicle state estimation whose observation is provided by a PF.

An EKF is a recursive filter for estimating the state of a nonlinear system in the presence of noise [15].
An EKF consists of prediction and update stages, as illustrated in the following equations, where the nonlinear state transition and observation equations are given as x_{k} = f(x_{k−1}, u_{k−1}, w_{k−1}) and z_{k} = h(x_{k}, v_{k}), respectively.

$$\text{Prediction:}\quad\left\{\begin{array}{l}\widehat{x}_{k|k-1}=f\left(\widehat{x}_{k-1|k-1},\,u_{k-1}\right),\\ P_{k|k-1}=F_{k-1}P_{k-1|k-1}F_{k-1}^{T}+Q_{k-1},\end{array}\right.$$

$$\text{Update:}\quad\left\{\begin{array}{l}\widehat{x}_{k|k}=\widehat{x}_{k|k-1}+K_{k}\tilde{y}_{k},\\ P_{k|k}=\left(I-K_{k}H_{k}\right)P_{k|k-1},\\ K_{k}=P_{k|k-1}H_{k}^{T}S_{k}^{-1},\\ \tilde{y}_{k}=\widehat{z}_{k}-h\left(\widehat{x}_{k|k-1}\right),\\ S_{k}=H_{k}P_{k|k-1}H_{k}^{T}+R_{k}.\end{array}\right.$$

In these equations, x_{k}, u_{k}, z_{k}, w_{k}, and v_{k} are the state, input, output, process noise, and observation noise; Q_{k} and R_{k} are the covariances of w_{k} and v_{k}; and K_{k}, ỹ_{k}, and S_{k} are the Kalman gain, innovation, and innovation covariance, respectively. Further, F_{k} and H_{k} are the Jacobians of the state transition and observation functions, respectively.
Here, if the vehicle pose is defined as the state x_{k} = [x y θ]^{T}, and the ego-motion of the vehicle is defined as the input u_{k} = [δx δy δθ]^{T}, then the kinematic nonlinear motion equation of the vehicle can be defined as follows:

$$\left(\begin{array}{c}x_{t+1}\\ y_{t+1}\\ \theta_{t+1}\end{array}\right)=\left(\begin{array}{c}x_{t}\\ y_{t}\\ \theta_{t}\end{array}\right)+\left(\begin{array}{c}\delta x\sin\left(\theta_{t}\right)-\delta y\cos\left(\theta_{t}\right)\\ \delta x\cos\left(\theta_{t}\right)+\delta y\sin\left(\theta_{t}\right)\\ \delta\theta\end{array}\right).$$

Although the environment has some differences in elevation, this 2D assumption is well suited because the system uses only a 2D map. Figure 2 shows the framework of the algorithm proposed in this paper. The vehicle ego-motion (input u_{k}) is calculated from the front stereo images, and RMs are detected from the surrounding images. Here, u_{k} is used for the prediction step of the EKF, and the RMs, together with the numerical maps, are used to calculate the vehicle pose; namely, the observation value ẑ_{k}. A PF is used as the basis; thus, robustness against sensing noise is enhanced, and the localization accuracy is improved through optimization after additionally including the RM information of the previous path. With the observation value ẑ_{k} calculated in this way, the update step of the EKF is carried out. Additionally, Kalman filtering (KF) based on vehicle dynamics is conducted to compensate for the processing time delay.
Algorithm of vehicle localization using numerical maps.
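As an illustration of the framework above, the EKF of (1)–(3) can be sketched in Python (the paper's actual implementation is in C++). This is a minimal sketch assuming the PF supplies the pose directly as the observation, so that h(x) = x and H = I; the function names and noise covariances are illustrative, not from the paper.

```python
import numpy as np

def motion_model(x, u):
    """Kinematic motion model of (3): x = [x, y, theta], u = [dx, dy, dtheta]."""
    px, py, th = x
    dx, dy, dth = u
    return np.array([
        px + dx * np.sin(th) - dy * np.cos(th),
        py + dx * np.cos(th) + dy * np.sin(th),
        th + dth,
    ])

def motion_jacobian(x, u):
    """Jacobian F of the motion model with respect to the state."""
    _, _, th = x
    dx, dy, _ = u
    return np.array([
        [1.0, 0.0,  dx * np.cos(th) + dy * np.sin(th)],
        [0.0, 1.0, -dx * np.sin(th) + dy * np.cos(th)],
        [0.0, 0.0, 1.0],
    ])

def ekf_step(x, P, u, z, Q, R):
    """One EKF prediction + update; the observation is the pose itself (H = I)."""
    # Prediction with ego-motion input u.
    F = motion_jacobian(x, u)
    x_pred = motion_model(x, u)
    P_pred = F @ P @ F.T + Q
    # Update with pose observation z (e.g., from the particle filter).
    H = np.eye(3)
    y = z - x_pred                       # innovation
    S = H @ P_pred @ H.T + R             # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)  # Kalman gain
    x_new = x_pred + K @ y
    P_new = (np.eye(3) - K @ H) @ P_pred
    return x_new, P_new
```

Note that, following (3), with θ = 0 a forward ego-motion δx moves the vehicle along the map's y-axis; the axis convention is taken directly from the paper's equation.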
III. Ego-Motion Estimation
Ego-motion is the 3D motion of a camera between consecutive time steps, and VO is the estimation of ego-motion from images [14]. In his latest study, Badino developed a multi-frame feature integration (MFI) technique for effective VO [16]. Even though MFI delivers excellent accuracy and respectable real-time capability, its speed fell slightly short of our desired performance. Therefore, this paper proposes an algorithm that greatly improves the real-time performance of MFI by combining high-speed feature tracking with fast convergence in solving the least-squares optimization.
- 1. Multi-frame Feature Integration
MFI proposes an augmented feature that integrates the history of all tracked features, reducing tracking errors by additionally including this history in the optimization [16]. From the stereo images, the ith feature tracked at time t is defined as m_{i,t} = [u v d]^{T}, where (u, v) is the position of the feature in the left image, and d is its disparity. If the rotation matrix of the ego-motion is R and the translation vector is T, then the rigid-body motion of a tracked feature must satisfy the following:

$$\bm{g}\left(m_{i,t}\right)=\bm{R}\bm{g}\left(m_{i,t-1}\right)+\bm{T}.$$

Here, g(·) is the stereo triangulation function. An objective function is obtained by applying its inverse, h(·) = g(·)^{−1}, to both sides:

$$m_{i,t}=\bm{h}\left(\bm{R}\bm{g}\left(m_{i,t-1}\right)+\bm{T}\right)=\bm{r}\left(m_{i,t-1}\right).$$

Calculating the ego-motion is then a least-squares optimization problem seeking the R and T that minimize the following error function:

$$\bm{e}=\sum_{i=1}^{n}\omega_{i}\left\Vert \bm{r}\left(m_{i,t-1}\right)-m_{i,t}\right\Vert _{2}.$$

In this equation, ω_{i} is a weighting factor. MFI extends the error function as follows:

$$\widehat{\bm{e}}=\alpha\sum_{i=1}^{n}\omega_{i}\left\Vert \bm{r}\left(m_{i,t-1}\right)-m_{i,t}\right\Vert _{2}+\beta\sum_{i=1}^{n}a_{i}\omega_{i}\left\Vert \bm{r}\left(\overline{m}_{i,t-1}\right)-m_{i,t}\right\Vert _{2}.$$

Here, α and β are normalized weighting factors, a_{i} is the age of a feature at time t−1, and $\overline{m}_{i,t-1}$ is an integrated feature, defined as

$$\overline{m}_{i,t-1}=\frac{\bm{r}\left(m_{i,t-1}\right)+a_{i}\bm{r}\left(\overline{m}_{i,t-1}\right)}{1+a_{i}}.$$
MFI achieves quite a performance improvement in two ways. The first is reducing the errors that occur in 3D coordinate conversion by directly using the 2D feature positions in the objective function of (5) [14]; the other is including the previous information of the tracked features in the optimization stage of (7) [16]. However, the heavy computation of feature tracking and of obtaining the disparity from stereo images still remains.
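The extended error function of (7) and the integrated-feature update of (8) can be sketched as follows. This is an illustrative Python sketch (the paper's implementation is C++): the reprojection function r, induced by a candidate ego-motion (R, T), is passed in as a callable, and all function and parameter names are assumptions for illustration.

```python
import numpy as np

def mfi_error(r, features_prev, features_cur, integrated_prev, ages,
              weights, alpha, beta):
    """Extended MFI error function of (7): alpha weighs the frame-to-frame
    term, beta the integrated-feature term, and each feature's age a_i
    scales its integrated contribution."""
    e = 0.0
    for m_prev, m_cur, m_bar, a, w in zip(
            features_prev, features_cur, integrated_prev, ages, weights):
        e += alpha * w * np.linalg.norm(r(m_prev) - m_cur)
        e += beta * a * w * np.linalg.norm(r(m_bar) - m_cur)
    return e

def integrate_feature(r, m_prev, m_bar_prev, age):
    """Integrated-feature update of (8): an age-weighted running average
    of the reprojected measurement and the reprojected integrated feature."""
    return (r(m_prev) + age * r(m_bar_prev)) / (1.0 + age)
```

In the full algorithm, `mfi_error` would be minimized over (R, T) by a nonlinear least-squares solver, with `integrate_feature` refreshing each feature's history after convergence.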
- 2. Speed-Up Multi-frame Feature Integration
We propose a four-step technique to improve the processing speed of MFI. First, input images are transformed into multi-scale images using a pyramidal technique. Second, new features in every scaled image are extracted through FAST [17] feature detection, Shi–Tomasi [18] cost filtering, sub-pixel accuracy refinement [19], and a FREAK [20] descriptor. Third, feature matching and the ego-motion estimate are updated: starting from the smallest layer, feature matching is performed over a narrow search area, made possible by predicting each new feature position using the ego-motion of the previous stage and (4). The disparity of each matched feature is then computed, and the ego-motion is updated using (6). When the largest layer is reached, the final ego-motion is computed using (7); at this stage, the initial values for solving the least-squares optimization problem are those of the previous stage. Finally, the newly tracked features are selected. Every layer is divided into a grid, and within each cell we select features up to a predefined number in descending order of Shi–Tomasi cost to maintain a better feature distribution.
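The final grid-based feature selection step can be sketched as follows. This is an illustrative Python sketch; the cell size and per-cell limit are hypothetical parameters, and the Shi–Tomasi costs are assumed to be precomputed.

```python
from collections import defaultdict

def select_features(features, cell_size, max_per_cell):
    """Grid-based feature selection: features are (u, v, cost) tuples;
    keep up to max_per_cell features per grid cell, in descending
    Shi-Tomasi cost, to maintain a good spatial distribution."""
    cells = defaultdict(list)
    for u, v, cost in features:
        cells[(int(u // cell_size), int(v // cell_size))].append((cost, u, v))
    selected = []
    for bucket in cells.values():
        bucket.sort(reverse=True)  # highest cost first
        selected.extend((u, v, c) for c, u, v in bucket[:max_per_cell])
    return selected
```

Capping the number of features per cell prevents texture-rich regions from dominating the least-squares problem, which is why the text emphasizes feature distribution.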
IV. RM Feature Extraction
RM is the measurement feature of the proposed system. Studies on RM feature detection have been steadily carried out over the last two decades, mainly for advanced driver assistance systems [21]. Lately, some studies have used RM features for localization [4], [6]–[8].
- 1. Extended Symmetrical Local Threshold
From previous studies, the symmetrical local threshold (SLT) method is well known to demonstrate good performance with a small number of operations for RM feature extraction [21]. SLT uses the mean pixel intensity within a certain range to the left and right of the target pixel as the reference value for its threshold (Fig. 3(a)). Therefore, SLT can effectively detect, with few operations, a bright RM drawn on a dark road. The threshold result at image coordinate (u, v) in SLT is defined as follows:

$$\widehat{i}^{\,u,v}=\left\{\begin{array}{ll}1 & \text{if }i^{u,v}>\mu_{l}^{u,v}+\tau\wedge i^{u,v}>\mu_{r}^{u,v}+\tau,\\ 0 & \text{otherwise}.\end{array}\right.$$

Here, i^{u,v} is the pixel intensity at image coordinate (u, v), and μ_{l} and μ_{r} are the means of the left and right sections, defined as

$$\mu_{l}^{u,v}=\frac{1}{N}\sum_{k=u-1-N}^{u-1}i^{\left(k,v\right)}\quad\text{and}\quad\mu_{r}^{u,v}=\frac{1}{N}\sum_{k=u+1}^{u+1+N}i^{\left(k,v\right)},$$

respectively, and τ is a given threshold value. From our experience, the SLT method has difficulty detecting horizontally placed straight lines, and errors occur at shadow boundaries, because SLT sets the threshold using only the means of the pixel intensities along the horizontal axis.
Averaging block for a local threshold: (a) original SLT and (b) extended symmetrical local threshold (ESLT).
To solve this problem, the current paper proposes an extended symmetrical local threshold (ESLT) technique, as shown in Fig. 3(b). ESLT uses four directional block reference values, which are the means of the pixel intensities within blocks surrounding the target pixel. If the block radius at image coordinate (u, v) is σ, then ESLT is defined as follows:

$$\tilde{i}^{\,u,v}=\left\{\begin{array}{ll}1 & \text{if }i^{u,v}>\mu_{E}^{u,v,\sigma}+\tau\wedge i^{u,v}>\mu_{W}^{u,v,\sigma}+\tau,\\ 1 & \text{if }i^{u,v}>\mu_{N}^{u,v,\sigma}+\tau\wedge i^{u,v}>\mu_{S}^{u,v,\sigma}+\tau,\\ 0 & \text{otherwise}.\end{array}\right.$$

Here, μ_{E}, μ_{W}, μ_{N}, and μ_{S} are the means of the pixel intensities within each directional block and are defined as follows:

$$\begin{array}{l}\mu_{E}^{u,v,\sigma}=\frac{1}{\left(2\sigma+1\right)^{2}}\sum_{k=v-\sigma}^{v+\sigma}\sum_{l=u-\left(2\sigma+1\right)}^{u-1}i^{\left(l,k\right)},\\ \mu_{W}^{u,v,\sigma}=\frac{1}{\left(2\sigma+1\right)^{2}}\sum_{k=v-\sigma}^{v+\sigma}\sum_{l=u+1}^{u+\left(2\sigma+1\right)}i^{\left(l,k\right)},\\ \mu_{N}^{u,v,\sigma}=\frac{1}{\left(2\sigma+1\right)^{2}}\sum_{k=v-\left(2\sigma+1\right)}^{v-1}\sum_{l=u-\sigma}^{u+\sigma}i^{\left(l,k\right)},\\ \mu_{S}^{u,v,\sigma}=\frac{1}{\left(2\sigma+1\right)^{2}}\sum_{k=v+1}^{v+\left(2\sigma+1\right)}\sum_{l=u-\sigma}^{u+\sigma}i^{\left(l,k\right)}.\end{array}$$

Although a great number of operations would be required to compute these block means directly, this can be resolved by using the box-filter technique, an O(1) algorithm [22]. If the box-filter result is b, then (10) simplifies to the following:

$$\begin{array}{l}\mu_{E}^{u,v,\sigma}=b^{u-\left(\sigma+1\right),v},\quad\mu_{W}^{u,v,\sigma}=b^{u+\left(\sigma+1\right),v},\\ \mu_{N}^{u,v,\sigma}=b^{u,v-\left(\sigma+1\right)},\quad\mu_{S}^{u,v,\sigma}=b^{u,v+\left(\sigma+1\right)}.\end{array}$$
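Equations (9)–(11) can be sketched in Python as follows, using an integral image for the O(1) box filter. This is an illustrative sketch, not the paper's C++ implementation; for brevity, the four directional means are obtained by shifting the box-filter output with `np.roll`, which wraps at image borders, whereas a real implementation would clamp or skip the border band.

```python
import numpy as np

def box_mean(img, sigma):
    """Mean over a (2*sigma+1)^2 block centered at each pixel, computed
    with an integral image (the O(1) box filter of [22]); borders are
    handled by edge replication."""
    k = 2 * sigma + 1
    pad = np.pad(img.astype(np.float64), sigma, mode='edge')
    ii = np.zeros((pad.shape[0] + 1, pad.shape[1] + 1))
    ii[1:, 1:] = pad.cumsum(0).cumsum(1)
    h, w = img.shape
    total = (ii[k:k + h, k:k + w] - ii[:h, k:k + w]
             - ii[k:k + h, :w] + ii[:h, :w])
    return total / (k * k)

def eslt(img, sigma, tau):
    """ESLT of (9)-(11): a pixel is marked when it exceeds both the
    east/west or both the north/south block means (the box-filter output
    shifted by sigma+1 pixels) plus the margin tau."""
    b = box_mean(img, sigma)
    off = sigma + 1
    mu_e = np.roll(b, off, axis=1)   # block mean to the left of the pixel
    mu_w = np.roll(b, -off, axis=1)  # block mean to the right
    mu_n = np.roll(b, off, axis=0)   # block mean above
    mu_s = np.roll(b, -off, axis=0)  # block mean below
    i = img.astype(np.float64)
    horiz = (i > mu_e + tau) & (i > mu_w + tau)
    vert = (i > mu_n + tau) & (i > mu_s + tau)
    return horiz | vert
```

Because the vertical pair (μ_N, μ_S) is checked as well, a horizontal marking brighter than the road above and below it is detected, which is exactly the case where the original SLT fails.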
- 2. Size Relative Region Resampling
Not only are the sizes of RMs greatly diverse, but the road situation is also highly variable owing to weather, changes in the amount of sunshine, and the presence of shadows. Furthermore, in an outdoor parking lot, where many vehicles are parked in a narrow space, the impact of parked vehicles' shadows is vast. In such situations, it is extremely difficult to set a proper threshold value and filter size.

We reviewed the performance of all threshold values and filter sizes, but no common value appropriate to all situations was found (Fig. 4). In addition, although we attempted maximally stable extremal regions (MSER) [23], not only did we find no remarkable improvement, owing to highly sensitive threshold values in some particular images, but the real-time performance also did not meet our expectations, as described in Section VI.
Results of road marking feature detectors in a variety of difficult situations: the first row is a wide dark mark in the front camera view, the second row is a narrow lane marking between parked cars in the side camera view, and the third is a narrow marking in shadow. Each column is the (a) original, (b) ground truth, (c) best SLT [21], (d) best ESLT, (e) S3R image, and (f) DCS [21] of ESLT.
Basically, the filter size (σ) is related to the size of a marking, and the threshold value (τ) is related to the intensity of a marking. This paper proposes an algorithm for optimal RM feature extraction based on the filter size: a low threshold value is used so that RMs can be detected in all possible cases, and a stable region, in which region changes with respect to the filter size are minimized, is then found. In this paper, this is called size relative region resampling (S3R).

The difference between S3R and MSER is that S3R seeks a region that is stable under changes in filter size rather than under changes in threshold value, and successively compares only two neighboring images for a fast performance speed. From our experience, the best results were acquired when S3R was performed by reducing σ in an exponential-like manner over four or five steps. Consequently, if N + 1 steps are selected from the σ-value range σ = [σ_max, σ_min], then each step's σ is defined by the following equation:

$$\sigma_{n}=\sigma_{\text{min}}+\left(\sigma_{\text{max}}-\sigma_{\text{min}}\right)\left(\frac{N-n}{N}\right)^{2},\quad n=0,1,\ldots,N.$$
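The filter-size schedule of (12) can be computed as follows; this is a small illustrative sketch, and the chosen range and step count are assumptions for the example.

```python
def s3r_sigmas(sigma_min, sigma_max, steps):
    """Filter-size schedule of (12): sigma shrinks from sigma_max to
    sigma_min following a squared decay over steps = N + 1 values."""
    n_max = steps - 1  # N in (12)
    return [sigma_min + (sigma_max - sigma_min) * ((n_max - n) / n_max) ** 2
            for n in range(steps)]
```

For example, with σ_max = 18, σ_min = 2, and five steps, the schedule is 18, 11, 6, 3, 2: large filters are thinned out quickly while the small sizes, where stable regions are compared most finely, are sampled densely.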
V. Vehicle Pose Measurement
This paper proposes a method to measure the vehicle location using the RM features and a PF, an optimization method to enhance the location precision, and a compensation method to reduce the processing time delay.
- 1. Vehicle Pose Detection Using PF
A. PF

A PF is a Bayesian sequential-importance-sampling method that approximates a posterior distribution by recursively using a limited set of weighted samples. The robustness of PF-based localization is known to be excellent [12].

A PF is divided into prediction and update stages. When all observations up to time t−1 are given as z_{1:t−1} = {z_{1}, ..., z_{t−1}}, the posterior distribution at time t can be predicted as follows:

$$p\left(x_{t}|z_{1:t-1}\right)=\int p\left(x_{t}|x_{t-1},u_{t-1}\right)p\left(x_{t-1}|z_{1:t-1}\right)\mathrm{d}x_{t-1}.$$

If z_{t} is observable at time t, then the posterior distribution can be updated through Bayes' rule:

$$p\left(x_{t}|z_{1:t}\right)=\frac{p\left(z_{t}|x_{t}\right)p\left(x_{t}|z_{1:t-1}\right)}{p\left(z_{t}|z_{1:t-1}\right)}.$$

In this equation, the likelihood p(z_{t} | x_{t}) is defined by the observation equation. If the proposal distribution of the predicted particle $\tilde{x}_{t}^{i}$ is q(x_{t} | x_{1:t−1}, z_{1:t}), then the likelihood can be approximated by the following weight:

$$w_{t}^{i}=w_{t-1}^{i}\frac{p\left(z_{t}|\tilde{x}_{t}^{i}\right)p\left(\tilde{x}_{t}^{i}|x_{1:t-1}^{i}\right)}{q\left(\tilde{x}_{t}^{i}|x_{1:t-1}^{i},z_{1:t}\right)}.$$

Therefore, the posterior distribution can be calculated from the predicted particles' weights. In this case, state x is the vehicle state (namely, the vehicle pose), and measurement z is the location of the RM.

B. Vehicle Pose Detection

This study uses a PF to detect the approximate vehicle pose at the current point in time. The vehicle pose is detected by obtaining the result of (14). To this end, each particle's prediction value $\tilde{x}_{t}^{i}$ should first be calculated by applying the nonlinear vehicle kinematic function of (3) to every particle, with the ego-motion value as input. The RM features should be converted into vector-type data, such as that of a numerical map. A line segment is calculated for the line model from the outer contour of the critical points of each RM feature. The line model $S_{l}^{i}=[x_{l}\;y_{l}\;\theta_{l}\;l_{l}]^{T}$ is defined by the starting point, angle, and length of each line. On the map, the line segments of the objects projected into the map domain $W(\tilde{x}_{t}^{i})$ are extracted, and the object line model $M_{k}^{i}=[x_{k}\;y_{k}\;\theta_{k}\;l_{k}]^{T}$ is calculated.

To calculate (14), this paper proposes a method that computes the probability of a mapping hypothesis between the observable map objects within the domain of the predicted vehicle pose $\tilde{x}_{t}^{i}$ and the RM features observed from the images. Previous ego-motion and RM information are included in the mapping hypothesis function to increase the robustness against noise; this information is updated when the vehicle moves beyond a specific distance.

Figure 5 shows the hypothesis function H(·)_{kl}. The function stands for the probability that the kth data of the numerical map is mapped to the lth image data and is defined as follows:

$$H\left(\tilde{x}_{t}^{i}\right)_{kl}=\sum_{n=0}^{p}a_{n}^{i}\left\Vert M\left(\tilde{x}_{t-n}^{i}\right)_{k}-S\left(\tilde{x}_{t-n}^{i}\right)_{l}\right\Vert _{2}=\sum_{n=0}^{p}a_{n}^{i}\sqrt{d_{kl}^{2}+\left(\mu_{1}\frac{\delta\theta_{kl}}{\text{\pi}}\right)^{2}+\left(\mu_{2}\frac{\delta l_{kl}}{l_{Mk}}\right)^{2}}.$$

Here, M(·) is the map's data model function, S(·) is the image data model function, and $a_{n}^{i}$ is the weight of the previous path; the larger n is (that is, the further into the past), the smaller the weight $a_{n}^{i}$ becomes. Further, d_{kl} is the distance between the line of the map vector segment and the central point of the image vector, δθ_{kl} is the angle between the map segment and the RM segment, δl_{kl} is the difference in length between the two segments, and μ_{1} and μ_{2} are constants.
Example mapping hypothesis between a map object and road marking segment.
The observation model calculates the probability of the mapping hypothesis, and the observation probability of the ith particle at time t can be defined as follows:

$$p_{t}^{i}=p\left(z_{t}|\tilde{x}_{t}^{i}\right)=1-\frac{1}{M+N}\left(\sum_{m=1}^{M}F\left(\tilde{x}_{t}^{i},z_{t}\right)_{m}+\sum_{n=1}^{N}G\left(\tilde{x}_{t}^{i},z_{t}\right)_{n}\right).$$

Here, M is the number of segments observed from the numerical maps, and N is the number of segments observed from the images. In addition, F(·) is the observation function of the referenced map, F(·)_{m} = min{H(·)_{kl} | k = m, l = 1, 2, ..., N}, and G(·) is the observation function of the referenced image, G(·)_{n} = min{H(·)_{kl} | l = n, k = 1, 2, ..., M}. The result of the hypothesis function H(·) is a matrix of size M × N; F(·) is the set of minimum values in each row, and G(·) is the set of minimum values in each column. When the observation probabilities of all particles have been calculated, the new state (namely, the vehicle pose) is computed as a weighted sum of the predicted particles, as shown in (18):

$$x_{\text{est}}=\sum_{i=1}^{n}w_{t}^{i}\tilde{x}_{t}^{i},\quad\text{where}\quad w_{t}^{i}=\frac{p_{t}^{i}}{\sum_{k=1}^{n}p_{t}^{k}}.$$

Finally, particle resampling is carried out.
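The weighted pose estimate of (18) and the final resampling step can be sketched as follows. This is an illustrative Python sketch; systematic resampling is one common choice and is an assumption here, since the paper does not specify its resampling scheme.

```python
import numpy as np

def estimate_pose(particles, probs):
    """Weighted pose estimate of (18): normalize the observation
    probabilities p_t^i into weights w_t^i and average the particles."""
    probs = np.asarray(probs, dtype=np.float64)
    w = probs / probs.sum()
    pose = (np.asarray(particles, dtype=np.float64) * w[:, None]).sum(axis=0)
    return pose, w

def resample(particles, w, rng):
    """Systematic resampling (an assumed scheme): draw new particles
    proportionally to their weights with a single random offset."""
    n = len(particles)
    positions = (rng.random() + np.arange(n)) / n
    idx = np.searchsorted(np.cumsum(w), positions)
    return np.asarray(particles)[idx]
```

Resampling concentrates particles on high-probability poses, which is what allows the variable particle count (up to 200 in the experiments) to stay small.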
- 2. Vehicle Pose Refinement
The robustness of a PF is outstanding, but its resolution is relatively low. To enhance the resolution, the number of particles must increase, which greatly increases the number of operations [12]. This study proposes a refinement technique that applies least-squares optimization to the approximate vehicle pose computed by a PF with a small number of particles.

State-of-the-art optimization algorithms have recently been reported [24]; however, in this study, the authors chose least-squares optimization because it can be easily implemented and works well. Taking the approximate vehicle state calculated from (17) as input, the observation function F(·) is calculated using (16). Once most neighboring segment pairs have been computed, the noisy (unmatched) segments of the image are removed. The state x is then computed through least-squares optimization over the norm of (16). The value calculated in this way is the vehicle state observation ẑ_{k} and is used in the update of (2) for the EKF.
- 3. Processing Time-Delay Compensation
The vehicle state computed in this way (namely, the vehicle pose) reflects the time at which the image was captured by a camera. That is, it is already past information when used for vehicle control; therefore, proper compensation is required. This is solved by estimating the vehicle's dynamic characteristics using a linear KF.

The state vector of the vehicle dynamic model is defined as $x_{k}=[x\;\dot{x}\;\ddot{x}\;y\;\dot{y}\;\ddot{y}\;\theta\;\dot{\theta}\;\ddot{\theta}]^{T}$, the input vector as u_{k} = [δx δy δθ]^{T}, and the output vector as y_{k} = [x y θ]^{T}. If the model assumes linear, uniformly accelerated motion, then the state equation of the vehicle dynamic model follows directly. Ego-motion is used as the input vector of the linear KF, while the output of the EKF is used as the observation vector. Once the dynamic model state has been estimated using the KF, the vehicle state after an arbitrary time δt can be predicted as follows:

$$\tilde{x}_{k+\delta t}=x_{k}+\dot{x}_{k}\delta t+\frac{1}{2}\ddot{x}_{k}\delta t^{2}.$$

Predicting the vehicle pose in this way compensates for the time delay incurred during processing, so that more precise information can be offered to the vehicle controller when an unmanned vehicle is actually operated.
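The constant-acceleration extrapolation of (19) can be sketched as follows; the 9D state layout is taken from the text, while the function name and state ordering within the array are illustrative assumptions.

```python
import numpy as np

def compensate_delay(x, dt):
    """Time-delay compensation of (19): x is the 9D dynamic state
    [x, x', x'', y, y', y'', theta, theta', theta''] estimated by the
    linear KF; each pose component is extrapolated dt seconds ahead
    under uniformly accelerated motion."""
    s = np.asarray(x, dtype=np.float64).reshape(3, 3)  # rows: x, y, theta
    p, v, a = s[:, 0], s[:, 1], s[:, 2]
    return p + v * dt + 0.5 * a * dt * dt
```

Here dt would be the measured latency from image capture to controller input, so the controller receives a pose valid at the moment of actuation rather than at the moment of exposure.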
VI. Experimental Results
- 1. Experimental Environments
The two autonomous vehicles in this paper used a 2.6 GHz Intel Xeon E5-2670 and a 2.3 GHz Intel i7-3610QM, respectively, as the main controller, and were adapted for unmanned driving. They were also equipped with six Point Grey Blackfly cameras fitted with fisheye lenses, an OxTS RT-2002 high-accuracy GPS/INS, and Sick LIDARs. The six cameras were attached as a front stereo pair and a rear stereo pair, with one camera on each of the right and left sides. The front stereo cameras were used for ego-motion estimation, and four cameras (front left, rear left, and the left and right sides) were used for RM feature extraction (the camera on the rear right side is not currently used). The algorithms were developed in C++ and optimized using Single Instruction, Multiple Data (SIMD) [25] and Intel Threading Building Blocks (TBB) [26]. The Eigen library [27] was used for matrix operations, while the levmar library [28] was used as the optimization solver.

We gathered more than 1 TB of images and GPS log information under various weather conditions over more than one year while carrying out this study. All camera images were captured at 20 frames per second with a common sync, and the GPS log was collected at 250 samples per second. The numerical map was built at a 1:500 scale (mean error of 0.2 m) over a 300 m × 500 m space, while its size was only 530 kilobytes without compression. The test bed is located on a low hill, with an elevation of 50 m to 62 m. Figure 6 shows some selected images and numerical maps from the dataset. In this study, the EKF parameters were tuned using GPS, and the number of particles is variable, up to 200, depending on the particles' probability.
Figure 6. Our dataset of some selected scenes in a variety of weather conditions and a numerical map: (a) sunny day, (b) rainy day, (c) cloudy day, (d) snowy day, (e) sunny day, evening, (f) sunny day, a wide open space, and (g) the numerical map.
- 2. Performance Analysis of Each Algorithm
Table 1 shows the mean processing time of the technology proposed in this paper under various conditions. Each value is the mean over 1,000 samples from each dataset. The time measurements were divided into a case using only a single core and a case using multiple cores; the single-core results verify the efficiency of the algorithm itself. The RM feature and localization (LO) rows are the processing results for four cameras, so the processing time for one camera is about one-quarter of the listed value. Although the processing time increases slightly on sunny days owing to the additional noise caused by shadows, respectable real-time performance is shown nonetheless. The excellent real-time performance of the proposed technology is due to its fast ego-motion estimation (EE) and RM feature extraction.
1) i7-3610QM, 2.3 GHz.
2) Xeon E5-2670, 2.6 GHz.
3) Sum of four channels.
Table 2 shows the performance related to ego-motion estimation (EE). The feature matching in the framework was compared with the well-known SURF [29], KLT [18], and FREAK [20] algorithms. For FREAK, the Shi–Tomasi detector [18] was used, since FREAK does not provide its own feature extractor. The proposed technique showed the best real-time performance and accuracy. This is due to its selection of good, well-distributed corner features (FAST + Shi–Tomasi), as well as the excellent matching performance of FREAK.
Table 3 shows the RM performance. When only SLT [21] and ESLT are used, the processing time is very fast. However, the ability to cope with difficult situations, such as line detection between vehicles, is remarkably low. When MSER [23] is combined, there is a clear improvement in performance, but the increase in processing time is conspicuous. The S3R method proposed in this paper meets both the accuracy and time requirements.
Figure 7 shows the selected driving paths used for the experiments. Analyses were conducted on free manual driving (Fig. 7(a)) and on autonomous driving and parking (Fig. 7(b)). The analysis results are shown in Table 4. The distance root mean squared (DRMS) is used as the error metric; position errors are given in meters and heading errors in degrees. Uniform results can be seen overall; in particular, the influence of rain and snow is considerably small. From these results, the robustness of the proposed algorithm can be ascertained. Note that the results achieve a higher accuracy than that of the map used in this study (mean error of 0.2 m); this is because the map errors (noise) are reduced through multi-stage filtering.
Figure 7. Selected driving paths and their results: (a) manual driving, with a vehicle speed of 10 km/h to 30 km/h and a driving distance of 1 km to 3 km; (b) autonomous valet parking, with a vehicle speed of 1 km/h to 3 km/h and a driving distance of 80 m to 120 m (the vehicles drive autonomously, park in a slot, and then return to the starting position).
Figure 8 shows the location errors measured during actual real-time driving: (a) is the case where the processing time-delay is not compensated, (b) is the opposite case, and (c) plots (b) in the time domain. In case (a), a total delay of about 50 ms occurs: 30 ms for capture and preprocessing after the camera is triggered, plus 20 ms of processing time; thus, the location errors grow owing to the delay. Meanwhile, in case (b), where the processing time-delay is compensated, the location error is remarkably reduced. Such time-delay compensation is judged to be essential for actual control, and the need for it increases as the vehicle speed becomes higher.
Figure 8. Processing time-delay compensation: position error along the path of Fig. 7(a): (a) without compensation, at a vehicle speed of 10 km/h to 30 km/h and a delay of 50 ms (image capture of 30 ms + processing time of 20 ms); (b), (c) with compensation.
Table 5 shows the robustness of the proposed algorithm in the case of camera failures. The failure of one or two cameras did not cause noticeable performance degradation. With three failed cameras, autonomous driving and parking were possible over a short distance but not over a long distance.
VII. Conclusion
This paper proposed an efficient localization technology that can be used in actual autonomous vehicles. The proposed technology was developed using only computer vision and numerical maps, which are widely used in geographic information systems (GIS) and in the navigation field. This study proposed efficient vehicle ego-motion estimation and road marking feature extraction algorithms, together with a method to compute the vehicle pose by combining them through an extended Kalman filter (EKF) and a particle filter (PF). This paper also presented a time-delay compensation method for the actual control system. Experiments were conducted under various weather conditions; thus, the robustness and efficiency of the proposed technology were verified.

The proposed technology was developed for actual unmanned autonomous vehicles and was applied to and tested on two vehicles. It was verified to be effective and close to commercialization.

This technology has the constraint that it requires relatively precise numerical maps. Most facilities, however, have relevant drawings, and acquiring numerical maps from them is not a difficult task. Many countries, including the Rep. of Korea, produce and manage accurate numerical maps for national territory management. If this technology is combined with general commercial GPSs, then the following becomes possible: precise driving and parking, using the technology proposed in this paper, within facilities requiring high-precision location information, while driving on the road with low-precision localization and lane-keeping technology.

As such, the use of localization technology based on numerical maps is expected to increase further. This research is significant in that it has developed, for the first time, real-time precision vehicle localization using numerical maps.
This work was supported by the Industrial Strategic Technology Development Program (10035250, Development of Spatial Awareness and Autonomous Driving Technology for Automatic Valet Parking) funded by the Ministry of Knowledge Economy (MKE), Rep. of Korea.
BIO
Corresponding Author: joshua@etri.re.kr
Seung-Jun Han received his BEng degree in control and instrumentation engineering from Pukyong National University, Busan, Rep. of Korea, in 1998, and his MS degree in electronics engineering from Pusan National University, Busan, Rep. of Korea, in 2000. From 2000 to 2010, he worked as a principal researcher for Sane System Co., Ltd., Kyonggi, Rep. of Korea. In 2011, he worked as a research fellow at the Department of Aerospace Engineering, Korea Advanced Institute of Science and Technology, Daejeon, Rep. of Korea. Since 2012, he has been working as a senior researcher at the Electronics and Telecommunications Research Institute, Daejeon, Rep. of Korea. He has won several awards, including the 6th Samsung Humantech Thesis prize in 2000, the outstanding paper award at the 17th ITSWC in 2010, and the outstanding employee award at ETRI in 2014. His main research interests are structure from motion, simultaneous localization and mapping, and machine learning.
jdchoi@etri.re.kr
Jeongdan Choi received her PhD in image processing from Chungnam National University, Daejeon, Rep. of Korea, in 2006 and her BS and MS degrees in computer graphics from Chung-Ang University, Seoul, Rep. of Korea, in 1993 and 1995, respectively. Since 1995, she has been working as a principal researcher at the Electronics and Telecommunications Research Institute (ETRI), Daejeon, Rep. of Korea. She is also the director of the Car/Infra Fusion Research Team, Car/Ship IT Convergence Technology Department, ETRI. Her research interests are automotive image processing and recognition, 3D modeling, and rendering.
References
T. Luettel, M. Himmelsbach, and H. Wuensche, "Autonomous Ground Vehicles—Concepts and a Path to the Future," Proc. IEEE, vol. 100 (special centennial issue), May 13, 2012, pp. 1831–1839. DOI: 10.1109/JPROC.2012.2189803
J. Levinson, M. Montemerlo, and S. Thrun, "Map-Based Precision Vehicle Localization in Urban Environments," Robot.: Sci. Syst. Conf., Atlanta, GA, USA, 2007.
J. Levinson and S. Thrun, "Robust Vehicle Localization in Urban Environments Using Probabilistic Maps," IEEE Int. Conf. Robot. Autom., Anchorage, AK, USA, May 3–7, 2010, pp. 4372–4378. DOI: 10.1109/ROBOT.2010.5509700
O. Pink, F. Moosmann, and A. Bachmann, "Visual Features for Vehicle Localization and Ego-Motion Estimation," IEEE Intell. Veh. Symp., Xi'an, China, June 3–5, 2009, pp. 254–260. DOI: 10.1109/IVS.2009.5164287
A. Napier and P. Newman, "Generation and Exploitation of Local Orthographic Imagery for Road Vehicle Localisation," IEEE Intell. Veh. Symp., Alcalá de Henares, Spain, June 3–7, 2012, pp. 590–596. DOI: 10.1109/IVS.2012.6232165
T. Wu and A. Ranganathan, "Vehicle Localization Using Road Markings," IEEE Intell. Veh. Symp., Gold Coast, Australia, June 23–26, 2013, pp. 1185–1190. DOI: 10.1109/IVS.2013.6629627
M. Schreiber, C. Knoppel, and U. Franke, "LaneLoc: Lane Marking Based Localization Using Highly Accurate Maps," IEEE Intell. Veh. Symp., Gold Coast, Australia, June 23–26, 2013, pp. 449–454. DOI: 10.1109/IVS.2013.6629509
Z. Tao, "Mapping and Localization Using GPS, Lane Markings and Proprioceptive Sensors," IEEE Int. Conf. Intell. Robot. Syst., Tokyo, Japan, Nov. 3–7, 2013, pp. 406–412.
M.A. Brubaker, A. Geiger, and R. Urtasun, "Lost! Leveraging the Crowd for Probabilistic Visual Self-Localization," IEEE Conf. Comput. Vision Pattern Recogn., Portland, OR, USA, June 23–28, 2013, pp. 3057–3064. DOI: 10.1109/CVPR.2013.393
H. Durrant-Whyte and T. Bailey, "Simultaneous Localisation and Mapping: Part I The Essential Algorithms," IEEE Robot. Autom. Mag., vol. 13, no. 2, 2006, pp. 99–110. DOI: 10.1109/MRA.2006.1638022
D. Scaramuzza and F. Fraundorfer, "Visual Odometry (Tutorial) Part I: The First 30 Years and Fundamentals," IEEE Robot. Autom. Mag., vol. 18, no. 4, 2011, pp. 80–92. DOI: 10.1109/MRA.2011.943233
M. Ribeiro, "Kalman and Extended Kalman Filters: Concept, Derivation and Properties," Institute Syst. Robot., Instituto Superior Técnico, Lisbon, Portugal, 2004, pp. 1–42.
E. Rosten and T. Drummond, "Machine Learning for High-Speed Corner Detection," European Conf. Comput. Vision, Graz, Austria, May 7–13, 2006, pp. 430–443. DOI: 10.1007/11744023_34
J. Shi and C. Tomasi, "Good Features to Track," IEEE Conf. Comput. Vision Pattern Recogn., Seattle, WA, USA, June 21–23, 1994, pp. 593–600. DOI: 10.1109/CVPR.1994.323794
A. Alahi, R. Ortiz, and P. Vandergheynst, "FREAK: Fast Retina Keypoint," IEEE Conf. Comput. Vision Pattern Recogn., Providence, RI, USA, June 16–21, 2012, pp. 510–517. DOI: 10.1109/CVPR.2012.6247715
@article{HJTODO_2014_v36n6_968,
  title     = {Real-Time Precision Vehicle Localization Using Numerical Maps},
  author    = {Han, Seung-Jun and Choi, Jeongdan},
  journal   = {ETRI Journal},
  volume    = {36},
  number    = {6},
  year      = {2014},
  month     = {Oct},
  publisher = {Electronics and Telecommunications Research Institute},
  doi       = {10.4218/etrij.14.0114.0040},
  url       = {http://dx.doi.org/10.4218/etrij.14.0114.0040}
}