Toward Accurate Road Detection in Challenging Environments Using 3D Point Clouds

ETRI Journal.
2015.
Jun,
37(3):
606-616

- Received : November 10, 2013
- Accepted : March 19, 2015
- Published : June 01, 2015

Download

PDF

e-PUB

PubReader

PPT

Export by style

Share

Article

Metrics

Cited by

TagCloud

In this paper, we propose a novel method for road recognition using 3D point clouds based on a Markov random field (MRF) framework in unstructured and complex road environments. The proposed method is focused on finding a solution for an analysis of traversable regions in challenging environments without considering an assumption that has been applied in many past studies; that is, that the surface of a road is ideally flat. The main contributions of this research are as follows: (a) guidelines for the best selection of the gradient value, the average height, the normal vectors, and the intensity value and (b) how to mathematically transform a road recognition problem into a classification problem that is based on MRF modeling in spatial and visual contexts. In our experiments, we used numerous scans acquired by an HDL-64E sensor mounted on an experimental vehicle. The results show that the proposed method is more robust and reliable than a conventional approach based on a quantity evaluation with ground truth data for a variety of challenging environments.
difference
label based on point features. However, inferring difference labels cannot be done in real time for a large number of points. C. Guo and others
[14]
used a graph-based approach for a 2D road representation of 3D point clouds with respect to the road topography. The approach describes the gradient cues of the road geometry to construct an MRF and implements a belief propagation (BP) algorithm to classify the road environment into four categories — a reachable region, a drivable region, an obstacle region, and an unknown region. However, their approach uses only a gradient value for labeling, and thus it sometimes cannot distinguish the ground from the roof of the vehicle. M. Li and Q. Li
[15]
proposed a method called Four Directions Scan Line Gradient Criterion (4DSG), in which the gradient value of the height is calculated using neighboring points. 4DSG can reflect not only the flatness of pavements but also the distinguishing features of a point cloud on curbs in four directions. J. Bohren and others
[4]
proposed a method in which road points can also be detected based on the reflectivity of the ground in Velodyne scans. However, such a method only worked well under good conditions.
Unlike previous works, the research described in this paper addresses the categorization of six varying road types — FR, UR, DR, SR, MR, BR, and IR — including not only normal roads but also practical challenging roads that may be encountered during autonomous navigation (see
Fig. 1
). In addition, we propose a robust method of road detection under these complex environments. The outline of our work is shown in
Fig. 2
. The proposed approach differs from previous related work. The main contributions are as follows:
Experimental environment.
Flow of overall detection method.
In Section I, we gave an outline of other relevant works, followed by a detailed description of our approach. Next, we give a detailed description of 3D point cloud representation and projection, in Section III. Section III describes how to select and extract the features. Section IV explains the method of classification based on MRF modeling. Then, experimental results are given in Section V. Finally, Section VI offers some concluding remarks regarding our works.
r
), distance value (
r
) returned by a laser, vertical angle (Δ
φ
), rotational correction angle (Δ
θ
), vertical correction angle (
φ
), vertical/horizontal offset (
r_{v}
and
r_{h}
, respectively)
[16]
.
Experimental vehicle.
The 3D point cloud computation required to transform the spherical point coordinates to those that are fit for a Cartesian space is as below
(1) $$p\left[\begin{array}{c}x\\ y\\ z\end{array}\right]={\left[\begin{array}{c}(r+\text{\Delta}r)\cdot \mathrm{cos}(\phi +\text{\Delta}\theta )\cdot \mathrm{cos}(\text{\Delta}\varphi )+{r}_{h}\cdot \text{cos}(\phi )\\ (r+\text{\Delta}r)\cdot \mathrm{sin}(\phi +\text{\Delta}\theta )\cdot \mathrm{cos}(\text{\Delta}\varphi )+{r}_{h}\cdot \text{sin}(\phi )\\ d\cdot \mathrm{sin}(\phi +\text{\Delta}\theta )-{r}_{v}\end{array}\right]}^{\text{T}}.$$
When using a Velodyne LiDAR mounted to the top of a car, we have to consider that the scanner takes a negligible amount of time to complete one full rotation; thus, the observed 3D point clouds are distorted by the motion of the vehicle. For instance, if the speed of the vehicle were 100 km/h (27.8 m/s), then, for each rotation of the Velodyne LiDAR (a single scan), it would be distorting the information of the 2.78 m just previously travelled, owing to the fact that the scanner is rotating with a frequency of 10 Hz (0.1 s). Furthermore, we should consider the case where the vehicle passes over a speed bump with a large rolling/pitch angle; in such a case, the road immediately in front of the vehicle is then classified as an obstacle, owing to the downward pitch of the vehicle at the time. We are able to solve these problems by estimating the ego-motion of the vehicle. To correct the aforementioned distortions, we utilize a GPS/INS unit that provides highly accurate motion information of the vehicle.
The laser measurement for each
i
th scan during one sensor rotation is referenced with respect to the vehicle’s position and orientation,
O
_{t+i}
, from the start of the sensor revolution, and is afterward transformed such that the coordinates are referenced with respect to
O
_{t+Δt}
at the end of the revolution, as shown in
Fig. 4
. For a transformation with rotation
O
_{t+Δt}
can be calculated as follows:
(2) $${p}_{{O}_{t+\text{\Delta}t}}^{i}\hspace{0.17em}={\tilde{R}}^{i}({p}_{{O}_{t}}^{i}-{\overrightarrow{t}}^{i}).$$
Correction of distorted 3D point cloud data by estimation of vehicle motion.
voxel
to represent a value on a regular grid in three-dimensional space. Various 3D structures can be described. However, this cannot guarantee a reduction in data size, because a lot of empty cells exist depending on how the density and resolution of the voxel are determined. In addition, this type of representation can consume large amounts of memory and has high complexity. Another method is to build a
range image
that can be generated from point cloud data and a given sensor position. However, this method has yet to be applied to the problem of road detection. This implies that it is more useful to recognize obstacles on a road than a road region.
The last method is a mesh grid mapping that is reconstructed in the form of a graph by connecting each node along the indices of the laser diode and the direction of the shot. The advantage of this is that there is a minimal loss of 3D data obtained from the sensor. On the other hand, the spatial relationships between a point and those points that are neighbors to it need to be confirmed, because the characteristics of these neighborhood regions are never the same. Finally, among the four methods described, our work is implemented using a modified version of the elevation map method. For this reason, the goal of our work focuses on the detection of a road using a horizontal plane, because the voxel method can lead to greater complexity and a higher cost in terms of computation. In addition, when constructing a mesh grid, we need to consider the spatial relationship at the cell level for segmentation with MRF modeling, which assumes spatial continuity constraints. Our proposed method therefore uses a projection for the construction of a 2.5D elevation map with 3D point clouds based on polar coordinates, unlike the Cartesian coordinate normally used by considering the scanning manner of a 3D LiDAR sensor. This sensor shoots 64 laser points at different tilt angles and rotates 360 degrees in such a way that point clouds that are close to the center of the sensor have a higher density. When making a polar grid, this fact can efficiently make a map by minimizing the number of empty cells. The representative value at each node is an elevation value that is the difference between the max-min
z
-coordinate values among the points in the same cell.
z
-direction of the vehicle; and the intensity value at each node, as shown in
Fig. 5
. The combination of gradient value and height value is more robust in classification than using only the gradient value. For example, it distinguishes between the roof of a vehicle and a road by the height value, as mentioned before. The cosine similarity can distinguish features such as a wall with a vertical structure and a road. In addition, the intensity value at each node is useful information for the segmentation process; for example, the property of a material such as asphalt (see
Fig. 5(d)
). The feature vector of the
i
th node on the previously introduced map can be expressed as follows:
(3) $${d}_{i}={\left[{d}_{i}^{g},{d}_{i}^{h},{d}_{i}^{n},{d}_{i}^{i}\right]}^{\text{T}},\text{\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}}D=\left\{{d}_{1},\text{\hspace{0.17em}\hspace{0.17em}}\mathrm{...}\text{\hspace{0.17em}\hspace{0.17em}},{d}_{m}\right\},$$
where
D
is a set of feature vector, and
m
is the number of cells. First, for a calculation of the magnitude of the gradient, it is necessary to search the four-directed neighborhood nodes along the radial and circular axes, which are then the nearest points that have observation data from the current point in both directions, as shown in
Fig. 6
. We denote them as
(4) $${d}_{i}^{\text{g}}=\sqrt{{({d}_{i}^{\text{gr}})}^{2}+{({d}_{i}^{\text{gc}})}^{2}}.$$
The gradient of the radial direction is written as
(5) $${d}_{i}^{\text{gr}}=\frac{h\left({N}_{2}^{\text{r}}\right)-h\left({N}_{1}^{\text{r}}\right)}{\left|\right|{N}_{2}^{\text{r}}-{N}_{1}^{\text{r}}\left|\right|}.$$
Next, the gradient of the circular direction can be computed as
(6) $${d}_{i}^{\text{gc}}=\frac{h\left({N}_{2}^{\text{c}}\right)-h\left({N}_{1}^{\text{c}}\right)}{\left|\right|{N}_{2}^{\text{c}}-{N}_{1}^{\text{c}}\left|\right|},$$
where
h
(*) is the height value of its point. Second, the height of each grid cell is computed by median filtering using the current value and the value of the neighborhood nodes.
(7) $${d}_{i}^{\text{h}}=\underset{k\in I}{\mathrm{max}}Z(k)-\underset{k\in I}{\mathrm{min}}Z(k),$$
where
Z
(
k
) is the
z
-coordinate value of points in the
I
th cell. Third, the estimated normal vector of
i
can be approximated normal to the neighborhood surface by performing a principal component analysis (PCA) on the neighborhood covariance matrix
[17]
. The eigenvector corresponding to the smallest eigenvalue gives an estimate of the normal vector. The estimated normal vector provides a good approximation of the surface normal, but the results obtained using a PCA are, in general, not consistently oriented. However, taking advantage of the known viewpoint, we can flip the normal vector that is not pointing toward this viewpoint. To do so, we flip all normal vectors that form an angle
θ
> 90° between a normal vector and view vector, which is a vector from the center of the vehicle to the center point of the vehicle sensor. The cosine similarity can be described using the normal vector
v
_{n}
and vertical vector
v
_{z}
, which indicates the
z
-axis direction in the coordinate of the ego-vehicle.
(8) $${d}_{i}^{\text{n}}=\mathrm{cos}\theta =\frac{{v}_{\text{n}}\cdot {v}_{\text{z}}}{\left|\right|{v}_{\text{n}}\left|\right|\text{\hspace{0.17em}}\left|\right|{v}_{\text{z}}\left|\right|}.$$
When this value of similarity is zero, the point has a possibility not to be a part of the ground, but rather a vertical structure or obstacle. In addition, the intensity value can represent the characteristics of a material or surface; thus, this value is useful for distinguishing different objects.
Display of 3DGI features: (a) gradient value, (b) height value, (c) cosine similarity, and (d) intensity value.
Search of neighborhood nodes for gradient value.
d
_{i}
, such as the gradient value, height, normal vector, and intensity. MRF theory provides a convenient and consistent way of modeling context-dependent entities such as image pixels and correlated features. This is achieved through characterizing mutual influences among such entities using a conditional MRF distribution. For the aforementioned reasons, MRF has been widely employed to solve vision problems. The segmentation based on MRF is to assign a label from label set
Ł
to each of the sites in
S
according to the feature values. Thus, a label,
f_{i}
, is assigned from the set
Ł
= {
road, non-road
} to site
i
∈
S
, where the elements in
S
index the map. Set
f
is
(9) $$f=\left\{{f}_{1},\text{\hspace{0.17em}\hspace{0.17em}}\mathrm{...}\text{\hspace{0.17em}\hspace{0.17em}},{f}_{m}\right\}.$$
According to Bayes’ theory, when both the prior distribution and the likelihood function of a pattern are known, the best that can be estimated from these sources of knowledge is Bayes’ labeling. The maximum a posteriori probability (MAP) solution, as a special case in the Bayes’ framework, is
(10) $${f}^{*}=\underset{f}{\text{argmax}}P(f\text{\hspace{0.17em}|\hspace{0.17em}}D).$$
In other words, a regularization solution is obtained by minimizing the energy of the form
(11) $$E(f)={\displaystyle \sum}_{i\in S}{E}_{1}({f}_{i})+\lambda {\displaystyle \sum}_{i\in S}{\displaystyle \sum}_{{i}^{\prime}\in S,i\ne {i}^{\prime}}{E}_{2}({f}_{i},{f}_{{i}^{\prime}}),$$
where
E
_{2}
(
f_{i}
,
f_{i′}
) is the cost of assigning labels
f_{i}
and
f_{i′}
to two neighboring nodes; this is referred to as the discontinuity cost. Here,
E
_{1}
(
f_{i}
) is the cost of assigning label
f_{i}
to node
i
, which is referred to as the data cost, and
λ
controls the balance between
E
_{1}
(
f_{i}
) and
E
_{2}
(
f_{i}
,
f_{i′}
). Next, we need to compute the posterior distribution from the prior and the likelihood. Actually, there have been no researches based on MRF to recognize a road area using 3D point information. We therefore try to provide a comparison of the results using previous similar techniques for modeling the cost function in a vision application.
First, we consider the Boykov and Jolly (BJ) segmentation algorithm, which was proposed in
[18]
for a monochrome image, in which a user imposes hard constraints by drawing a few strokes. To define the likelihood term,
E
_{1}
(
f_{i}
), BJ computed the foreground and background gray-level distributions. These were normalized to be used as foreground probability
P
_{1}
(
i
) and background probability
P
_{2}
(
i
) and provide the likelihood term with negative log-likelihoods as
(12) $${E}_{1}({f}_{i})=-\mathrm{ln}{P}_{l}({d}_{i}).$$
They defined the smoothing term as
(13) $${E}_{2}({f}_{i},{f}_{{i}^{\prime}})=\text{exp}\left\{-\beta {({d}_{i}-{d}_{{i}^{\prime}})}^{2}\right\},$$
where parameter
β
adjusts the sensitivity for the intensity difference. When the constant
β
= 0, the smoothing term is simply the well-known Ising interaction, and it is usually far more effective to set
β
> 0 and relax the smoothness tendency.
The Lazy Snapping algorithm
[19]
, proposed by Y. Li and others, clusters the RGB intensities of pixels on the dataset using the
k
-means method. The centroids of the foreground and background clusters are denoted as
n
and
m
are the cluster indices. Computing the minimum distance from
d_{i}
to the foreground clusters as
(14) $${E}_{1}({f}_{i})=\frac{{\text{Dist}}_{i}^{f}}{{\text{Dist}}_{i}^{1}+{\text{Dist}}_{i}^{2}},$$
(15) $${E}_{2}({f}_{i},{f}_{{i}^{\prime}})=\frac{1}{\text{||}{d}_{i}-{d}_{{i}^{\prime}}\text{\hspace{0.05em}||}+1},$$
where
f
value; the number of clusters
k
is determined by an experiment based on the performance criteria (
k
= 15 in our work).
Finally, we enhance the discrimination between a road/non-road using the above feature vectors and learning from the training data to extract the effective information for the segmentation. A support vector machine (SVM)
[20]
with a kernel trick provides one of the schemes for carrying out this task, which generates a more discriminative nonlinear classification boundary between a road and non-road. SVM is a powerful machine learning technique for binary classification, supported by a strong theoretical foundation and excellent empirical success.
We are given X training data {
x
_{1}
,
x
_{2}
, ... ,
x
_{M}
}(
x
_{i}
∈
R
) and their labels {
y
_{1}
,
y
_{2}
, ... ,
y_{M}
}(
y_{i}
∈ {
Road, Non-Road
}). The SVM output function can be expressed as
(16) $${f}_{\text{SVM}}(x)={w}^{\text{T}}\varphi (x)+b,$$
where superscript T represents the transpose operator, and
w
and
b
are the weight vector and bias, respectively, which should be determined appropriately through learning. Input vector
x
is classified according to
f
_{SVM}
. Geometrically,
w
indicates the normal vector of a hyperplane. All vectors lying on one side of the hyperplane are classified as a road, and all vectors on the other side are classified as a non-road. We construct the likelihood term based on the probabilistic outputs of SVM (P-SVM). Note that the training data are the proposed feature vectors. We restrict the values of the likelihood and smoothing terms in the cost function to the range [0, 1].
(17) $${E}_{1}({f}_{i})=-\mathrm{log}\left(P({d}_{i}|{f}_{i})\right),$$
(18) $$P({d}_{i}{\displaystyle \text{|}}{f}_{i})=\frac{1}{1+\text{exp}({a}_{0}+{a}_{1}{f}_{i})}.$$
The logistic function is useful for converting the SVM outputs, such as the probability, into values between 0 and 1. The coefficients
a
_{0}
and
a
_{1}
should be optimally tuned for the road and non-road training data, respectively.
(19) $$P({d}_{i}{\displaystyle \text{|}}{f}_{i})=-\text{log}\left(\frac{1}{1+\text{exp}({a}_{0}+{a}_{1}{f}_{i})}\right).$$
To ensure the local consistency, we define the smoothness cost function as
(20) $${E}_{2}({f}_{i},{f}_{{i}^{\prime}})=\text{min}\left(\left|{f}_{i}-{f}_{{i}^{\prime}}\right|,\epsilon \right),$$
where
ε
is a truncation value.
i
sends to a neighboring node
i
^{′}
at time
t
. When using negative log probabilities, all entries in
(21) $${m}_{i{i}^{\prime}}^{t}({f}_{{i}^{\prime}})=\underset{{f}_{i}}{\mathrm{min}}\left({E}_{2}\left({f}_{i},{f}_{{i}^{\prime}}\right)+\text{\hspace{0.05em}}{E}_{1}({f}_{i})\text{\hspace{0.05em}}+{\displaystyle \sum}_{i\in S}{\displaystyle \sum}_{j\in S,i\ne j}{m}_{ji}^{t-1}({f}_{i})\right).$$
After
T
iterations, a belief vector is computed for each node as follows:
(22) $${b}_{{i}^{\prime}}({f}_{{i}^{\prime}})={E}_{1}({f}_{{i}^{\prime}})+{\displaystyle \sum}_{i\in S}{\displaystyle \sum}_{j\in S,i\ne j}{m}_{ji}^{T}({f}_{{i}^{\prime}}).$$
Finally, the label
f
^{*}
that minimizes
b_{i′}
(
f_{i′}
) individually at each node is selected
[21]
.
ε
= 0.5. In the second experiment, for the best selection of the likelihood terms (Gaussian distribution,
k
-means clustering, P-SVM), a comparison of the performance according to what is selected is performed. This was accomplished with a hand-labeled ground truth on three kinds of road types. In the case of the first scenario, like the uphill road, the result of classification with a Gaussian distribution method shows that a road/non-road is not perfectly separated owing to the large gap in the various heights. When the
k
-means clustering method is applied, the value of
k
is most influential. To determine the optimal
k
value in our problem, we tried to conduct an experiment on the performance variance according to the
k
value. Because a lesser value does not provide a good result, and a value larger than 15 requires too much time, the performance of classification cannot be largely enhanced; thus, we finally determined that the proper value of
k
is 15 based on our experiments. However, this method also shows that a false detection occurs in some parts of the vehicle’s surroundings. We can see that the proposed method based on P-SVM provides more excellent results than other methods (Gaussian or
k
-means) in two scenes. For SVM, we use a radial basis function model as the kernel function. To obtain parameters
C
and
γ
, we use cross-validation and grid-search by LibSVM
[20]
. Their best accuracy was achieved at
C
= 2 and
γ
= 8. Unlike other methods, this method shows good classification results, not only for a median strip but also for a vehicle and ground area in front of the vehicle. Similar results can be seen in the case of a separated road (SR) type. We can see that some parts of the vehicle are misclassified as a region of road. In conclusion, we can see a reduction in the rate of mismatching using the P-SVM method compared with other methods.
A. Quantitative Evaluations
To evaluate the quantitative performance of our road detection method, we used various real road types including those previously mentioned; that is, FR, UR, DR, MR, and IR. Because the few published datasets that exist are limited in terms of the number of frames and utilized road environments, we collected over 100 frames on the different aforementioned road types under a test environment (over 2 km), as can be seen in
Fig. 1
. We built up our dataset through hand labeling using a 3D point cloud editing tool for the ground truth on
road
and
non-road
areas in each frame. The quantitative evaluations with our dataset are based on three measurements, accuracy rate (AR), precision rate (PR), and recall rate (RR). The three measurements are defined as follows:
(23) $$\text{Accuracyrate\hspace{0.17em}}(\%)=\frac{{N}_{\text{TN}}+{N}_{\text{FP}}}{{N}_{\text{TP}}+{N}_{\text{FN}}+{N}_{\text{FP}}+{N}_{\text{TN}}}\times 100,$$
(24) $$\text{Precision\hspace{0.17em}\hspace{0.17em}rate\hspace{0.17em}\hspace{0.17em}}(\text{\%})=\frac{{N}_{\text{TP}}}{{N}_{\text{TP}}+{N}_{\text{FP}}}\times 100,$$
(25) $$\text{Recall\hspace{0.17em}\hspace{0.17em}rate\hspace{0.17em}\hspace{0.17em}}(\text{\%})=\frac{{N}_{\text{TP}}}{{N}_{\text{TP}}+{N}_{\text{FN}}}\times 100,$$
where
N
_{TP}
is the number of road cells labeled as road in the reference scene,
N
_{FP}
is the number of non-road cells erroneously labeled as road (false positives) in the reference scene,
N
_{FN}
is the number of road pixels erroneously labeled as non-road (false negatives) in the reference scene, and
N
_{TN}
is the number of non-road cells labeled as non-road in the reference scene.
Table 1
shows some performance results of the proposed method for the six different road types and those obtained with the methods in
[14]
,
[9]
, and
[24]
, respectively. First, the results of our method in the case of the flat road (FR) type are shown in
Table 1
. Similar results are obtained by all methods for this road type (92.3%, 89.8%, 89.2%, and 90.0%, respectively). Guo’s method, which is most similar to our own, shows that misclassification has happened at the boundary between vehicle and road because of a lack of point cloud information (see
Fig. 7(m)
). This is because Guo’s method uses only a gradient value for labeling, but the gradient value of the vehicle roof is similar to that of the ground. The other two methods,
[9]
and
[24]
, are based on ground plane modeling and show good detection results (approximately 90%) in the case of the FR type, unlike other road types (UR, DR, MR, SR, IR). In FR type, the Himmelsbach method
[9]
using the local plane model shows better results than Guo’s method
[14]
, which is based on an MRF model, as shown
Table 1(b)
. We think that the line fitting method proposed by Himmelsbach gives a good result for the slope road along the radial direction, but it causes a problem particularly at the boundary region between the
road
and the
non-road
along the circular direction because of the discontinuity of the model (see
Fig. 7(r)
). The accuracy of the Silk method
[24]
is 80.8 %, and we think that this method is not robust in the case of a sloped road because it uses only a plane model.
In
Table 1
, (d) and (e) show the separated and combined road types of various slope. From the results, we can see that the accuracy of two methods,
[9]
and
[24]
, which are based on plane estimation, is not very high (see
Figs. 7(s)
and
7(w)
). These road types cannot be described well by one model or even a few models because of the various height profiles.
Four examples in the 3D LiDAR datasets are presented. Their ground truth is labeled as ground (green) and other objects (red). Each algorithm’s result is described as true positive (green), false positive (red), and false negative (blue), respectively.

However, our proposed method, which is based on a 3DGI feature and region-based labeling, did detect these road types accurately. The accuracy of our approach reached 90.2% and 83.5%, respectively, which is reasonable for complicated road conditions.
In
Table 1
, (f) shows another sloped-road scene with a T-junction. Our method and the algorithm in
[14]
gave good results among the four types of methods, though some ground points (false negative points) are misclassified as obstacles by the algorithms in
[9]
and
[24]
(as shown in
Figs. 7(t)
and
7(x)
). This is because the root-mean-square error of the line fitting for a steep slope exceeds a predefined threshold.
B. Computation Time
As the algorithm is applied to the autonomous vehicle, the real-time performance of the road detection algorithm is our main focus. We have calculated the computation time of the proposed approach for numerous scenes acquired by our experimental vehicle. As can be seen from
Fig. 8
, the average computation time of our method is 0.150 s/scan.
Computation time of our method (red) compared to that of the Silk [24] (green), Himmelsbach [9] (blue), and RBNN [25] (cyan) methods.
Our method is slower than those that are based on plane estimation, yet it is much faster than the previous works of
[12]
–
[13]
, which were based on an MRF model. The processing times of these methods,
[12]
–
[13]
, with the exception of the training times, are similar at about 2.31 s/scan. Consequently, our approach can reach an almost real-time performance, because the refresh rate of the scan is 0.1 s/scan.
This work was supported by the IT R&D program of MSIP/IITP (B0101-15-0134, Development of Decision Making/Control Technology of Vehicle/Driver Cooperative Autonomous Driving System Based on ICT).
Corresponding Author bbhan97@gmail.com
Jaemin Byun received his BS degree in mechatronics engineering and his MS degree in robotics from Chungnam National University, Daejeon, Rep. of Korea, in 2005 and 2007, respectively. From 2007 to 2008, he worked as a researcher engineer for Doddam Systems, Daejeon, Rep. of Korea. He is currently a senior researcher engineer of the Intelligent Robot Control Research Section, Electronics and Telecommunications Research Institute, Daejeon, Rep. of Korea. His main research interests are autonomous vehicles, environmental perception for navigation, and motion planning.
bsseo@etri.re.kr
Beom-Su Seo received his BS and MS degrees in computer science and statistics from the University of Seoul, Rep. of Korea, in 1996 and 1998, respectively. He worked as a researcher engineer for the System Engineering Research Institute, Daejeon, Rep. of Korea, in 1998. He is currently a director of the Intelligent Robot Control Research Section, Electronics and Telecommunications Research Institute, Daejeon, Rep. of Korea.
jihong@cnu.ac.kr
Jihong Lee received his BS degree in electronics engineering from Seoul National University, Rep. of Korea, in 1983 and his MS and PhD degrees in robotics from the Korea Advanced Institute of Science and Technology, Daejeon, Rep. of Korea, in 1985 and 1991, respectively. From 1985 to 1988, he worked for Hyundai Heavy Industry, Ulsan, Rep. of Korea. Since 1994, he has been with the Department of Mechatronics Engineering, Chungnam National University, Daejeon, Rep. of Korea, where he is now a professor. His main research interests are high-speed mobile robots in rough terrain, underwater robots, and localization.

I. Introduction

Interest in technology for autonomous navigation has dramatically increased along with the development of intelligent vehicles. In particular, more accurate environment recognition is required for safe driving on complex inner city roads and highways. Many studies are thus handling obstacle and drivable space detection for road environments. Many past researches have studied the detection of drivable spaces and obstacles using a camera, radar, and 2D LiDAR. Recent approaches have looked at transferring 2D data processing into 3D information through the use of a high-performance 3D LiDAR sensor and stereo camera
[1]
. In 2007, in the DARPA Urban Challenge, some teams, for the first time in the history of the challenge, used a 3D LiDAR sensor for drivable space and obstacle detection.
Recently, there have been a number of studies using 3D point cloud data
[2]
–
[5]
. However, most of the methods incorporated in these studies are focused on ground extraction on, ideally, a flat road and not on sloping or undulating roads, which are often found in more complex road environments.
We now move on to explain about previous related works for road and obstacle detection with 3D LiDAR. One of the most widely used methods for road and obstacle detection with 3D LiDAR is that of 3D point cloud projection. In this method, 3D point clouds are projected onto an assumed or estimated ground plane. Then, coordinates whose y-value (height) exceeds a given threshold are found. The ground plane is then represented by a grid, in which each cell contains only one representative value, such as an average, a maximum, or a minimum z-value of the number of points in a cell
[6]
–
[7]
. One of the advantages of this is that several sensors can be easily fused, thereby ensuring that any resulting mapping is straightforward. Many teams participating in the DARPA Urban Challenge successfully applied this method.
However, the difficulty of road detection remains in sloped terrains or situations with a large rolling/pitch angle of the host vehicle. Both J. Leonard and others
[8]
and M. Himmelsbach and others
[9]
describe a method that identifies points in a point cloud that are likely to be on the ground. The method then attempts to fit a ground model through those ground points. In addition, other points above the ground model are dealt with as obstacle points.
B. Douillard and others
[10]
proposed a strategy that utilizes ground models of non-constant resolution, either providing a continuous probabilistic surface or a terrain mesh built from the structure of a range image. F. Moosmann and others
[11]
proposed a graph-based approach to segment the ground and objects from 3D LiDAR scans using a novel generic criterion based on local convexity measures. D. Anguelov and others
[12]
and Y. Lu and C. Rasmussen
[13]
apply machine learning approaches to the segmentation process with 3D point clouds — the results of which show good performances.
The theory of Markov random fields (MRFs) can be considered through characterizing mutual influences among entities, such as points, cells, or nodes, under a conditional MRF distribution. In a supervised learning framework, an MRF is trained to label points with a
- ■ An overall pipeline (flow and process) and proper combination (of process) for accurate road detection in an acceptable processing time.
- ■ Guidelines for the best selection of the proper features using 3D geometrical information (3DGI) (the gradient value, the average height, the normal vectors, and the intensity value) from 3D point clouds in undulated and sloped roads such as downhill/uphill roads.
- ■ A new formulation of the road detection problem based on MRF modeling through the selection of the proper likelihood term, as well as the determination of different regions with different classes through loopy belief propagation (LBP).
- ■ The construction of a dataset for quantitative evaluation of various real roads, which is something that is not available in any existing public dataset, is used to show the performance of our work and that of relative works.

PPT Slide

Lager Image

PPT Slide

Lager Image

II. 3D Point Cloud Representation and Projection

- 1. Correction of Distorted 3D Point Cloud

The Velodyne LiDAR mounted to the top of the vehicle shown in
Fig. 3
has 64 lasers to cover different vertical angles and provide a 360-degree field of view of the surrounding environment at a rate of more than 1.3 million points per second. The LiDAR returns spherical point coordinates and thus needs to transform these into points fit for a Cartesian space. For the transformation, we have to consider calibration parameters, such as distance correction factor (Δ
PPT Slide

Lager Image

R ˜

and translation
t →

for all data, the undistorted coordinates
p O t+Δt i

of point
p O r i

referenced with respect to
PPT Slide

Lager Image

- 2. Polar Grid Map Representation

We can obtain over 1.3 million 3D point cloud points per second from a 3D LiDAR sensor such as the Velodyne 3D LiDAR for the recognition of the environment surrounding a vehicle, unlike the 2D information obtained from a 2D LiDAR and camera. A lot of time is required to compute all of these points for proper recognition. It is therefore necessary to use a method that can efficiently handle such computations. Such methods can be roughly categorized into four conventionally used types. The first is an elevation map that represents information in 2.5D using a horizontal plane constructed from rectangular grid cells that contain 3D point clouds; the 3DGI can be described through the elevation value corresponding to each cell.
Another method uses a
III. Feature Extraction

This section describes how to select and extract the necessary features for distinguishing between a drivable area and a non-drivable area based on MRF. The recognition of drivable and non-drivable spaces should be considered with a geometrical structure of the environment surrounding the vehicle. To do so, past works use a height value for the detection of a road/obstacle through a comparison between the height value and its predetermined threshold value. However, this method may misinterpret a part of the road as an obstacle when the vehicle passes a downhill/uphill or sloped road. M. Li and Q. Li
[15]
proposed a method of 4DSG. 4DSG not only can reflect the flatness of the pavement but also the distinguishing features of a point cloud on curbs in four directions. F. Moosmann and others
[11]
introduced a generic criterion based on local convexity measures for the segmentation of 3D LiDAR data in non-flat urban environments. C. Guo and others
[14]
focused on a gradient cue instead of the absolute height information, since it can reflect essential differences between drivable and non-drivable regions. This approach is similar to our work. However, under-segmentation can, unfortunately, occur because it uses only the gradient value; for example, the flat roof of a car cannot be distinguished from a road surface.
In this paper, we propose a novel feature vector (3DGI) that can describe the characteristics of a local road surface in a 3D environment to detect a region of a road using four types of local features: the magnitude of the gradient; the height value; the cosine similarity of the normal vector and the vector in the
d i g

is the magnitude of the gradient,
d i h

is the height,
d i n

is the similarity value of the normal vector,
d i i

is the intensity value,
N 1 r , N 2 r , N 1 c

, and
N 2 c

, where c indicates the circular direction, and r is the radial direction. The aforementioned magnitude of the gradient can be expressed as follows:
PPT Slide

Lager Image

PPT Slide

Lager Image

IV. Classification Based on MRF Modeling

- 1. Formulation of Road Detection Based on MRF Model

In this paper, MRF modeling is used for extracting a road area with previous introduced features,
{ D n 1 }

and
{ D m 2 }

, respectively, and
Dist i 1 = min n || d i − D n 1 ||, Dist i 2 = min m || d i − D m 2 ||

, they defined the likelihood as
Dist i f

is the distance from the center of the cluster to the corresponding
- 2. Road Classification by LBP

The max-product BP algorithm works by passing messages around the graph defined by the four connected grids. Each message is a vector of the dimension given by the number of possible levels. Let
m i i ′ t

be the message that node
m i i ′ 0

are initialized to zero, and at each iteration, new messages are computed in the following way:
V. Experiments

- 1. Experimental Setup

This section presents the experimental results of our work. To show the validation of our work, we collected real 3D point cloud data using experimental vehicles equipped with a 3D LiDAR sensor, such as a Velodyne sensor, under various road environments, such as a flat road, uphill, downhill, and sloped road, as shown in
Fig. 3
.
The software development environment consists of an ROS and OPRoS framework
[22]
. We employ the PCL library
[23]
for pre-processing with a 3D point cloud and use C++ language for an implementation of our work.
We then validate the proposed method by a quantity evaluation. In addition, we compare the performance of our approach and previous related works (
[14]
,
[9]
, and
[24]
) for various challenging road environments. In the first experiment, the procedure of the proposed method is sequentially performed, as shown in
Fig. 2
. First, 3D point cloud points obtained by a Velodyne sensor are used to compose a grid map based on a polar coordinate system performance. The best accuracy rate of detection was achieved at
- 2. Results and Discussion

PPT Slide

Lager Image

Performance of proposed method and related works[14],[9], and[24].

Method | Proposed approach | C. Guo et al. | M. Himmelsbach et al. | Silk method | ||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|

AR | PR | RR | AR | PR | RR | AR | PR | RR | AR | PR | RR | |

(a) FR | 92.3 | 92.1 | 97.8 | 89.8 | 92.4 | 86.6 | 89.2 | 98.6 | 98.3 | 90.0 | 94.7 | 82.6 |

(b) UR | 92.0 | 97.4 | 93.0 | 86.4 | 96.9 | 86.7 | 88.7 | 95.4 | 83.4 | 80.8 | 89.3 | 81.9 |

(c) DR | 93.2 | 94.4 | 91.1 | 81.4 | 95.1 | 82.6 | 89.1 | 95.7 | 84.1 | 82.4 | 90.7 | 83.1 |

(d) MR | 90.2 | 97.6 | 90.1 | 80.9 | 93.2 | 80.1 | 83.0 | 95.6 | 72.0 | 79.1 | 66.9 | 97.3 |

(e) SR | 83.5 | 98.4 | 80.5 | 81.1 | 99.3 | 81.0 | 72.9 | 91.4 | 61.2 | 93.3 | 87.4 | 88.5 |

(f) IR | 95.7 | 98.1 | 99.7 | 92.4 | 94.2 | 84.4 | 80.2 | 95.0 | 77.6 | 73.2 | 51.4 | 98.3 |

Total | 91.2 | 96.3 | 91.3 | 85.3 | 95.1 | 83.5 | 83.8 | 95.3 | 79.4 | 83.1 | 80.1 | 88.6 |

PPT Slide

Lager Image

VI. Conclusion

In this paper, we presented a robust method for road detection using 3D point clouds on challenging road environments such as downhill/uphill sloped roads. The correction of a 3D point cloud distorted by the motion of the vehicle was proposed. We introduced a guideline for the best selection of the proper features, such as the gradient value and average height of a neighboring node.
We have described, in detail, the transformation of the road detection problem into a classification problem of different features and an inference algorithm based on MRF with loopy belief propagation. In our experiments, the results proved that the proposed method is more robust and reliable than conventional approaches in a variety of challenging environments.
BIO

Broggi A.
“Terrain Mapping for Off-Road Autonomous Ground Vehicles Using Rational B-Spline Surfaces and Stereo Vision,”
IEEE Intell. Vehicles Symp.
Gold Coast, Australia
June 22–26, 2013
648 -
653

Urmson C.
2008
“Autonomous Driving in Urban Environments: Boss and the Urban Challenge,”
J. Field Robot.
25
(8)
425 -
466
** DOI : 10.1002/rob.20255**

Montemerlo M.
2008
“Junior: The Stanford Entry in the Urban Challenge,”
J. Field Robot.
25
(9)
569 -
597
** DOI : 10.1002/rob.20258**

Bohren J.
2008
“Little Ben: The Ben Franklin Racing Team’s Entry in the 2007 Darpa Urban Challenge,”
J. Field Robot.
25
(9)
598 -
614
** DOI : 10.1002/rob.20260**

Buehler M.
,
Iagnemma K.
,
Singh S.
2009
“The DARPA Urban Challenge: Autonomous Vehicles in City Traffic,”
Springer
Berlin, Germany

Thrun S.
2003
“Learning Occupancy Grid Maps with Forward Sensor Models,”
Auton. Robots
15
(2)
111 -
127
** DOI : 10.1023/A:1025584807625**

Tay M.
2008
“An Efficient Formulation of the Bayesian Occupation Filter for Target Tracking in Dynamic Environments,”
Int. J. Vehicle Auton. Syst.
6
(1)
155 -
171
** DOI : 10.1504/IJVAS.2008.016483**

Leonard J.
2008
“A Perception-Driven Autonomous Urban Vehicle,”
J. Field Robot.
25
(10)
727 -
774
** DOI : 10.1002/rob.20262**

Himmelsbach M.
,
Hundelshausen F.
,
Wuensche H.
“Fast Segmentation of 3D Point Clouds for Ground Vehicles,”
IEEE Intell. Vehicles Symp.
San Diego, CA, USA
June 21–24, 2010
560 -
565

Douillard B.
“On the Segmentation of 3D LiDAR Point Clouds,”
IEEE Int. Conf. Robot. Autom.
Shanghai, China
May 9–13, 2011
2798 -
2805

Moosmann F.
,
Pink O.
,
Stiller C.
“Segmentation of 3D Lidar Data in Non-flat Urban Environments Using a Local Convexity Criterion,”
IEEE Intell. Vehicles Symp.
Xi’an, China
June 3–5, 2009
215 -
220

Anguelov D.
“Discriminative Learning of Markov Random Fields for Segmentation of 3D Scan Data,”
IEEE Comput. Vis. Pattern Recogn.
June 20–25, 2005
2
169 -
176

Lu Y.
,
Rasmussen C.
“Simplified Markov Random Fields for Efficient Semantic Labeling of 3D Point Clouds,”
IEEE/RSJ Int. Conf. Intell. Robots Syst.
Vilamoura, Portugal
Oct. 7–12, 2012
2690 -
2697

Guo C.
“Graph-Based 2D Road Representation of 3D Point Clouds for Intelligent Vehicles,”
IEEE Intell. Vehicles Symp.
Baden-Baden, Germany
June 5–9, 2011
715 -
721

Li M.
,
Li Q.
2009
“Real-Time Road Detection in 3D Point Clouds Using Four Directions Scan Line Gradient Criterion,”
Future
Xi’an, China
5 -

Muhammad N.
,
Lacroix S.
“Calibration of a Rotating Multi-beam Lidar,”
IEEE/RSJ Int. Conf. Intell. Robots Syst.
Taipei, Taiwan
Oct. 18–22, 2010
5648 -
5653

Rusu R.B.
2010
“Semantic 3D Object Maps for Everyday Manipulation in Human Living Environments,”
KI-Künstliche Intelligenz
24
(4)
345 -
348
** DOI : 10.1007/s13218-010-0059-6**

Boykov Y.
,
Kolmogorov V.
2004
“An Experimental Comparison of Min-Cut/Max-Flow Algorithms for Energy Minimization in Vision,”
IEEE Trans. Pattern Anal. Mach. Intell.
26
(9)
1124 -
1137
** DOI : 10.1109/TPAMI.2004.60**

Li Y.
2004
“Lazy Snapping,”
ACM Trans. Graph.
23
(3)
303 -
308
** DOI : 10.1145/1015706.1015719**

Chang C.-C.
,
Lin C.-J.
2011
“LibSVM: A Library for Support Vector Machines,”
ACM Trans. Intell. Syst. Technol.
2
(3)
1 -
27

Felzenszwalb P.F.
,
Huttenlocher D.P.
2006
“Efficient Belief Propagation for Early Vision,”
Int. J. Comput. Vis.
70
(1)
41 -
54
** DOI : 10.1007/s11263-006-7899-4**

Jang C.
2010
“OPRoS: A New Component-Based Robot Software Platform,”
ETRI J.
32
(5)
646 -
656
** DOI : 10.4218/etrij.10.1510.0138**

Rusu R.B.
,
Cousins S.
“3D is here: Point Cloud Library (PCL),”
IEEE Int. Conf. Robot. Autom.
Shanghai, China
May 9–13, 2011
1 -
14

Choi S.
“Robust Ground Plane Detection from 3D Point Clouds,”
Int. Conf. Contr., Autom. Syst.
Goyang, Rep. of Korea
Oct. 22–25, 2014
1076 -
1081

Klasing K.
,
Wollherr D.
,
Buss M.
“A Clustering Method for Efficient Segmentation of 3D Laser Data,”
IEEE Int. Conf. Robot. Autom.
Pasadena, CA, USA
May 19–23, 2008
4043 -
4048

Citing 'Toward Accurate Road Detection in Challenging Environments Using 3D Point Clouds
'

@article{ HJTODO_2015_v37n3_606}
,title={Toward Accurate Road Detection in Challenging Environments Using 3D Point Clouds}
,volume={3}
, url={http://dx.doi.org/10.4218/etrij.15.0113.1131}, DOI={10.4218/etrij.15.0113.1131}
, number= {3}
, journal={ETRI Journal}
, publisher={Electronics and Telecommunications Research Institute}
, author={Byun, Jaemin
and
Seo, Beom-Su
and
Lee, Jihong}
, year={2015}
, month={Jun}