For a vision-based driver assistance system, unusual motion detection is one of the important means of preventing accidents. In this paper, we propose a real-time unusual-motion-detection model, which contains two stages: salient region detection and unusual motion detection. In the salient-region-detection stage, we present an improved temporal attention model. In the unusual-motion-detection stage, three kinds of factors, the speed, the motion direction, and the distance, are extracted for detecting unusual motion. A series of experiments demonstrates the effectiveness of the proposed method and shows the feasibility of the model.
1. Introduction
Reducing traffic accidents and improving road safety are important research subjects in transportation-related institutions and in the vehicle industry. Driver-assistance systems (DASs) aim at bringing potentially hazardous conditions to the driver's attention in real time, and they also aim at greater driver comfort. Autonomous driving is already a reality in on-road vehicles.
At present, moving-object detection and tracking is an important research subject within DASs. Such approaches usually first identify objects and then estimate their motion by tracking them. Dynamic scenes, the diversity of moving objects (non-rigid pedestrians as well as rigid vehicles), and weather, lighting, and other factors make moving-object detection and tracking very difficult.
However, drivers seem to be more concerned about regions of unusual motion than about moving objects in general. We therefore propose to detect unusual-motion regions at the pixel level rather than at the moving-object level, estimating a collision risk for every single image point, independent of an object-detection step.
Visual attention is one of the most important mechanisms of the human visual system. Following this mechanism, visual-saliency methods detect salient regions in images and video. Visual attention models, which use mathematical models to simulate the human visual system, have become a "hot topic" in computer vision.
This article aims at using the visual attention mechanism to detect unusual motion for vision-based driver assistance. The remainder of this paper is organized as follows. The unusual motion detection framework is presented in Section 2. Section 3 describes salient region detection based on visual attention. Section 4 introduces unusual motion detection within the detected salient regions. Section 5 presents the experimental results. Finally, Section 6 concludes the paper and opens perspectives for future work.
2. Unusual Motion Detection Framework
In DASs, unusual motion detection is one of the important means of preventing accidents. Motion-detection techniques often rely on detecting a moving object before computing motion, so the performance of such methods greatly depends on the performance of moving-object detection.
It is a common human experience to get out of the way of a quickly moving object before actually identifying what it is; in other words, humans can perceive motion earlier than form and meaning.
In this paper, we propose an unusual-motion-detection model for vision-based driver assistance. The proposed model detects the collision risk for the considered image points, independent of an object-detection step.
The figure below illustrates our proposed unusual-motion-detection framework. This framework contains two stages: salient region detection based on visual attention, and unusual motion detection within the detected salient regions.
The proposed unusual-motion-detection framework.
Since directly computing pixel-level unusualness is computationally expensive, we first introduce a salient-region-detection method, so as to define the unusual-motion-detection areas. In the stage of salient region detection, an improved temporal attention model is proposed to detect the salient regions.
In the second stage, three different factors, the speed, the motion direction, and the distance are considered to detect the unusual motion for every pixel within the detected salient regions.
3. Salient Region Detection Based on Visual Attention
In video sequences, motion plays an important role, and human perceptual reactions mainly focus on motion contrast, regardless of the visual texture in the scene.
Visual saliency measures low-level stimuli to the human visual system that grab a viewer's attention in the early stages of visual processing. While many models have been proposed in the image domain, much less work has been done on video saliency.
Zhai and Shah proposed a temporal attention model that uses interest-point correspondences and the geometric transformations between images. The projection errors of the interest points, defined by the estimated homographies, are incorporated in the motion-contrast computation.
Tapu and Zaharia extended this temporal attention model. Different types of motion present in the current scene are determined using a set of homographic transforms, estimated by recursively applying the Random Sample Consensus (RANSAC) algorithm on the interest-point correspondences.
In these previously developed methods, detecting feature points is the first and most important step; obviously, the performance of the temporal attention model is greatly influenced by the quality of the point correspondences. The Scale-Invariant Feature Transform (SIFT) is used to find the interest points and to compute the correspondences between the points in video frames.
However, it is well known that interest points generally concentrate in areas of rich texture. If there is little texture in the potential object regions, then no feature points are detected in these regions, and thus the potential object regions cannot be detected. An example of interest points detected by SIFT is shown below. In this case, the region where the cat (running right to left) is located is the potential object region, yet most of the detected interest points are located in the background.
An example of detected interest points using Scale Invariant Feature Transform (SIFT). The top figure shows the original frame. The bottom figure shows the interest points detected by a SIFT detector.
In this stage, we extend the temporal attention model proposed by Zhai and Shah to obtain dense point correspondences based on dense optical flow fields. The optical flow technique is the most widely used motion-detection approach. However, optical flow at edge pixels is noisy if multiple motion layers exist in the scene, and in texture-less regions dense optical flow may return erroneous values. To overcome this problem, we use a RANSAC algorithm on the point correspondences to eliminate outliers.
As shown in the flowchart below, this stage consists of the following steps:
Flowchart of the salient region detection based on visual attention.
Step 1: Dense point matching
- First, the dense optical flow method TV-L1 is used on two consecutive frames to calculate the dense point correspondences at 10-pixel intervals. Since the moving objects always appear in the lower part of the input frames, the dense optical flow method is applied only to the bottom two thirds of each frame; this reduces the computation time. Furthermore, to avoid the effect of noise, we exclude a 10-pixel-wide border around every frame.
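The grid sampling of Step 1 can be sketched as follows. This is a minimal NumPy sketch: the dense flow field `flow` is assumed to have already been computed by a dense solver such as TV-L1, and the 10-pixel stride, bottom-two-thirds restriction, and 10-pixel border follow the text.

```python
import numpy as np

def dense_correspondences(flow, stride=10, border=10):
    """Sample point correspondences from a precomputed dense flow field.

    flow: (H, W, 2) array of per-pixel displacements (u, v).
    Only the bottom two thirds of the frame are sampled, and a
    border is excluded to reduce the effect of noise.
    """
    H, W = flow.shape[:2]
    top = H // 3                                   # skip the upper third of the frame
    ys = np.arange(max(top, border), H - border, stride)
    xs = np.arange(border, W - border, stride)
    gx, gy = np.meshgrid(xs, ys)
    p = np.stack([gx.ravel(), gy.ravel()], axis=1).astype(float)  # points in frame t
    # correspondences in frame t+1: each point displaced by its flow vector
    q = p + flow[p[:, 1].astype(int), p[:, 0].astype(int)]
    return p, q

# toy flow field: a uniform shift of (2, 0) pixels
flow = np.zeros((90, 120, 2))
flow[..., 0] = 2.0
p, q = dense_correspondences(flow)
```

For a 90-row frame, the sampled rows start at row 30 (the top third is skipped), so the correspondences cover only the lower part of the image, as in the text.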
Step 2: Background / Camera motion estimation
- Obviously, most of the points detected in Step 1 are located in the background. The subset of background points can be determined with epipolar geometry constraints. We use the multi-view epipolar constraint, which requires the background points to lie on the corresponding epipolar lines in subsequent images. If points are far away from their corresponding epipolar lines, we classify them as foreground points. For the point correspondences detected in Step 1, we apply a RANSAC algorithm to determine the fundamental matrix.
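The background/foreground split described above can be sketched with the point-to-epipolar-line distance. In this NumPy sketch, the fundamental matrix would in practice come from the RANSAC estimation; here a toy fundamental matrix for a pure horizontal camera translation is used, under which background points keep their image row.

```python
import numpy as np

def epipolar_distances(F, p, q):
    """Distance of each correspondence q from the epipolar line F @ p.

    p, q: (N, 2) arrays of matched points in two consecutive frames.
    """
    ph = np.hstack([p, np.ones((len(p), 1))])   # homogeneous coordinates
    qh = np.hstack([q, np.ones((len(q), 1))])
    lines = ph @ F.T                            # epipolar lines in the second image
    num = np.abs(np.sum(qh * lines, axis=1))    # |q^T F p|
    den = np.sqrt(lines[:, 0] ** 2 + lines[:, 1] ** 2)
    return num / den

# toy F for a pure horizontal translation (epipolar lines are image rows)
F = np.array([[0., 0., 0.],
              [0., 0., -1.],
              [0., 1., 0.]])
p = np.array([[10., 20.], [30., 40.], [50., 60.]])
q = np.array([[12., 20.], [33., 40.], [50., 75.]])  # last point leaves its row
d = epipolar_distances(F, p, q)
foreground = d > 1.0          # far from the epipolar line -> foreground point
```

The first two correspondences stay on their epipolar lines (background); the third moved off its line and is flagged as foreground.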
Step 3: Different types of motion recognition
- In practice, multiple motions are present, resulting from the moving objects but also from background objects or camera movement. In this case, we determine a new subset of points formed by all the outliers and all the points not considered in the previous step. For the current subset, we apply a RANSAC algorithm recursively to estimate multiple homographies until all points belong to a motion class. The estimated homographies model different planar transformations in the scene.
Every estimated homography $H_i$ has a set of points $P_i$ as its inliers, and $n_i$ is the number of inliers for $H_i$. For every homography $H_i$, its inliers in $P_i$ can be considered as being located in the same plane. However, these points may belong to a spatially distributed region. To avoid this problem, we use the $k$-means clustering algorithm to divide $P_i$ into clusters $\{P_{i,1}, \ldots, P_{i,k}\}$. The spanning region of $P_{i,j}$ is denoted by $R_{i,j}$ and corresponds to a moving region.
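The spatial grouping of a homography's inliers can be sketched with a small k-means loop. This is a self-contained NumPy sketch with a hypothetical cluster count of k = 2 and a deterministic quantile-based initialization; in practice the cluster count and initialization would be chosen per scene.

```python
import numpy as np

def kmeans(points, k, iters=50):
    """Minimal k-means: split a homography's inlier points into
    k spatially coherent clusters."""
    # deterministic init: spread the centers along the data's extent
    centers = np.quantile(points, np.linspace(0.0, 1.0, k), axis=0)
    for _ in range(iters):
        # assign every point to its nearest center, then recompute centers
        labels = np.argmin(((points[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        centers = np.array([points[labels == j].mean(axis=0) for j in range(k)])
    return labels, centers

def spanning_region(points):
    """Axis-aligned bounding box of a cluster: its spanning region."""
    return points.min(axis=0), points.max(axis=0)

# two well-separated groups of inliers (simulated moving regions)
rng = np.random.default_rng(1)
pts = np.vstack([rng.normal([20, 20], 1, (30, 2)),
                 rng.normal([80, 60], 1, (30, 2))])
labels, centers = kmeans(pts, k=2)
```

Each resulting cluster's bounding box plays the role of a candidate moving region for the saliency computation of the next step.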
Step 4: Saliency computing
- For all the moving regions determined in Step 3, we now compute their projection errors as saliency values. The temporal saliency value of the moving region $R_{i,j}$ is defined by
$$S_T(R_{i,j}) = \frac{1}{A_{i,j}} \sum_{p \in P_{i,j}} \sum_{m=1}^{N} \lVert p'_m - q \rVert,$$
where $N$ is the total number of homographies in the scene, $p'_m = H_m p$ is the projection of $p$ computed after applying $H_m$, $q$ is the correspondence of $p$ found by TV-L1, and $A_{i,j}$ is the spanning area of the subset $P_{i,j}$.
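The projection-error saliency of Step 4 can be sketched as follows: every homography in the scene is applied to each point of a region, the deviation from the flow-based correspondence is accumulated, and the sum is normalized by the spanning area. This NumPy sketch uses a hypothetical toy scene (one identity homography and a small set of points); the homographies and areas would in practice come from the previous steps.

```python
import numpy as np

def apply_homography(H, p):
    """Apply a 3x3 homography to (N, 2) points."""
    ph = np.hstack([p, np.ones((len(p), 1))])
    proj = ph @ H.T
    return proj[:, :2] / proj[:, 2:3]

def temporal_saliency(points, corr, homographies, area):
    """Sum of projection errors of a region's points over all scene
    homographies, normalized by the region's spanning area."""
    err = 0.0
    for H in homographies:
        err += np.linalg.norm(apply_homography(H, points) - corr, axis=1).sum()
    return err / area

# toy scene: one identity homography models a static background;
# the region's points actually moved by (3, 0), so they score a nonzero error
points = np.array([[10., 10.], [20., 10.]])
corr = points + np.array([3.0, 0.0])
s = temporal_saliency(points, corr, [np.eye(3)], area=100.0)
```

A region whose points are well explained by one of the scene homographies would instead accumulate a near-zero error and thus a low saliency value.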
An example of salient region detection based on visual attention is shown below; the attention region in the sequence corresponds to the running cat.
An example of salient region detection based on visual attention. The top left figure shows the original frame, the top middle figure shows the dense point correspondences, and the top right figure shows the background points. The bottom left figure shows clustering results for the inliers of H1, the bottom middle figure shows the salient map, and the bottom right figure shows the salient region.
4. Unusual Motion Detection within the Detected Salient Regions
In the first stage, detecting salient regions to restrict the unusual-motion search area reduces the computation time. In this second stage, we detect the unusual motion within the detected salient regions.
We analyze the unusualness of every pixel using the following three factors: the speed, the motion direction, and the distance. As shown in the flowchart below, this stage consists of the following steps:
Flowchart of unusual motion detection within detected salient regions.
Step 1: The speed factor analysis
- The optical flow technique can accurately detect motion in the direction of the intensity gradient and is the most widely used motion-detection approach. The TV-L1 method is one of the best optical flow methods proposed in recent years. The optical flow features in each detected salient region are obtained using the TV-L1 method.
Intuitively, the speed is a determining factor in judging the unusualness of a pixel. Let $(u, v)$ be the motion vector at a pixel location $(x, y)$, with magnitude $m(x, y) = \sqrt{u^2 + v^2}$. The unusualness value $U_s(x, y)$ at pixel location $(x, y)$ can then be defined as follows:
$$U_s(x, y) = \frac{m(x, y) - m_{\min}}{m_{\max} - m_{\min}},$$
where $m_{\max}$ and $m_{\min}$ are the maximum and minimum values of the magnitude, respectively.
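The speed factor can be sketched as a min-max normalization of the flow magnitude over the salient region (a NumPy sketch; the small epsilon guarding against a zero denominator is an implementation detail, not part of the text):

```python
import numpy as np

def speed_unusualness(flow):
    """Speed factor U_s: flow magnitude, min-max normalized over the region."""
    mag = np.sqrt(flow[..., 0] ** 2 + flow[..., 1] ** 2)
    mmin, mmax = mag.min(), mag.max()
    return (mag - mmin) / (mmax - mmin + 1e-12)  # epsilon avoids division by zero

# toy region: one fast pixel (magnitude 5) in an otherwise static patch
flow = np.zeros((4, 4, 2))
flow[1, 1] = [3.0, 4.0]
U_s = speed_unusualness(flow)
```

The fastest pixel receives a value near 1 and static pixels a value of 0, matching the intuition that faster motion is more unusual.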
Step 2: The motion distance factor analysis
- Since only motion within some distance range is of interest to the driver, we enhance the effect of motion within an assumed distance range and decrease the impact of motion outside this range. We use the general weighted operator of the flexible-logic model to calculate the weight value $W_d(x, y)$ of the motion distance factor as follows:
$$W_d(x, y) = \left(\frac{1 - d}{2}\right)^{\alpha b},$$
where $d$ is the spatial distance between the pixel location $(x, y)$ and the host-vehicle position $(x_0, y_0)$, normalized to $[0, 1]$. Here, $T$ is the threshold of the motion distance, $b = -\ln 2 / (\ln(1 - T) - \ln 2)$ with $T$ in the range $[0, 1]$, and $\alpha$ controls the strength of motion distance weighting. Larger values of $\alpha$ increase the effect of motion distance weighting, so that closer motion contributes more to the unusualness of the current pixel.
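Given the stated definition of $b$, one weighting with the property that the weight drops to exactly $1/2$ at $d = T$ (for $\alpha = 1$) can be sketched as below. The closed form used here is an assumption chosen to satisfy that property; the exact operator of the cited flexible-logic work may differ.

```python
import math

def distance_weight(d, T=0.5, alpha=1.0):
    """Motion-distance weight W_d.

    d: spatial distance to the host-vehicle position, normalized to [0, 1].
    T: distance threshold; b is chosen so that W_d(T) = 0.5 for alpha = 1.
    alpha: strength of the distance weighting (larger -> stronger decay).
    """
    b = -math.log(2) / (math.log(1 - T) - math.log(2))
    return ((1 - d) / 2) ** (alpha * b)

w_near = distance_weight(0.1)   # close motion: high weight
w_at_T = distance_weight(0.5)   # at the threshold: weight 1/2
w_far = distance_weight(0.9)    # distant motion: low weight
```

The weight decreases monotonically with distance, so closer motion contributes more to the unusualness of the current pixel, as stated in the text.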
Step 3: The motion direction factor analysis
- The motion direction is another factor in considering the unusualness of every pixel. The figure below illustrates the motion direction for every pixel within the salient regions.
The motion direction for a pixel.
As shown in the figure, the host vehicle is at the bottom middle of a frame. Intuitively, for the left-half region, we should only consider pixels with motion directions in $[-\pi/2, 0]$ or $[0, \pi/2]$, that is, motion toward the vehicle. Similarly, for the right-half region, we should only consider pixels with motion directions in $[-\pi, -\pi/2]$ or $[\pi/2, \pi]$. In practice, to account for the width of the host vehicle, we adjust the left and right regions. The weight value $W_m(x, y)$ of the motion direction factor is defined in terms of the motion direction $\theta$ at the pixel location $(x, y)$, the width $w$ of the host vehicle, and a parameter $\beta$ that controls the strength of motion direction weighting; larger values of $\beta$ increase the effect of motion direction weighting.
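The direction gate underlying this factor can be sketched as follows. This sketch only checks the interval membership described in the text; the full weight $W_m$ additionally involves the strength parameter. Angles are assumed to follow the `atan2` convention, with 0 pointing rightward along the image's x axis.

```python
import math

def direction_relevant(x, theta, frame_width, vehicle_width=0):
    """Is a pixel's motion directed toward the host vehicle?

    x: pixel column; theta: motion direction in (-pi, pi], 0 = rightward.
    Pixels left of the (width-adjusted) vehicle region matter when moving
    right; pixels right of it matter when moving left.
    """
    left_edge = frame_width / 2 - vehicle_width / 2
    right_edge = frame_width / 2 + vehicle_width / 2
    if x < left_edge:                    # left-half region: rightward motion
        return -math.pi / 2 <= theta <= math.pi / 2
    if x > right_edge:                   # right-half region: leftward motion
        return theta >= math.pi / 2 or theta <= -math.pi / 2
    return True                          # directly ahead of the vehicle

a = direction_relevant(10, 0.0, 100)      # left side, moving right (toward vehicle)
b = direction_relevant(90, 0.0, 100)      # right side, moving right (moving away)
c = direction_relevant(90, math.pi, 100)  # right side, moving left (toward vehicle)
```

Widening `vehicle_width` shifts the two edges outward, implementing the width adjustment mentioned in the text.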
Step 4: The unusualness estimation
- Based on all the factors determined in the previous steps, we compute the unusualness value $U(x, y)$ at pixel location $(x, y)$ as follows:
$$U(x, y) = U_s(x, y) \cdot W_d(x, y) \cdot W_m(x, y).$$
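A natural way to combine the three factors, assumed here since several combinations are possible, is an elementwise product of the per-pixel maps:

```python
import numpy as np

def unusualness(U_s, W_d, W_m):
    """Combine the speed, distance, and direction factors per pixel."""
    return U_s * W_d * W_m

# toy 1x2 maps: a fast, mid-distance pixel vs. a pixel moving away
U_s = np.array([[1.0, 0.5]])
W_d = np.array([[0.5, 1.0]])
W_m = np.array([[1.0, 0.0]])
U = unusualness(U_s, W_d, W_m)
```

A pixel with irrelevant motion direction (weight 0) is suppressed entirely, regardless of its speed or distance.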
5. Experimental Results
To evaluate the performance of the proposed unusual-motion-detection model, we conducted experiments on different kinds of video. A few detailed results are shown below. The following information is presented: the representative frames of the testing videos (row (a)), the temporal saliency maps of the representative frames (row (b)), the detected salient regions (row (c)), the unusualness maps of the detected salient regions (row (d)), and the regions that correspond to potentially unusual motions (row (e)).
Unusual motion detection results for three different videos. Row (a) shows the original frames; row (b) shows the temporal saliency maps; row (c) shows the detected salient regions; row (d) shows the unusualness maps; and row (e) shows the regions that correspond to potentially unusual motions in the selected video (e.g., in the left column, correctly a bicyclist in the left region, and, incorrectly, motion on the ground due to the moving shadow caused by the car on the right).
6. Conclusions
In this paper, we have developed a model for detecting unusual motion of nearby moving regions. The model can estimate the unusualness so as to output warning messages to the driver and avoid vehicle collisions. To develop this model, two stages, salient region detection and unusual motion detection, were implemented.
Based on spatiotemporal analysis, an improved temporal attention model was presented to detect salient regions. Three factors, the speed, the motion direction, and the distance, were considered to detect the unusual motion within the detected salient regions. Experimental results show that the proposed real-time unusual-motion-detection model can effectively and efficiently detect unusually moving regions.
In our future work, we plan to extend the proposed method by taking into account not merely successive frames, but also some accumulated content (e.g. about the traffic context) of a video in order to increase the robustness of the algorithm and to incorporate an object tracking method.
Conflict of Interest
No potential conflict of interest relevant to this article was reported.
The first author thanks China Scholarship Council (CSC) for supporting her stay at The University of Auckland.
Li-hua Fu is an associate professor at the College of Computer Science, Beijing University of Technology. She received the Ph.D. degree in computer software and theory from Northwestern Polytechnical University, China. Currently, her main research interests include computer vision, image processing, and image understanding.
Wei-dong Wu is a Master student at the College of Computer Science, Beijing University of Technology. His research is in the fields of visual attention detection and pattern recognition.
Yu Zhang is an undergraduate student at the College of Computer Science, Beijing University of Technology. His research activities are related to studying optical flow.
Reinhard Klette is a Fellow of the Royal Society of New Zealand and a professor at Auckland University of Technology. He (co-)authored more than 400 publications in peer-reviewed journals or conferences, and books on computer vision, image processing, geometric algorithms, and panoramic imaging. He was an associate editor of IEEE PAMI (2001-2008).
References
- Fang C. Y., Chen C. P., Chen S. E., "Critical motion detection of nearby moving vehicles in a vision-based driver-assistance system," IEEE Transactions on Intelligent Transportation Systems. DOI: 10.1109/TITS.2008.2011694
- Herrtwich R. G., "Making Bertha see," in Proceedings of the 2013 IEEE International Conference on Computer Vision Workshops (ICCVW). DOI: 10.1109/ICCVW.2013.36
- "Modeling and tracking the driving environment with a particle-based occupancy grid," IEEE Transactions on Intelligent Transportation Systems. DOI: 10.1109/TITS.2011.2158097
- "Computational modelling of visual attention," Nature Reviews Neuroscience. DOI: 10.1038/35058500
- Goldman D. B., "Learning video saliency from human gaze using candidate selection," in Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). DOI: 10.1109/CVPR.2013.152
- "Visual attention detection in video sequences using spatiotemporal cues," in Proceedings of the 14th Annual ACM International Conference on Multimedia, Santa Barbara, CA. DOI: 10.1145/1180639.1180824
- "Salient object detection based on spatiotemporal attention models," in Proceedings of the 2013 IEEE International Conference on Consumer Electronics (ICCE), Las Vegas, NV. DOI: 10.1109/ICCE.2013.6486786
- Lee J. J., "Robust estimation of camera homography using fuzzy RANSAC," in Proceedings of the International Conference on Computational Science and Its Applications (ICCSA), Kuala Lumpur, Malaysia. DOI: 10.1007/978-3-540-74472-6_81
- Concise Computer Vision: An Introduction into Theory and Algorithms.
- Shyu M. L., "Effective moving object detection and retrieval via integrating spatial-temporal multimedia information," in Proceedings of the 2012 IEEE International Symposium on Multimedia (ISM). DOI: 10.1109/ISM.2012.74
- Zhong S. H., "Video saliency detection via dynamic consistent spatio-temporal attention modelling," in Proceedings of the 27th AAAI Conference on Artificial Intelligence (AAAI).
- "An improved algorithm for TV-L1 optical flow," in Statistical and Geometrical Approaches to Visual Motion Analysis. DOI: 10.1007/978-3-642-03061-1_2
- Lewis J. P., Black M. J., "A database and evaluation methodology for optical flow," International Journal of Computer Vision. DOI: 10.1007/s11263-010-0390-2
- "Parametric analysis of flexible logic control model," Discrete Dynamics in Nature and Society, article id 610186. DOI: 10.1155/2013/610186