Unusual Motion Detection for Vision-Based Driver Assistance
International Journal of Fuzzy Logic and Intelligent Systems. 2015. Mar, 15(1): 27-34
Copyright © 2015, Korean Institute of Intelligent Systems
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted noncommercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
  • Received : December 16, 2014
  • Accepted : March 17, 2015
  • Published : March 25, 2015
About the Authors
Li-hua Fu
College of Computer Science, Beijing University of Technology, Beijing, China
Wei-dong Wu
College of Computer Science, Beijing University of Technology, Beijing, China
Yu Zhang
College of Computer Science, Beijing University of Technology, Beijing, China
Reinhard Klette
School of Engineering, Auckland University of Technology, Auckland, New Zealand

Abstract
For a vision-based driver assistance system, unusual motion detection is an important means of preventing accidents. In this paper, we propose a real-time unusual-motion-detection model with two stages: salient region detection and unusual motion detection. In the salient-region-detection stage, we present an improved temporal attention model. In the unusual-motion-detection stage, three factors (speed, motion direction, and distance) are extracted for detecting unusual motion. Experimental results demonstrate the feasibility of the proposed model.
1. Introduction
The reduction of traffic accidents and the improvement of road safety are important research subjects for transportation-related institutions and the vehicle industry. Driver-assistance systems (DASs) aim at bringing potentially hazardous conditions to the driver’s attention in real time [1], and they also aim at improving driver comfort. Autonomous driving is already a reality in on-road vehicles [2].
At present, moving object detection and tracking is an important research subject of DASs. Usually, approaches to moving object detection and tracking first identify objects and then try to estimate the motion by tracking the objects. Dynamic scenes, the diversity of moving objects (including non-rigid pedestrians and rigid vehicles), as well as weather, light, and other factors, make moving object detection and tracking very difficult.
However, drivers seem to be more concerned about unusual motion regions than about moving objects in general. We suggest detecting the unusual motion regions at pixel level rather than at moving-object level. We estimate a collision risk for every single image point, independent of an object detection step.
Visual attention is one of the most important mechanisms of the human visual system. According to the visual attention mechanism, visual saliency can be used to detect salient regions in images and video. The visual attention model, which uses a mathematical model to simulate the human visual system, has become a “hot topic” in computer vision.
This article aims at using the visual attention mechanism to detect unusual motion for vision-based driver assistance. The remainder of this paper is organized as follows. The unusual motion detection framework is presented in Section 2. Section 3 describes salient region detection based on visual attention. Section 4 introduces unusual motion detection within the detected salient regions. Section 5 presents the experimental results. Finally, Section 6 concludes the paper and opens perspectives for future work.
2. Unusual Motion Detection Framework
In DASs, unusual motion detection is one of the important means of preventing accidents. Motion detection techniques often rely on detecting a moving object before computing motion [3]. The performance of such methods greatly depends on the performance of moving object detection.
It is a common human experience to get out of the way of a quickly moving object before actually identifying what it is [1]. In other words, a human can perceive motion earlier than form and meaning.
In this paper, we propose an unusual-motion-detection model for vision-based driver assistance. The proposed model estimates the collision risk for the considered image points independently of an object detection step.
Figure 1 illustrates our proposed unusual-motion-detection framework. This framework contains two stages, salient region detection based on visual attention, and unusual motion detection within the detected salient regions.
Figure 1. The proposed unusual-motion-detection framework.
Since directly computing pixel-level unusualness is computationally expensive, we first introduce a salient-region-detection method, so as to define the unusual-motion-detection areas. In the stage of salient region detection, an improved temporal attention model is proposed to detect the salient regions.
In the second stage, three different factors (the speed, the motion direction, and the distance) are considered to detect the unusual motion for every pixel within the detected salient regions.
3. Salient Region Detection Based on Visual Attention
In video sequences, motion plays an important role and human perceptual reactions will mainly focus on motion contrast regardless of visual texture in the scene.
Visual saliency measures low-level stimuli to the human vision system that grab a viewer’s attention in the early stage of visual processing [4]. While many models have been proposed in the image domain, much less work has been done on video saliency [5].
Zhai and Shah [6] proposed a temporal attention model that uses the interest point correspondences and the geometric transformations between images. The projection errors of the interest points, defined by the estimated homographies, are incorporated in the motion contrast computation.
Tapu and Zaharia [7] extended the temporal attention model. Different types of motion present in the current scene are determined using a set of homographic transforms, estimated by recursively applying the Random Sample Consensus (RANSAC) algorithm, see [8, 9], on the interest point correspondences.
In the previously developed methods, detecting the feature points is the first and most important step. The performance of the temporal attention model is greatly influenced by the quality of the point correspondences [10]. The Scale Invariant Feature Transform (SIFT) is used to find the interest points and compute the correspondences between the points in video frames; see also [9] for a description of SIFT.
However, interest points are known to concentrate in richly textured areas. If the potential object regions contain little texture, then few or no feature points are detected there, and the potential object regions cannot be detected. An example of interest points, detected by SIFT, is shown in Figure 2. In this case, the region where the cat (running right to left) is located is the potential object region. As shown in Figure 2, most of the detected interest points are located in the background.
Figure 2. An example of detected interest points using Scale Invariant Feature Transform (SIFT). The top figure shows the original frame. The bottom figure shows the interest points detected by a SIFT detector.
In this stage, we extend the temporal attention model, proposed by Zhai and Shah [6], to obtain dense point correspondences based on dense optical flow fields. The optical flow technique is the most widely used motion detection approach [9, 11]. Optical flow at edge pixels is noisy if multiple motion layers exist in the scene.
Furthermore, in texture-less regions, dense optical flow may return erroneous values [6]. To overcome this problem, we use a RANSAC algorithm on point correspondences to eliminate outliers.
As shown in Figure 3, this stage consists of the following steps:
Figure 3. Flowchart of the salient region detection based on visual attention.
Step 1: Dense point matching - First, the dense optical flow method TV-L1 [12] is used on two consecutive frames to calculate the dense point correspondences at 10-pixel intervals. Since the moving objects always appear on the lower part of the input frames, the dense optical flow method is applied only to the bottom two thirds of the input frames. This reduces the computation time. Furthermore, to avoid the effect of noise, we exclude a 10-pixel-wide border around every frame.
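The sampling described in Step 1 can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a dense flow field has already been computed by some TV-L1 solver and is passed in as an array, and the function name `sample_correspondences` and its parameters are hypothetical. For simplicity the sketch samples from a full-frame flow field, whereas the text applies the flow method only to the bottom two thirds of the frame.

```python
import numpy as np

def sample_correspondences(flow, step=10, border=10):
    """Sample point correspondences from a dense flow field.

    flow: array of shape (H, W, 2) holding (dx, dy) per pixel, assumed to
    come from a TV-L1 solver run elsewhere.
    Returns two (N, 2) arrays of (x, y) points in frame t and frame t+1.
    """
    h, w = flow.shape[:2]
    top = max(h // 3, border)                 # keep only the bottom two thirds
    ys = np.arange(top, h - border, step)     # rows, skipping the border
    xs = np.arange(border, w - border, step)  # columns, skipping the border
    gx, gy = np.meshgrid(xs, ys)
    pts = np.stack([gx.ravel(), gy.ravel()], axis=1).astype(np.float64)
    moved = pts + flow[gy.ravel(), gx.ravel()]  # follow the flow vector
    return pts, moved
```

For a 120 x 90 frame, the grid has 10 columns and 5 rows, so 50 correspondences are returned.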
Step 2: Background / Camera motion estimation - Obviously, most of the points detected at Step 1 are located in the background. The subset of m background points can be determined with the epipolar geometry constraints.
We use the multi-view epipolar constraint which requires the background points to lie on the corresponding epipolar lines in subsequent images.
If the points are far away from the corresponding epipolar lines, then we can determine them as being foreground points.
For the spatial point correspondences detected at Step 1, we apply a RANSAC algorithm to determine the fundamental matrix F.
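The epipolar test of Step 2 can be illustrated with a short sketch. It assumes the fundamental matrix F has already been estimated (for example by a RANSAC-based routine) and only shows how a correspondence is checked against its epipolar line. The function names and the tolerance `tol` are hypothetical choices for illustration.

```python
import numpy as np

def epipolar_distance(F, pts1, pts2):
    """Distance of each point in pts2 from the epipolar line F @ x1."""
    ones = np.ones((len(pts1), 1))
    x1 = np.hstack([pts1, ones])          # homogeneous coordinates, (N, 3)
    x2 = np.hstack([pts2, ones])
    lines = x1 @ F.T                      # row i is the line F @ x1_i = (a, b, c)
    num = np.abs(np.sum(lines * x2, axis=1))
    den = np.hypot(lines[:, 0], lines[:, 1])
    return num / np.maximum(den, 1e-12)   # guard against degenerate lines

def split_background(F, pts1, pts2, tol=1.0):
    """Boolean mask: True where a correspondence is consistent with F."""
    return epipolar_distance(F, pts1, pts2) <= tol
```

Points for which the mask is False lie far from their epipolar lines and would be treated as foreground candidates.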
Step 3: Different types of motion recognition - In practice, multiple motions are present that result from the moving objects, but also from background objects or camera movement.
In this case, we determine a new subset of points formed by all the outliers and all the points not considered in the previous step.
For the current subset, we apply a RANSAC algorithm recursively to determine multiple homographies until all the points belong to a motion class. The estimated homographies model different planar transformations in the scene.
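The recursive use of RANSAC can be sketched as a loop that fits one homography, removes its inliers, and repeats on the remaining points. The sketch below is an illustration under stated assumptions, not the authors' code: it uses an unnormalised direct linear transform, a fixed iteration count, and a hypothetical `min_inliers` stopping rule, so points that fit no model keep the label -1 instead of being forced into a motion class.

```python
import numpy as np

def fit_homography(src, dst):
    """Direct linear transform from >= 4 correspondences (no normalisation)."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=np.float64))
    return vt[-1].reshape(3, 3)           # null vector of the system

def transfer_error(H, src, dst):
    """Euclidean distance between H applied to src and the observed dst."""
    proj = np.hstack([src, np.ones((len(src), 1))]) @ H.T
    w = proj[:, 2:3]
    w = np.where(np.abs(w) < 1e-12, 1e-12, w)   # avoid division by zero
    return np.linalg.norm(proj[:, :2] / w - dst, axis=1)

def ransac_homography(src, dst, rng, iters=300, tol=1.0):
    """Return the largest inlier mask found over random 4-point samples."""
    best = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(src), 4, replace=False)
        H = fit_homography(src[idx], dst[idx])
        inl = transfer_error(H, src, dst) < tol
        if inl.sum() > best.sum():
            best = inl
    return best

def sequential_ransac(src, dst, min_inliers=8, seed=0):
    """Assign a motion-class label per correspondence (-1 if unassigned)."""
    rng = np.random.default_rng(seed)
    labels = np.full(len(src), -1)
    remaining = np.arange(len(src))
    m = 0
    while len(remaining) >= min_inliers:
        inl = ransac_homography(src[remaining], dst[remaining], rng)
        if inl.sum() < min_inliers:
            break
        labels[remaining[inl]] = m        # inliers form motion class m
        remaining = remaining[~inl]       # continue on the outliers
        m += 1
    return labels
```

A production version would typically normalise coordinates before the DLT and refit each homography on its full inlier set.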
Every estimated homography Hm has a set of points Lm as its inliers, and nm is the number of inliers for Hm.
For every homography Hm, its inliers in Lm can be considered as being located in the same plane. However, the points may be spread over spatially separate regions.
To avoid this problem, we use the K-means clustering algorithm to divide Lm into K subsets Lm,i, for i = 1, ..., K. The spanning region of Lm,i is denoted by Rm,i, which corresponds to a moving region.
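The clustering of one inlier set into K spatial subsets can be sketched with a plain Lloyd-style K-means. The axis-aligned bounding box used here as the "spanning region" is an assumption made for illustration; the text does not specify how the spanning region is represented. Function names are hypothetical.

```python
import numpy as np

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's algorithm; returns a cluster index per point."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for i in range(k):
            if np.any(labels == i):       # keep the old center if a cluster is empty
                centers[i] = points[labels == i].mean(axis=0)
    return labels

def spanning_boxes(points, labels, k):
    """Axis-aligned bounding box (x0, y0, x1, y1) per non-empty cluster."""
    boxes = {}
    for i in range(k):
        sub = points[labels == i]
        if len(sub):
            boxes[i] = (*sub.min(axis=0), *sub.max(axis=0))
    return boxes
```

Each returned box then plays the role of one moving region for the saliency computation that follows.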
Step 4: Saliency computing - For all the moving regions determined at Step 3, we now compute their projection errors as saliency values.
The temporal saliency value of the moving region Rm,i is defined by
[Equation images not reproduced.]
where M is the total number of homographies in the scene, the projection of a point is computed by applying Hj, the correspondence of that point is the one found by TV-L1, and αj,i is the spanning area of the subset Lj,i.
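To make the role of the projection error concrete, the following sketch computes an area-weighted sum of projection errors per point. The weighted-sum form follows the general idea of the temporal attention model of Zhai and Shah [6] and is an assumption here; it is not a transcription of the paper's equations. The names `project`, `point_saliency`, and `areas` are hypothetical.

```python
import numpy as np

def project(H, pts):
    """Apply a 3x3 homography to (N, 2) points."""
    hom = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return hom[:, :2] / hom[:, 2:3]

def point_saliency(pts, matches, homographies, areas):
    """Area-weighted sum of projection errors for each correspondence.

    areas[j] is a normalised spanning area tied to homographies[j];
    this weighting is an assumption of the sketch.
    """
    sal = np.zeros(len(pts))
    for H, alpha in zip(homographies, areas):
        # Error between the point projected by H and its observed match.
        sal += alpha * np.linalg.norm(project(H, pts) - matches, axis=1)
    return sal
```

With a dominant background homography, a point that follows the background motion receives a small value, while a point that follows a rarer motion receives a large one.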
An example of salient region detection based on visual attention is demonstrated in Figure 4, where the attention region in the sequence corresponds to the running cat.
Figure 4. An example of salient region detection based on visual attention. The top left figure shows the original frame, the top middle figure shows the dense point correspondences, and the top right figure shows the background points. The bottom left figure shows clustering results for the inliers of H1, the bottom middle figure shows the salient map, and the bottom right figure shows the salient region.
4. Unusual Motion Detection within the Detected Salient Regions
Restricting the unusual-motion search area to the salient regions detected in the first stage reduces the computation time. In this second stage, we detect the unusual motion within those regions.
We analyze the unusualness for every pixel from the following three factors: the speed, the motion direction, and the distance. As shown in Figure 5, this stage consists of the following steps:
Figure 5. Flowchart of unusual motion detection within detected salient regions.
Step 1: The speed factor analysis - The optical flow technique can accurately detect motion in the direction of the intensity gradient and is the most widely used motion detection approach [11]. The TV-L1 method is one of the best optical flow methods proposed in recent years [13]. The optical flow features in each detected salient region are obtained using the TV-L1 method [12].
Intuitively, the speed is a determining factor in judging the unusualness of a pixel. Let (vx, vy) be the motion vector at a pixel location (x, y). The unusualness value Us(x, y) at pixel location (x, y) is then defined as follows:
[Equation image not reproduced.]
where Dmin and Dmax are the minimum and maximum values of the motion magnitude, respectively.
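One plausible reading of the speed factor is a min-max normalisation of the flow magnitude between Dmin and Dmax. The sketch below implements that reading; the normalisation form is an assumption, and the function name `speed_unusualness` is hypothetical.

```python
import numpy as np

def speed_unusualness(vx, vy):
    """Min-max normalised flow magnitude in [0, 1] (assumed form)."""
    mag = np.hypot(vx, vy)                # per-pixel speed
    d_min, d_max = mag.min(), mag.max()
    if d_max == d_min:                    # uniform motion: nothing stands out
        return np.zeros_like(mag)
    return (mag - d_min) / (d_max - d_min)
```

Under this reading, the fastest pixel in a salient region receives the value 1 and the slowest receives 0.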
Step 2: The motion distance factor analysis - Since only motion within some distance range is interesting for the driver, we enhance the effect of motions within an assumed distance range and decrease the impact of motions outside of this distance range. We use the general weighted operator of [14] to calculate the weight value of the motion distance factor wd(αd, d) as follows:
[Equation image not reproduced.]
where d is the spatial distance between the pixel location (x, y) and (w/2, h), which is normalized to [0, 1]. Here, ed is the threshold of the motion distance, n = −ln 2 / (ln(1 − αd) − ln 2) with αd in the range of [0, 1], and αd controls the strength of motion distance weighting. Larger values of αd increase the effect of motion distance weighting so that the closer motion would contribute more to the unusualness of the current pixel.
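The two quantities that the text does specify, the exponent n and the distance d to the bottom-center point (w/2, h), can be computed as in the sketch below. The weighting operator itself is not reproduced. Dividing by the largest anchor-to-corner distance is an assumed normalisation, and the function names are hypothetical.

```python
import math

def distance_exponent(alpha_d):
    """n = -ln 2 / (ln(1 - alpha_d) - ln 2), defined for 0 <= alpha_d < 1."""
    if not 0.0 <= alpha_d < 1.0:
        raise ValueError("alpha_d must lie in [0, 1)")
    return -math.log(2) / (math.log(1 - alpha_d) - math.log(2))

def normalized_distance(x, y, w, h):
    """Distance from (x, y) to the anchor (w/2, h), scaled to [0, 1].

    Dividing by the largest anchor-to-corner distance is an assumption;
    the text only states that d is normalised.
    """
    d_max = math.hypot(w / 2, h)
    return math.hypot(x - w / 2, y - h) / d_max
```

The exponent equals 1 at αd = 0 and 0.5 at αd = 0.5, and it decreases toward 0 as αd approaches 1, where the logarithm is undefined.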
Step 3: The motion direction factor analysis - The motion direction is another factor in considering the unusualness for every pixel. Figure 6 illustrates the motion direction for every pixel within the salient regions.
Figure 6. The motion direction for a pixel.
As shown in Figure 6, the host vehicle is in the bottom middle of a frame. Intuitively, for the left-half region, we should consider only those pixels with motion directions in [−π, −π/2] or [π/2, π]. Similarly, for the right-half region, we should consider only the pixels with motion directions in [−π/2, π/2].
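The interval test stated above can be written as a hard gate on the direction angle. The sketch takes the angle as an input, measured in the convention of Figure 6, and does not derive it from the flow vector; it also ignores the adjustment for the host vehicle width. The function name `direction_gate` is hypothetical.

```python
import numpy as np

def direction_gate(angle, x, width):
    """Boolean mask following the intervals stated in the text.

    angle: motion direction per pixel in radians, in [-pi, pi], using the
    angle convention of Figure 6 (not re-derived here).
    x: pixel column; width: frame width.
    """
    left = x < width / 2
    in_outer = np.abs(angle) >= np.pi / 2   # [-pi, -pi/2] or [pi/2, pi]
    in_inner = np.abs(angle) <= np.pi / 2   # [-pi/2, pi/2]
    return np.where(left, in_outer, in_inner)
```

A soft weighting, as used in the paper, would replace this binary mask with a value that varies smoothly with the angle.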
In practice, to account for the width of the host vehicle, we adjust the right and left regions. The weight value wa(αa, a) of the motion direction factor is defined as follows:
[Equation image not reproduced.]
where a is the motion direction at the pixel location (x, y), wcar is the width of the host vehicle, and αa controls the strength of motion direction weighting. Larger values of αa increase the effect of motion direction weighting.
Step 4: The unusualness estimation - Based on all the factors determined in the previous steps, we compute the unusualness value U(x, y) at pixel location (x, y) as follows:
[Equation image not reproduced.]
5. Experimental Results
To evaluate the performance of the proposed unusual-motion-detection model, we conducted experiments on different kinds of video. A few detailed results are shown in Figure 7. The following information is presented: the representative frames of the testing videos (Figure 7a), the temporal saliency maps of the representative frames (Figure 7b), the detected salient regions (Figure 7c), the unusualness maps of the detected salient regions (Figure 7d), and the regions that correspond to potentially unusual motions (Figure 7e).
Figure 7. Unusual motion detection results for three different videos. Row (a) shows the original frames; row (b) shows the temporal saliency maps; row (c) shows the detected salient regions; row (d) shows the unusualness maps; and row (e) shows the regions that correspond to potentially unusual motions in the selected video (e.g. in the left column, correctly a bicyclist in the left region, and, incorrectly, motion on the ground due to the moving shadow caused by the car on the right).
6. Conclusions
In this paper, we have developed a model for detecting unusual motion of nearby moving regions. The model estimates an unusualness value that can be used to issue warning messages to the driver and so help avoid vehicle collisions. To develop this model, two stages, salient region detection and unusual motion detection, were implemented.
Based on spatiotemporal analysis, an improved temporal attention model was presented to detect salient regions. Three factors, the speed, the motion direction, and the distance, were considered to detect the unusual motion within the detected salient regions. Experimental results show that the proposed real-time unusual-motion-detection model can effectively and efficiently detect unusually moving regions.
In our future work, we plan to extend the proposed method by taking into account not merely successive frames, but also some accumulated content (e.g. about the traffic context) of a video in order to increase the robustness of the algorithm and to incorporate an object tracking method.
Conflict of Interest
No potential conflict of interest relevant to this article was reported.
Acknowledgements
The first author thanks China Scholarship Council (CSC) for supporting her stay at The University of Auckland.
BIO
Li-hua Fu is an associate professor at the College of Computer Science, Beijing University of Technology. She received the Ph.D. degree in computer software and theory from Northwestern Polytechnical University, China. Currently, her main research interests include computer vision, image processing, and image understanding.
Wei-dong Wu is a Master student at the College of Computer Science, Beijing University of Technology. His research is in the fields of visual attention detection and pattern recognition.
Yu Zhang is an undergraduate student at the College of Computer Science, Beijing University of Technology. His research activities are related to studying optical flow.
Reinhard Klette is a Fellow of the Royal Society of New Zealand and a professor at Auckland University of Technology. He (co-)authored more than 400 publications in peer-reviewed journals or conferences, and books on computer vision, image processing, geometric algorithms, and panoramic imaging. He was an associate editor of IEEE PAMI (2001-2008).
References
[1] Fang C. Y., Chen C. P., Chen S. E. 2009 “Critical motion detection of nearby moving vehicles in a vision-based driver-assistance system,” IEEE Transactions on Intelligent Transportation Systems 10 (1) 70-82 DOI: 10.1109/TITS.2008.2011694
[2] Franke U., Pfeiffer D., Rabe C., Knoeppel C., Enzweiler M., Stein F., Herrtwich R. G. 2013 “Making Bertha see,” Proceedings of 2013 IEEE International Conference on Computer Vision Workshops (ICCVW) Sydney 214-221 DOI: 10.1109/ICCVW.2013.36
[3] Danescu R., Oniga F., Nedevschi S. 2011 “Modeling and tracking the driving environment with a particle-based occupancy grid,” IEEE Transactions on Intelligent Transportation Systems 12 (4) 1331-1342 DOI: 10.1109/TITS.2011.2158097
[4] Itti L., Koch C. 2001 “Computational modelling of visual attention,” Nature Reviews Neuroscience 2 (3) 194-203 DOI: 10.1038/35058500
[5] Rudoy D., Goldman D. B., Shechtman E., Zelnik-Manor L. 2013 “Learning video saliency from human gaze using candidate selection,” Proceedings of 2013 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Portland, OR 1147-1154 DOI: 10.1109/CVPR.2013.152
[6] Zhai Y., Shah M. 2006 “Visual attention detection in video sequences using spatiotemporal cues,” Proceedings of the 14th Annual ACM International Conference on Multimedia Santa Barbara, CA 815-824 DOI: 10.1145/1180639.1180824
[7] Tapu R., Zaharia T. 2013 “Salient object detection based on spatiotemporal attention models,” Proceedings of 2013 IEEE International Conference on Consumer Electronics (ICCE) Las Vegas, NV 39-42 DOI: 10.1109/ICCE.2013.6486786
[8] Lee J. J., Kim G. 2007 “Robust estimation of camera homography using fuzzy RANSAC,” Proceedings of International Conference on Computational Science and Its Applications (ICCSA) Kuala Lumpur, Malaysia 992-1002 DOI: 10.1007/978-3-540-74472-6_81
[9] Klette R. 2014 Concise Computer Vision: An Introduction into Theory and Algorithms Springer London
[10] Liu D., Shyu M. L. 2012 “Effective moving object detection and retrieval via integrating spatial-temporal multimedia information,” Proceedings of 2012 IEEE International Symposium on Multimedia (ISM) Irvine, CA 364-371 DOI: 10.1109/ISM.2012.74
[11] Zhong S. H., Liu Y., Ren F., Zhang J., Ren T. 2013 “Video saliency detection via dynamic consistent spatio-temporal attention modelling,” Proceedings of the 27th AAAI Conference on Artificial Intelligence (AAAI) Bellevue, WA 1063-1069
[12] Wedel A., Pock T., Zach C., Bischof H., Cremers D. 2009 “An improved algorithm for TV-L1 optical flow,” Statistical and Geometrical Approaches to Visual Motion Analysis Springer-Verlag Berlin 23-45 DOI: 10.1007/978-3-642-03061-1_2
[13] Baker S., Scharstein D., Lewis J. P., Roth S., Black M. J., Szeliski R. 2011 “A database and evaluation methodology for optical flow,” International Journal of Computer Vision 92 (1) 1-31 DOI: 10.1007/s11263-010-0390-2
[14] Fu L., Wang D., Kuang J. 2013 “Parametric analysis of flexible logic control model,” Discrete Dynamics in Nature and Society 2013 article id. 610186 DOI: 10.1155/2013/610186