Anomalous Event Detection in Traffic Video Based on Sequential Temporal Patterns of Spatial Interval Events
KSII Transactions on Internet and Information Systems (TIIS). 2015. Jan, 9(1): 169-189
Copyright © 2015, Korean Society For Internet Information
  • Received : July 28, 2014
  • Accepted : November 20, 2014
  • Published : January 31, 2015
About the Authors
P.M Ashok Kumar
Department of Electronics Engineering & AU-KBC Research Centre, MIT Campus. Anna University, Chennai-600 044, India.
V Vaidehi.
Department of Electronics Engineering & AU-KBC Research Centre, MIT Campus. Anna University, Chennai-600 044, India.

Abstract
Detection of anomalous events from video streams is a challenging problem in many video surveillance applications. One such application that has received significant attention from the computer vision community is traffic video surveillance. In this paper, a Lossy Count based Sequential Temporal Pattern mining approach (LC-STP) is proposed for detecting spatio-temporal abnormal events (such as a traffic violation at junction) from sequences of video streams. The proposed approach relies mainly on spatial abstractions of each object, mining frequent temporal patterns in a sequence of video frames to form a regular temporal pattern. In order to detect each object in every frame, the input video is first pre-processed by applying Gaussian Mixture Models. After the detection of foreground objects, the tracking is carried out using block motion estimation by the three-step search method. The primitive events of the object are represented by assigning spatial and temporal symbols corresponding to their location and time information. These primitive events are analyzed to form a temporal pattern in a sequence of video frames, representing temporal relation between various object’s primitive events. This is repeated for each window of sequences, and the support for temporal sequence is obtained based on LC-STP to discover regular patterns of normal events. Events deviating from these patterns are identified as anomalies. Unlike the traditional frequent item set mining methods, the proposed method generates maximal frequent patterns without candidate generation. Furthermore, experimental results show that the proposed method performs well and can detect video anomalies in real traffic video data.
1. Introduction
Recently, there has been an increasing demand for surveillance cameras in public places to address security, safety and monitoring issues. Thus, there is a need for automatic detection of anomalous or abnormal events in surveillance videos. However, robust and accurate detection of abnormal events in video remains a difficult problem for the majority of computer vision applications because of the volume of data, the unrestricted flow of video streams, and complex composite events involving interactions among multiple objects, e.g., jay-walking (people crossing the road while vehicles pass by) and vehicle overtaking [2 , 3 , 8 , 11 , 16 , 25 , 32] .
Most of the works in this field are based on modeling the statistical features of the background, the appearance of the foreground objects (person, car, bicycle, etc.,) and foreground dynamics (such as location and motion at different times). These object features are used in characterizing video events. However, these approaches [2 , 8] ignore important spatial and temporal contextual information.
Instead of relying solely on trajectories, the primitive events of the object are defined as the basic units (along with spatial and temporal symbols) for describing more complicated activities and interactions. In traffic applications, these multiple primitive events usually persist for an interval of time, and the main goal is to detect anomalous events with complex temporal relationship between the primitive events. First, the temporal patterns involving multiple spatial primitive events with high frequency are calculated. These regular temporal patterns describe various scenarios in traffic video sequences such as when a car stops to wait for a pedestrian passing by, when cars move at the same time in the same lane or different lanes, etc.
To the best of our knowledge, no efficient methods have been developed for mining frequent time interval-based patterns (referred to as temporal patterns) from sequences of video data. Traditional sequential mining approaches cannot be applied to detecting anomalous events in surveillance videos because they handle only instantaneous events and static databases. They cannot detect anomalous events that depend on event intervals or that arrive as a stream of video data.
This paper proposes the Lossy Count based approach for Sequential Temporal Patterns (LC-STP), in which the primitive events of each object are analyzed to form a regular expression. The regular expression represents the temporal relations between the primitive events of various objects in a sequence of frames. The support for this temporal sequence is calculated to find regular patterns of normal events. Events that deviate from these patterns are identified as anomalies.
The proposed approach is applied on real traffic videos, where vehicles have been detected and tracked. The task is to discover anomalous events from a collection of movement trajectories of vehicles. The results show that the proposed approach can automatically infer regular patterns of traffic motion in the training phase and detect spatio-temporal anomalous events due to multiple objects over different regions in the testing phase.
The main contributions of this work are:
  • (a) A formal definition of the problem of mining frequent temporal arrangements of primitive event intervals in a sequence of frames,
  • (b) A novel notation for representing timing information associated with the spatial primitive events,
  • (c) An efficient algorithm for mining maximal frequent temporal patterns in a sequence of frames, and
  • (d) An extensive experimental evaluation of these techniques using real traffic datasets.
The rest of this paper is organized as follows: Section 2 describes the related works. The problem definition and formulation, notations used in this paper are defined in Section 3. Section 4 describes the proposed work, which includes an algorithm for finding both regular and irregular temporal patterns between various objects in a video. In Section 5, we demonstrate the effectiveness of the proposed algorithm through a series of experiments. In Section 6, the conclusions and further research are discussed.
2. Related Works
Abnormal event detection in video sequences has been an active research area. Some works, such as Dong et al. [3] , compare a directional motion behavior descriptor with a normal behavior descriptor. In addition to motion, the foreground region area, shape factors and pixel velocity vector have been used as features [11] for a simple classifier that determines whether an object's state is normal or abnormal. In the work of Chen et al. [2] , a support vector machine (SVM) was selected as the classifier for features such as incident point velocity, downstream and upstream velocities, incident point occupancy rate, and upstream and downstream occupancy rates. Instead of motion features, the entire trajectory information has been used with an SVM [25] . In other works [32] , a Kalman filter and motion models were used for tracking, and any deviation from the model was treated as an anomaly.
Most works are based on trajectory information [4 , 22 , 31 , 38] , modeled with hidden Markov models (HMM) [4 , 31] , coupled HMMs [22] , or 3-D graphs [38] . In Gilbert et al. [5] , by contrast, 2D corners were grouped spatially and temporally using a hierarchical process to learn descriptive and distinctive features for action recognition.
Some works reveal spatiotemporal dependencies of moving agents in complex dynamic scenes, such as the right of way between different lanes or typical traffic light sequences, using the Markov random field model, the topic model, and the dependent Dirichlet processes [12 , 13] . Wang et al. [34] used hierarchical Bayesian models to connect three elements in visual surveillance: low-level visual features, simple atomic activities, and interactions. This provided a summary of the typical atomic activities and interactions occurring in the scene, and video anomalies were detected at different levels.
Although several schemes exist for detecting anomalies in video, anomalies based on primitive event intervals still need to be addressed. Thus, this paper proposes a representation of video frames in terms of primitive events. In order to detect anomalous video events based on temporal context, a frequent temporal pattern mining approach is applied to a stream of video sequences.
3. Problem Definition
In general, traffic video sequences contain time interval based spatial events, which exist over a sequence of continuous frames. Each sequence id is represented as the set of consecutive transactions, i.e., frames in our case. Each frame is represented as a single transaction, and each transaction will have a set of primitive events. Since the time of conclusion of the primitive events is not known, direct application of traditional sequential mining techniques becomes difficult. Thus, we have introduced novel notations for preserving the timing information associated with each spatial primitive event. After summarizing, these data are grouped into sequences. From such a list of sequences, the application of proposed LC-STP algorithm leads to the discovery of maximal patterns. The following definitions formally describe this mining problem.
Definition 3.1 Spatial Primitive Event:
Each event corresponds to the individual object’s id, abstract location and its time point with respect to each frame. Thus, the event e is defined by three related attributes: its abstract spatial location, the object id number and its time point. Accordingly, the following notations are defined for a primitive event e:
  • (1) e.loc refers to the abstract spatial location.
  • (2) e.id refers to the object id number in e.loc attribute.
  • (3) e.time refers to the notation of the object’s temporal state, i.e., start or end or continue time state as +, -, * respectively.
The temporal state and object id information are represented as superscript and subscript to the spatial location respectively.
Example 1. Assume that in a particular frame of a video sequence, there are three moving objects: $O_1$, $O_2$, $O_3$. $O_1$ has just started in location A, $O_2$ continues its movement from earlier frames in location B, and $O_3$ has finished its movement in location C. Based on definition 3.1, the event representations for $O_1$, $O_2$, $O_3$ are $A_1^{+}$, $B_2^{*}$, $C_3^{-}$ respectively.
Definition 3.2 Transaction:
Consider each frame as a single transaction. A transaction is a set of primitive events, $T_i = \langle e_1, e_2, \ldots, e_n \rangle$, where $e_n$ is the spatial primitive event of object n in frame i.
Example 2. Consider the previous example 1. If all the events corresponding to the different objects occur in a particular frame, then, based on definition 3.2, the transaction T is represented as $\langle A_1^{+}, B_2^{*}, C_3^{-} \rangle$.
Definition 3.3 Sequence of Transactions:
A sequence of transactions, seq-id, is represented by a sequence of frames, seq-id = $\langle T_1, T_2, \ldots, T_n \rangle$, where $ts(T_i) < ts(T_j)$ for i < j. Here $ts(T_i)$ is the time-stamp at which $T_i$ was issued.
Example 3. Consider a sequence of frames in which object $O_1$ starts at frame 1 in location A and finishes at frame 3 in location B, object $O_2$ starts at frame 2 and finishes at frame 3 in location B, and object $O_3$ starts at frame 2 in location C and still continues at frame 3 in location B. Based on definitions 3.1, 3.2 and 3.3, seq-id 1 is represented as [expression for seq-id 1; image in the original].
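Definitions 3.1 to 3.3 can be illustrated with a small data-structure sketch. The class name `PrimitiveEvent`, the helper `format_transaction`, and the plain-text symbol layout (location, then id, then temporal state) are assumptions made here for illustration; the paper itself writes the id as a subscript and the state as a superscript.

```python
from dataclasses import dataclass
from typing import List

# Temporal-state symbols from Definition 3.1: start (+), end (-), continue (*).
START, END, CONTINUE = "+", "-", "*"


@dataclass(frozen=True)
class PrimitiveEvent:
    loc: str   # abstract spatial location, e.g. "A"
    id: int    # object id number
    time: str  # one of START, END, CONTINUE

    def symbol(self) -> str:
        # Plain-text stand-in for "location with id subscript and state superscript".
        return f"{self.loc}{self.id}{self.time}"


# A transaction holds the primitive events of one frame (Definition 3.2);
# a sequence is an ordered list of transactions (Definition 3.3).
Transaction = List[PrimitiveEvent]
Sequence = List[Transaction]


def format_transaction(transaction: Transaction) -> str:
    return "<" + ", ".join(event.symbol() for event in transaction) + ">"


# The frame described in Examples 1 and 2.
frame = [
    PrimitiveEvent("A", 1, START),
    PrimitiveEvent("B", 2, CONTINUE),
    PrimitiveEvent("C", 3, END),
]
print(format_transaction(frame))
```

A sequence is then simply a list of such frames kept in time-stamp order.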
Definition 3.4 Event interval temporal relation:
Allen’s temporal logic describes the 13 possible relations for any pair of state intervals. Seven of these relations are before, meets, overlaps, is-finished-by, contains, starts and equals; the other six are the inverses of the first six (equals is its own inverse). Allen’s relations have been used by most works on mining time interval data [7 , 22 , 24 , 29 , 35] . Not all of Allen’s relations are appropriate for traffic videos: the moving objects depend only on the traffic control signal and are mutually independent. Thus, the two relations “finished-by (E 1 ,E 2 ) ” and “starts (E 1 ,E 2 ) ” are not considered for our traffic video sequences. Therefore, only five temporal relations (as shown in Table 1 ) are used in this paper: Before (E 1 ,E 2 ) , Meets (E 1 ,E 2 ) , Overlaps (E 1 ,E 2 ) , Contains (E 1 ,E 2 ) and Equals (E 1 ,E 2 ) . They are defined as follows:
Given two event intervals E1 and E2, with start and end times written E.start and E.end:
  • (1) E1 Before E2 if $E_1.end < E_2.start$, which is the same as Allen’s before relation. It is denoted as E1 → E2.
  • (2) E1 Meets E2 if $E_1.end = E_2.start$, i.e., E1 starts before E2 and the end time of E1 equals the start time of E2. It is denoted by placing the Meets operator of Table 1 between E1 and E2.
  • (3) E1 Overlaps E2 if $E_1.start < E_2.start < E_1.end < E_2.end$, i.e., E1 starts before E2 and the end time of E1 lies between the start and end times of E2. It is denoted as E1 > E2.
  • (4) E1 Contains E2 if $E_1.start < E_2.start$ and $E_2.end < E_1.end$, i.e., E1 starts before E2, and the start and end times of E2 lie between the start and end times of E1. It is denoted as E1 / E2.
  • (5) E1 Equals E2 if $E_1.start = E_2.start$ and $E_1.end = E_2.end$, i.e., the start and end times of E2 equal those of E1. It is denoted as E1 = E2.
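The five relations listed above can be checked mechanically. The following is a minimal sketch, assuming each event interval is given as a `(start, end)` pair of frame numbers with `start < end`; the function name `relation` and the string labels are illustrative choices, not the paper's notation.

```python
from typing import Optional, Tuple

Interval = Tuple[int, int]  # (start, end) in frame numbers, with start < end


def relation(e1: Interval, e2: Interval) -> Optional[str]:
    """Return which of the five relations of Definition 3.4 holds for (e1, e2).

    Returns None for the relations the paper excludes (starts, finished-by)
    and for the inverse relations, where e2 would have to come first.
    """
    s1, t1 = e1
    s2, t2 = e2
    if s1 == s2 and t1 == t2:
        return "equals"
    if t1 < s2:
        return "before"
    if t1 == s2:
        return "meets"
    if s1 < s2 and s2 < t1 < t2:
        return "overlaps"
    if s1 < s2 and t2 < t1:
        return "contains"
    return None


print(relation((1, 6), (4, 9)))
```

Because the inverse relations return `None`, a caller would normally pass the earlier-starting interval first.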
Table 1. Temporal Pattern Expression Representation and Notation
Definition 3.5 Temporal Patterns:
In order to obtain temporal descriptions of the data, we combine basic spatial events with temporal relations to form temporal patterns. Two kinds of temporal patterns exist in a traffic video. The first is the "intra-event pattern", which represents the frequent temporal relations that exist within each spatial region. The other kind of multilabel temporal pattern is the "inter-event pattern", which represents the frequent temporal relations that exist between each pair of spatial regions. The regular temporal patterns are represented using the notations defined in 3.4.
Definition 3.6 Temporal relation between intra-primitive events:
The temporal relation between intra-events is defined as the temporal pattern that exists between each pair of spatial primitive events within the same region. The temporal relation is calculated for each pair of events based on definition 3.4. The intra-temporal relation for each sequence of frames is constructed with the help of the { >, =, |, → } operators, and its count is maintained. It is given as
$TR_i : \{S_1 - (R_1), S_2 - (R_2), \ldots\}$ in the i-th sequence, for all possible regions S. The same procedure is repeated for the next sequence of frames.
Example 4.
  • 1. TR1:{A -(>,→,=) } - semantic interpretation: The intra-temporal relations found in region A are {Overlaps, Before, Equals}.
  • 2. TR2:{B - (>,=) } - semantic interpretation: The intra-temporal relations found in region B are {Overlaps, Equals}.
Definition 3.7 Temporal relation between inter-primitive events:
The temporal relation between inter-primitive events is defined as the temporal pattern that exists between each pair of spatial primitive events belonging to different regions. The temporal relation is calculated for each pair of regions based on definition 3.4 and is given as $\{S_1 - (R_1) - S_2, S_1 - (R_1) - S_3, \ldots\}$, where each R is the set of temporal operators that exist between the two regions.
Example 5.
  • 1. TR1: A-{>,=}-C - semantic interpretation: The inter-temporal relations found between objects moving in regions A and C are {Overlaps, Equals}.
  • 2. TR2: A-{>,=}-B - semantic interpretation: The inter-temporal relations found between objects moving in regions A and B are {Overlaps, Equals}.
Based on the above definitions, the problem of mining frequent temporal arrangements can be formulated as shown below.
Given a stream of e-sequences with window size w and a support threshold min-sup, the task is to find frequent intra-event and inter-event patterns of the form
$F_{intra} = \{S_1 - (R_1), S_2 - (R_2), \ldots\}$, for all possible regions S, where $R_i$ is the set of temporal operators that exist within that region, and
$F_{inter} = \{S_1 - (R_1) - S_2, S_1 - (R_1) - S_3, \ldots\}$, for all possible combinations of different regions, where $R_i$ is the set of temporal operators that exist between the two regions.
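As an illustration of the intra-event and inter-event patterns of one sequence, the sketch below collects them from a list of event intervals. The input format `(region, start, end)`, the function names, and the convention of listing the earlier-starting region first in an inter-event key are assumptions made for this example; relation names are used in place of the operator symbols.

```python
from collections import defaultdict
from itertools import combinations


def relation(e1, e2):
    """Five relations of Definition 3.4 on (start, end) pairs; None otherwise."""
    (s1, t1), (s2, t2) = e1, e2
    if (s1, t1) == (s2, t2):
        return "equals"
    if t1 < s2:
        return "before"
    if t1 == s2:
        return "meets"
    if s1 < s2 < t1 < t2:
        return "overlaps"
    if s1 < s2 and t2 < t1:
        return "contains"
    return None


def temporal_patterns(intervals):
    """Build intra- and inter-event relation sets for one sequence.

    intervals: list of (region, start, end) tuples, one per event interval.
    """
    # Sort so that the first interval of every pair starts no later than the second.
    ordered = sorted(intervals, key=lambda iv: (iv[1], -iv[2], iv[0]))
    intra = defaultdict(set)  # region -> set of relations
    inter = defaultdict(set)  # (region1, region2) -> set of relations
    for (r1, s1, t1), (r2, s2, t2) in combinations(ordered, 2):
        rel = relation((s1, t1), (s2, t2))
        if rel is None:
            continue
        if r1 == r2:
            intra[r1].add(rel)
        else:
            inter[(r1, r2)].add(rel)
    return dict(intra), dict(inter)


intra, inter = temporal_patterns(
    [("A", 1, 4), ("A", 3, 6), ("A", 8, 9), ("C", 3, 6)]
)
print(intra, inter)
```

With this input, the pair of regions A and C carries the relations Overlaps and Equals, which is the shape of TR1 in Example 5.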
4. Detection of Abnormal Temporal pattern Framework
In this section, a Lossy Count based Sequential Temporal Pattern mining approach (LC-STP) is proposed for detecting abnormal patterns in traffic scenes based on the spatiotemporal context. There are two phases in this method. In the training phase, a portion of normal surveillance video frames is selected as a training sample to generate frequently occurring temporal patterns (regular events) with the interval based spatial events. Here, the class labels are not assigned manually, and the regular temporal patterns are found out by the proposed LC-STP algorithm. In the testing phase, the incoming temporal patterns for each sequence of frames are compared with the stored regular temporal patterns. The proposed framework (as shown in Fig. 1 ) consists of object detection and tracking, data stream conversion, novel frequent temporal pattern mining and temporal pattern matching.
Fig. 1. Proposed framework for Abnormal Event Detection System.
- 4.1 Object detection and Tracking
In this paper, our emphasis is mainly on event modeling tasks. Thus, state-of-the-art techniques such as Gaussian mixture models (GMM) [9] are used for object detection, and morphological operations are applied for noise elimination to obtain the resultant blobs. The main reason for choosing GMM is its ability to deal with lighting changes and repetitive motions of scene elements.
Object tracking is done in two steps (i.e., tracking by detection technique):
  • 1. Detecting moving objects in first frame, and
  • 2. Associating the detections corresponding to the same object over time.
Our simple tracking method works as follows:
  • 1. Input: The input video is converted into a sequence of frames (f1, f2, f3, ..., fn).
  • 2. Initialization: Let f0 be the first frame for which a detection is available with GMM [9]. A new track set Tjo is created for each blob detection j.
  • 3. Iteration: Loop over frames i from f1 to fn.
  • 4. Object detection and track assignment: Perform block motion estimation using the three-step search algorithm [26] based on the block matching criterion, i.e., the mean square error (MSE):
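$$\mathrm{MSE}(d_1, d_2) = \frac{1}{N_1 N_2} \sum_{(x,y) \in B} \left[ s(x, y, k) - s(x + d_1, y + d_2, k + 1) \right]^2$$
  • where B is an N1 × N2 block of pixels, s(x,y,k) denotes the pixel at location (x,y) in frame k, and (d1,d2) is the displacement of the center pixel of the block.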
  • 5. Delete lost tracks: Each track keeps a count of the number of consecutive frames in which it remained unassigned. If the count exceeds a specified threshold (i.e., the object may have left the frame), the track is deleted.
  • 6. Create new tracks: New blobs are introduced by applying GMM [9] every 10 frames. Noisy object detections are eliminated on the basis of size and appearance.
  • 7. Output: The object’s id, spatial location and name are extracted to form a formatted output for further processing in the proposed framework.
  • 8. New and old blobs are tracked by using steps 4–7.
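Step 4 of the tracking method can be sketched as follows. This is a minimal pure-Python version of a three-step search with an MSE matching criterion, assuming grayscale frames stored as lists of rows, a square n × n block, initial step size 4 (search range of ±7 pixels), and that the block is matched against the following frame. The names `block_mse` and `three_step_search` and the synthetic ramp frames are illustrative assumptions, not the authors' implementation.

```python
def block_mse(frame_k, frame_next, top, left, d1, d2, n):
    """MSE between the n x n block of frame_k at (top, left) and the block of
    frame_next displaced by d1 pixels horizontally and d2 pixels vertically."""
    total = 0.0
    for y in range(n):
        for x in range(n):
            diff = frame_k[top + y][left + x] - frame_next[top + y + d2][left + x + d1]
            total += diff * diff
    return total / (n * n)


def three_step_search(frame_k, frame_next, top, left, n=4, step=4):
    """Return ((d1, d2), mse) for the best displacement found."""
    height, width = len(frame_next), len(frame_next[0])
    best = (0, 0)
    best_err = block_mse(frame_k, frame_next, top, left, 0, 0, n)
    while step >= 1:
        centre = best  # the 9 candidates of this step are centred here
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                d1, d2 = centre[0] + dx, centre[1] + dy
                inside = (0 <= top + d2 and top + d2 + n <= height
                          and 0 <= left + d1 and left + d1 + n <= width)
                if not inside:
                    continue  # candidate block would leave the frame
                err = block_mse(frame_k, frame_next, top, left, d1, d2, n)
                if err < best_err:
                    best, best_err = (d1, d2), err
        step //= 2  # step sizes 4, 2, 1
    return best, best_err


# Synthetic test frames: an intensity ramp, shifted by 3 columns and 4 rows.
frame_next = [[100 * y + x for x in range(32)] for y in range(32)]
frame_k = [[100 * (y + 4) + (x + 3) for x in range(32)] for y in range(32)]
result = three_step_search(frame_k, frame_next, top=12, left=12)
print(result)
```

The search evaluates at most 27 candidate positions per block instead of the 225 of an exhaustive ±7 search, at the cost of possibly stopping in a local minimum on textured real frames.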
- 4.2 Data Stream Conversion based on spatial atomic events
First, the static background frame of a traffic video is divided into segments based on lanes, junctions, and zebra crossings, which are labeled in alphabetical order (A, B, C, D, E, …) as shown in Figs. 2 (a) & 2 (b). The primitive spatial events are formed for each object, as per definition 3.1, from the object's id, its abstract spatial location, and its timing information. These events are grouped for each frame as per definition 3.2 to form a transaction, $T_i = \langle e_1, e_2, \ldots, e_n \rangle$, where $e_n$ is the spatial primitive event of object n in frame i. A sequence of transactions is formed as per definition 3.3, seq-id = $\langle T_1, T_2, \ldots, T_n \rangle$, for each set of n consecutive frames. For each sequence seq-id, a candidate regular temporal pattern is constructed. Frequent pattern mining is performed over a window of sequences to form regular temporal patterns.
Fig. 2. Label assignment of video sequence
Example 6: In Fig. 2 (a), if the objects with ids O 1 and O 2 move along a straight road from bottom left to top left in one sequence, and the objects with ids O 3 and O 4 move from top right to bottom right in another sequence of frames, then the sequences of data streams are created as shown in Table 2 .
Table 2. Data Stream Conversion
Example 7: In Fig. 2 (b), consider the scenario of a person with id P 1 crossing the road through the regions B, C, D, while the object with id O 2 is stopped at junction A and the object with id O 3 is moving in region E. The sequence of data streams is then created as shown in Table 3 .
Table 3. Data Stream Conversion
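The conversion from tracker output to transactions can be sketched as below. The input format (one dictionary per frame mapping object id to region label) and the rule for assigning temporal states are assumptions for illustration: an event is marked "+" in the first frame an object is seen in a region, "-" in the last frame before it leaves that region, and "*" otherwise, including the final frame of the sequence, where the end of the event is not yet known.

```python
def to_transactions(frames):
    """Convert per-frame tracker output into transactions of event symbols.

    frames: list of dicts {object_id: region_label}, one dict per frame.
    Returns one list of symbols such as "A1+" per frame.
    """
    transactions = []
    for i, frame in enumerate(frames):
        prev = frames[i - 1] if i > 0 else {}
        has_next = i + 1 < len(frames)
        nxt = frames[i + 1] if has_next else {}
        events = []
        for obj_id, region in sorted(frame.items()):
            if prev.get(obj_id) != region:
                state = "+"  # object has just started in this region
            elif has_next and nxt.get(obj_id) != region:
                state = "-"  # last frame of the object in this region
            else:
                state = "*"  # event continues (or its end is not yet known)
            events.append(f"{region}{obj_id}{state}")
        transactions.append(events)
    return transactions


tracked = [{1: "A"}, {1: "A", 2: "B"}, {1: "A", 2: "B"}, {2: "B"}]
sequence = to_transactions(tracked)
print(sequence)
```

An event that lasts a single frame is marked only with "+" under this rule; a fuller implementation would need a convention for that case.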
- 4.3 Frequent Temporal Pattern Mining
The streams of transaction (i.e., frames) with spatial primitive events are grouped to form sequences. The method of finding frequent temporal patterns in sequence of frames consists of two phases: temporal expression construction and frequent pattern generation and is presented in Table 4 .
Table 4. A Lossy Counting based algorithm for Sequential Temporal Patterns (LC-STP)
Using construct-temp-exp( F i ), a temporal expression is constructed for each set of sequences (S 1 ,S 2 ,...S n ) to form candidate intra- and inter-temporal patterns. An intra-temporal pattern is a triplet consisting of a labeled region, the temporal relation in that region and its support count. The same is repeated for every sequence: the support count is updated if the relation already exists, and if the relation is new, the expression is added and a count is maintained for it.
Similarly, inter-temporal pattern expressions are formed for all possible combinations of regions. Each expression consists of four items: a pair of labeled regions, the temporal relation between them, and its support count. The same procedure is performed for every sequence: the support count is updated if that temporal relation already exists, and if the relation is new, it is added to the expression and a count is maintained for it.
Frequent temporal patterns are generated using generate-temp-pattern( T i ) for each window of sequences (W 1 ,W 2 ,...W n ) . After each window, infrequent patterns, i.e., those whose frequency falls below the threshold set by the minimum support ‘s’ and the error threshold 'ε', are removed. This method of finding frequent temporal patterns is similar to the lossy counting algorithm [20] and is presented in Table 4 . This approach eliminates the need for enumerating subsets or generating candidates. Thus, specialized data structures (trie, suffix tree, etc.) are not needed.
In LC-STP, the user has to supply parameters such as the support ‘s’, the error 'ε', the bucket width ‘w’, and the sequence length ‘l’. We maintain a data structure D that stores both the inter- and intra-temporal patterns of each sequence together with their frequency counts. Initially, D is empty, and the temporal patterns of the first sequence are added to it. The incoming sequences of frames are conceptually divided into buckets of width w, each labeled with a bucket id. Each bucket therefore consists of a set of overlapping sequences. Each sequence (i.e., sequence of transactions) is processed to form inter- and intra-temporal pattern expressions, and the frequency count of each temporal relation is updated if the relation already exists in D or created if it is new. This is performed over a set of overlapping buckets to find frequent temporal patterns, and an infrequent pattern is deleted if its frequency f < (s - ε) .
In the LC-STP algorithm (as shown in Table 4 ), lines 1–14 represent the main code. We call the function Construct-Temp-Exp (f i ) for every sequence to construct inter- and intra-temporal patterns and call Generate-Temp-Pattern (T i ) at the end of every sequence to calculate the count of the temporal patterns generated for that sequence. Infrequent temporal patterns are removed when their frequency f < (s - ε) , as shown in line 12. The threshold value is determined by the user based on the length of the training video and the periodicity of the traffic rules.
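For reference, the counting scheme that LC-STP builds on can be sketched as follows. This is the standard lossy counting procedure for single items, applied here to a stream of temporal-pattern strings; it is not the authors' Table 4 listing. The function name `lossy_count`, the pattern strings, and the choice of bucket width ⌈1/ε⌉ and output threshold (s − ε)·N follow the textbook formulation and are assumptions with respect to this paper.

```python
import math


def lossy_count(stream, support, epsilon):
    """Standard lossy counting over a stream of hashable patterns.

    Returns the patterns whose counted frequency is at least
    (support - epsilon) * N, where N is the stream length.
    """
    width = math.ceil(1 / epsilon)  # bucket width
    counts = {}                     # pattern -> [frequency, max undercount]
    n = 0
    for pattern in stream:
        n += 1
        bucket = math.ceil(n / width)  # id of the current bucket
        if pattern in counts:
            counts[pattern][0] += 1
        else:
            counts[pattern] = [1, bucket - 1]
        if n % width == 0:
            # Bucket boundary: drop patterns that cannot be frequent.
            counts = {p: fd for p, fd in counts.items() if fd[0] + fd[1] > bucket}
    threshold = (support - epsilon) * n
    return {p for p, (freq, _) in counts.items() if freq >= threshold}


stream = ["A>C", "A=C", "A>C", "B>E",
          "A>C", "A=C", "A>C", "D/G",
          "A>C", "A=C", "A>C", "A=C"]
frequent = lossy_count(stream, support=0.5, epsilon=0.25)
print(frequent)
```

Only one pass over the stream is needed, and the dictionary never holds more than the patterns that survived the last bucket boundary plus those seen in the current bucket.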
The main idea is that the frequently generated patterns are always regular. Thus, infrequent ones will be the anomalous events or traffic rule breakers. In the training phase, videos with high frequency of regular traffic events are taken as input, and frequently occurring temporal patterns are found without manual labeling.
- 4.4 Pattern Matching
In the testing phase, object detection and tracking, data stream conversion and temporal expression construction are performed to find both the intra- and inter-temporal sequence patterns for each sequence (i.e., set of frames). These patterns are matched against the stored regular Intra-T i & Inter-T i patterns to find the irregular patterns. If the temporal sequence pattern of a sequence is equal to, or a subset of, the stored patterns, the sequence is considered a normal event. Otherwise, it is considered an abnormal event sequence.
The Brute Force algorithm [21] is used to compare the patterns with the frequently occurring patterns, i.e., each relation is matched one by one against the stored Inter-D and Intra-D patterns.
Table 5. Abnormal detection Algorithm for Sequential Temporal Patterns using simple pattern matching method
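The matching rule of the testing phase, where a sequence is normal if its patterns are equal to or a subset of the stored regular patterns, can be sketched as a relation-by-relation check. The dictionary layout (region or region pair mapped to a set of relation names) and the function name `is_abnormal` are assumptions for illustration.

```python
def is_abnormal(observed, regular):
    """Return True if any observed relation is missing from the regular patterns.

    observed, regular: dicts mapping a region (intra-event pattern) or a pair
    of regions (inter-event pattern) to a set of temporal relations.
    """
    for key, relations in observed.items():
        allowed = regular.get(key, set())
        for rel in relations:  # brute-force, relation-by-relation comparison
            if rel not in allowed:
                return True
    return False


regular = {"A": {"before", "overlaps"}, ("A", "C"): {"overlaps", "equals"}}
normal_seq = {"A": {"before"}, ("A", "C"): {"equals"}}
odd_seq = {"A": {"before"}, ("C", "D"): {"overlaps"}}
print(is_abnormal(normal_seq, regular), is_abnormal(odd_seq, regular))
```

The same function serves for both the stored intra-event and inter-event patterns, since only the key type differs.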
5. Results and Discussions
In this section, the experimental study and results are presented for the evaluation of the proposed approach. This approach to anomaly detection is applicable to many different scenarios. In the scenarios considered here, most motions are normal and only a few are outliers. The task is to automatically mine the regular temporal rules of normal motion from all the data and to detect any anomalous motions that break the rules.
- 5.1 Traffic intersection scenario
In order to evaluate the proposed abnormal event detection framework, a busy traffic data set containing one hour of video (9,000 frames) with a frame size of 360 × 288 is taken [39] , as shown in Fig. 3 .
Fig. 3. Sequence showing Regular Traffic patterns
The data set contains both regular and irregular traffic patterns. The proposed LC-STP algorithm (as shown in Table 4 ) is applied to determine regular patterns with 50% overlap between buckets. We obtained 8,882 frequent temporal patterns and 118 irregular patterns (out of 9,000 frames).
This video monitors a four-road intersection. Each road consists of a two-way lane for moving objects in both directions. The entire moving traffic in this area is controlled by the traffic lights within the intersection. However, in the test video frame, only the top and right lanes are visible, but left and bottom lanes are not. Thus, the underlying rule of normal motion is the legal motion directed by the traffic lights. However, in the training phase, the goal is to discover the traffic spatial patterns with valid temporal patterns followed by most vehicles in this area and to detect anomalies within a lane and in-between lanes.
In the training phase, the proposed algorithm is applied in offline mode to discover frequent temporal patterns. Since the motion of the objects between consecutive frames is small, only 1 frame in every 10 (a key frame) is taken to improve the execution time. Then 10 frames are grouped into each sequence, which is converted into a temporal expression containing both intra and inter patterns. These temporal patterns are processed by the proposed algorithm (as shown in Table 4 ) to determine regular patterns. The results obtained are 8,882 frequent pattern sequences and 118 irregular patterns (out of a total of 9,000 frames).
Figs. 3 (a–d) represent the regular patterns present in the traffic video sample. It shows some of the temporal sequence patterns obtained through the application of our proposed LC-STP algorithm. We showed only six frames in each sequence for display purposes. Fig. 3 (a) shows the pattern of objects moving from the right lane to the top left lane, i.e., objects move through the E,D,C,A or E,G,C,A regions, when some objects stopped at the top right lane, i.e., objects region B (as per notations). The objects in other lanes are not visible (this will not affect the performance of our algorithm). Fig. 3 (b) shows the pattern of objects crossing from the top right lane to the left lane, i.e., through B,G,C regions; some crossed from the bottom left to the right lane, i.e., from region G to E, and some moved in the region A. Fig. 3 (c) shows some objects crossing the junction from left to right lane, i.e., through regions C,D,G,E; some crossed towards the bottom right lane, i.e., through regions C,D,G,H, and some moved in the region A. Fig. 3 (d) shows the pattern of moving objects from bottom left to top left lane, top right to bottom right lane, i.e., through regions F,C,A and regions B,E,H respectively. While some are stopped at region D from B and some objects are stopped at G.
Figs. 4 (a–c) represent abnormal patterns (anomalies) present in the traffic video sample. The results shown are abnormal events, which are obtained only when frequencies do not exceed the support and error thresholds. The displayed abnormal event in Fig. 4 (a) shows the object crossing the junction from left to right lane illegally (through regions C,D,E), while all the objects followed the regular pattern of moving upside and downside of the frame (through regions F,C,A and B,E,H). Fig. 4 (b) shows one such abnormal event of the objects, i.e., sudden crossing from region G to E or G to E,H, while all the other objects followed the regular pattern of moving upside and downside of the frame (through regions F,C,A and B,E,H). Fig. 4 (c) shows another unusual event of objects moving from region D to region C or region D to region C,A, while all other objects followed the regular pattern of moving upside and downside of the frame (through regions F,C,A and B,E,H).
Fig. 4. Example Output Sequence showing Irregular Traffic Pattern
The proposed LC-STP algorithm detected 103 of the 118 abnormal events in the video sequence, and 12 events were falsely detected as anomalies, giving a detection rate of 87.2% and a false alarm rate of 10.2%. Generally, the error threshold ϵ is set to one tenth of the support s. These results are tabulated for different values of ϵ and s in Table 6 .
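The reported rates can be reproduced from the counts, assuming that both the detected events and the false alarms are divided by the 118 abnormal events (the text does not state the denominator of the false alarm rate explicitly):

```latex
\[
\text{detection rate} = \frac{103}{118} \approx 0.873,
\qquad
\text{false alarm rate} = \frac{12}{118} \approx 0.102
\]
```

Under this assumption the false alarm rate matches the reported 10.2%, and the detection rate matches the reported 87.2% if the last digit is truncated rather than rounded.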
Table 6. Statistical results of the proposed work
- 5.2 Central Pedestrian crossing sequence
In addition to the above scenario, the proposed work is applied to pedestrian crossing sequence videos obtained from the data sets in [39] to detect anomalies such as pedestrians crossing the road illegally while vehicles are moving. Figs. 5 (a–d) represent regular patterns present in the pedestrian crossing sequence. This video contains pedestrians crossing the road through the regions B,C,D and D,C,B (region C is the zebra crossing) and vehicles moving through regions A,C,E (one-way direction). The normal pattern in this video is that pedestrians cross the road when no vehicles are on the road or moving through the zebra crossing.
Detected Pedestrian Crossing Sequence showing Regular patterns
This video data set consists of 900 frames. Each sequence consists of five frames, and the proposed LC-STP algorithm is applied to each sequence. Some of the results are shown in Figs. 5 (a–d), which depict normal, regular temporal patterns. Fig. 5 (a) shows pedestrians moving through the zebra crossing (through region C, from B to D or from D to B). Fig. 5 (b) shows pedestrians crossing the road (region C) while vehicles are moving in region A or E.
Fig. 5 (c) shows the regular pattern of objects moving through regions A,C,E. Fig. 5 (d) shows pedestrians crossing through regions B,C,D or D,C,B while some objects are stopped in region A and others are moving in region E.
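The windowing step can be sketched as follows, assuming non-overlapping windows of five frames (the text does not say whether consecutive sequences overlap). The per-frame region labels and the helper names are hypothetical.

```python
from itertools import groupby


def split_into_sequences(frames, size=5):
    """Split a list of frames into consecutive, non-overlapping windows."""
    return [frames[i:i + size] for i in range(0, len(frames), size)]


def region_path(labels):
    """Collapse consecutive repeats, e.g. B,B,C,C,D -> B,C,D."""
    return [region for region, _ in groupby(labels)]


frame_ids = list(range(900))  # the 900 frames of the data set
sequences = split_into_sequences(frame_ids)

# Hypothetical region labels of one pedestrian over one five-frame window.
pedestrian_labels = ["B", "B", "C", "C", "D"]
path = region_path(pedestrian_labels)
```

Under this assumption the 900 frames yield 180 sequences, and the example labels collapse to the regular pedestrian path B,C,D.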
For this video data set, 11 sequences contain abnormal events. One of them is shown in Fig. 6 , which depicts pedestrians crossing the street while a vehicle is still in the zebra crossing area (i.e., region C); the detected vehicle is marked with a red box. The proposed algorithm detected 9 of the 11 abnormal events in the video sequence, and two events were falsely detected as anomalies, giving a detection rate of 81.8% and a false alarm rate of 18%. The results for different values of ϵ and s are tabulated in Table 6 .
People on the zebra crossing while vehicles are moving across it
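The anomaly in Fig. 6 amounts to two interval events that share region C and overlap in time. A minimal sketch of that check is given below; it is a hand-written rule for illustration, not the mining procedure of LC-STP, and the Event fields and frame numbers are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str    # "pedestrian" or "vehicle"
    region: str  # spatial symbol, e.g. "C" for the zebra crossing
    start: int   # first frame in which the object occupies the region
    end: int     # last frame in which the object occupies the region


def overlaps(a, b):
    """True if the two closed frame intervals share at least one frame."""
    return a.start <= b.end and b.start <= a.end


def crossing_conflicts(events, crossing="C"):
    """Pedestrian/vehicle pairs that occupy the crossing at the same time."""
    peds = [e for e in events if e.kind == "pedestrian" and e.region == crossing]
    cars = [e for e in events if e.kind == "vehicle" and e.region == crossing]
    return [(p, v) for p in peds for v in cars if overlaps(p, v)]


# Hypothetical primitive events within one five-frame window.
events = [
    Event("vehicle", "A", 0, 1),
    Event("vehicle", "C", 2, 4),
    Event("pedestrian", "B", 0, 2),
    Event("pedestrian", "C", 3, 4),
]
conflicts = crossing_conflicts(events)
```

Here the pedestrian and the vehicle both occupy region C during frames 3 and 4, so one conflicting pair is returned; a window in which the vehicle stays in region A returns none.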
- 5.3 Comparison
To compare the proposed LC-STP based system with an existing anomaly detection approach, all normal trajectories are extracted and clustered based on their positions. Spectral clustering is performed on all vehicle trajectories in the traffic video sequence based on the dynamic Bayesian network [10] , and the detected outliers are treated as anomalies. Unlike the proposed LC-STP algorithm, however, this approach fails to detect anomalies that involve multiple objects. For the training phase, some of the sequences are labelled as abnormal and all other activities are treated as normal. Cross-validation is used to assess the anomaly detection performance. Fig. 7 (a) shows the ROC curves of trajectory clustering and of the proposed method (LC-STP) for different values of ϵ and s on the traffic video sequences.
Performance Comparison using ROC Curves
An HMM [37] is used to model the pedestrian crossing sequence. It has three states, each representing an event. A Gaussian mixture (GM) distribution describes the state-conditional probability distributions, and the EM algorithm is used to train the probability parameters. Fig. 7 (b) shows the ROC curves of the HMM [37] and of the proposed method for different values of ϵ and s on the pedestrian crossing sequences.
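For reference, an ROC curve such as those in Fig. 7 is a set of (false positive rate, true positive rate) operating points. In the paper the points come from varying ϵ and s; the sketch below instead sweeps a threshold over hypothetical anomaly scores, which is the generic construction. The scores, labels, and function names are assumptions.

```python
def roc_points(scores, labels):
    """ROC operating points from anomaly scores and 0/1 ground-truth labels.

    Assumes at least one positive and one negative label.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(pairs):
        if label:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last item of a run of tied scores.
        if i + 1 == len(pairs) or pairs[i + 1][0] != score:
            points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical scores (higher = more anomalous) and labels (1 = abnormal).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
points = roc_points(scores, labels)
area = auc(points)
```

A method whose curve lies closer to the top-left corner, and therefore has a larger area under the curve, separates normal and abnormal sequences better.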
6. Conclusion
This paper has proposed a new approach based on frequent temporal pattern mining of interval-based spatial events. The main contributions of this paper are a novel representation of spatial primitives with timing information and a lossy count based sequential temporal pattern (LC-STP) approach for finding frequent temporal patterns. The proposed approach is analyzed to detect both intra- and inter-temporal spatial events. It detects anomalous events in both traffic video sequences and pedestrian crossing sequences with a high detection rate and a low false alarm rate.
Experimental results show that the proposed approach is effective for abnormality detection and can be applied to many traffic video scenes, provided that the spatial regions or lanes are known in advance. A limitation of the current work is its dependence on the outcomes of the object detection and tracking methods. In future, this work will be extended to include low-level pixel features and to find semantic rules in the video to enhance the overall performance of the system.
BIO
Mr. P.M. Ashok Kumar obtained his B.Tech in ECE from JNTU Hyderabad and his M.E. in Computer Science Engineering from Anna University, Chennai, India. He is currently pursuing a Ph.D. in the Department of Electronics Engineering, Anna University, India. He has authored more than 10 articles in reputed journals and conferences. His main research interests include image processing, data mining, and machine learning. He is currently working on anomaly detection, video traffic surveillance, texture feature extraction, and object tracking features.
Dr. V. Vaidehi received her B.E. in ECE from the College of Engineering, Guindy, University of Madras, her M.E. in Applied Electronics from MIT, and her Ph.D. in Electronics Engineering from MIT. She joined Madras Institute of Technology in 1982 after serving as a Scientific Assistant at I.I.Sc, Bangalore.
She has served as Head of the Computer Centre, Member of the Board of Studies, Member of the Academic Council, Head of Electronics Engineering, Head of Computer Technology, and Head of Information Technology. Currently, she is Professor of the Electronics Department, Director of the AU-KBC Research Center, and Chairman of the Faculty of Information and Communication Engineering.
She has executed several funded research projects and published several research papers in reputed international journals and conferences. She has received several awards. Her areas of interest are networks, data mining, and image processing.
References
Benezeth. Y , Jodoin P.M. , Saligrama V “Abnormal events detection based on spatio-temporal co-occurrences,” in Proc. of IEEE Conference on Computer Vision and Pattern Recognition 2009 2458 - 2465
Chen L , Cao Yuan , Ji Ronghua “Automatic Incident Detection Algorithm Based on Support Vector Machine,” in Proc. of Sixth IEEE International Conference on Natural Computation, vol. 2 2010 864 - 866
Dong N , Jia Zhen , Shao Jie “Traffic Abnormality Detection through Directional Motion Behaviour Map,” in Proc. of Seventh IEEE International Conference on Advanced Video and Signal Based Surveillance 2010 80 - 84
Galata A , Johnson N , Hogg D.C. 2001 “Learning variable-length Markov Models of behaviour,” Computer Vision and Image Understanding 81 (3) 398 - 413    DOI : 10.1006/cviu.2000.0894
Gilbert. A , Illingworth. J. , Bowden R 2011 “Action Recognition Using Mined Hierarchical Compound Features,” IEEE Transactions on Pattern Analysis and Machine Intelligence 33 (5) 883 - 897    DOI : 10.1109/TPAMI.2010.144
Harwood D , Haritaoglu I , Davis Larry S. 2000 “W4: Real-time surveillance of people and their activities,” IEEE Transactions on Pattern Analysis and Machine Intelligence 22 809 - 830    DOI : 10.1109/34.868683
Hoppner F. 2003 “Knowledge discovery from sequential data,” Ph.D. thesis Technical University Braunschweig Germany
Hsu W L , Tsai Chang-Lung , Chang Po-Lun “Automatic Traffic Monitoring Method Based on Cellular Model,” in Proc. of Fifth IEEE International Conference on Intelligent Information Hiding and Multimedia Signal Processing 2009 640 - 643
Hu W , Wang L , Tan T 2003 “Recent developments in human motion analysis,” Pattern Recognition (Elsevier) 36 585 - 601
Jung C.R. , Hennemann L. , Musse S.R. 2008 “Event detection using trajectory clustering and 4-d histograms,” IEEE Transactions on Circuits and Systems for Video Technology 18 (11) 1565 - 1575    DOI : 10.1109/TCSVT.2008.2005600
Kamijo S , Harada M , Sakauchi M 2004 “Incident Detection based on Semantic Hierarchy composed of the Spatio-Temporal MRF model and Statistical Reasoning,” IEEE International Conference on Man and Cybernetics 1 415 - 421
Ki Y , Lee D 2007 “A traffic accident recording and reporting model at Intersections,” IEEE Transactions on Intelligent Transportation Systems 8 (2) 188 - 196    DOI : 10.1109/TITS.2006.890070
Kim. J. , Grauman K. “Observe locally, infer globally: A space-time MRF for detecting abnormal activities with incremental updates,” in Proc. of IEEE Conference on Computer Vision and Pattern Recognition 2009 2921 - 2928
Kuettel. D , Breitenstein M.D. , VanGool L. “What’s going on? Discovering spatio-temporal dependencies in dynamic scenes,” in Proc. of IEEE Conference on Computer Vision and Pattern Recognition 2010
Leibe. B , Schindler K. , Cornelis N. 2008 “Coupled Object Detection and Tracking from Static Cameras and Moving Vehicles,” IEEE Transactions on Pattern Analysis and Machine Intelligence 30 (10) 1683 - 1698    DOI : 10.1109/TPAMI.2008.170
Lili C , Li Kehuang , Chen Jiapin 2011 “Abnormal Event Detection in Traffic Video Surveillance Based on Local Features,” IEEE Transactions on Image and Signal Processing 1 362 - 366
Lin C. , Tai J. , Song K 2003 “Traffic monitoring based on real-time image tracking,” in Proceedings of the IEEE International Conference on Robotics & Automation 2 2091 - 2096
Loy C.C , Xiang Tao , Gong Shaogang “Stream-based active unusual event detection,” in Proc. of Computer Vision – ACCV 2010, Springer 2011 161 - 175
Loy. C.C , Xiang T. , Gong S. “From Local Temporal Correlation to Global Anomaly Detection,” in Proc. of ECCV, International Workshop on Machine Learning for Vision-based Motion Analysis 2008
Manku G.S. , Motwani Rajeev “Approximate Frequency Counts over Data Streams,” in Proc. of the 28th VLDB Conference Hong Kong, China 2002
Moerchen F. “Algorithms for time series knowledge mining,” in Proc. of the international conference on Knowledge Discovery and Data mining (SIGKDD) 2006 668 - 673
Moskovitch R. , Shahar Y. “Medical temporal-knowledge discovery via temporal abstraction,” in Proc. of the American Medical Informatics Association (AMIA) 2009
Oliver. N.M , Rosario B. , Pentland A.P. 2000 “A Bayesian computer vision system for modeling human interactions,” IEEE Transactions on Pattern Analysis and Machine Intelligence 22 831 - 843    DOI : 10.1109/34.868684
Papapetrou P. , Kollios G. , Sclaroff S. “Discovering frequent arrangements of temporal intervals,” in Proc. of the International Conference on Data Mining (ICDM) 2005
Piciarelli C. , Micheloni C. , Foresti G.L 2008 “Trajectory-Based Anomalous Event Detection,” IEEE Transactions on Circuits and Systems for Video Technology 18 (11) 1544 - 1554    DOI : 10.1109/TCSVT.2008.2005599
Li Renxiang , Zeng Bing , Liou Ming L. 1994 “A New Three-Step Search Algorithm for Block Motion Estimation,” IEEE Transactions on Circuits and Systems for Video Technology 4 (4) 438 - 442    DOI : 10.1109/76.313138
Sacchi L , Larizza Cristiana , Combi Carlo 2007 “Data mining with Temporal Abstractions: learning rules from time series,” Data Mining and Knowledge Discovery
Schindler. K. , Schindler K. , Cornelis N. 2008 “Coupled Object Detection and Tracking from Static Cameras and Moving Vehicles,” IEEE Transactions on Pattern Analysis and Machine Intelligence 30 (10) 1683 - 1698    DOI : 10.1109/TPAMI.2008.170
Shan kam P , Chee fu A. W. “Discovering temporal patterns for interval-based events,” in Proc. of the International Conference on Data Warehousing and Knowledge Discovery (DaWaK) 2000
Stauffer. C , Grimson W. “Adaptive background mixture models for real-time tracking,” in Proc. of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition 1999 246 - 252
Swears E. , Hoogs A. , Perera A.G.A “Learning motion patterns in surveillance video using HMM clustering,” in Proc. of IEEE Visual Motion Computing (Motion) Workshop Copper Mountain/Colorado 2008 1 - 8
Veeraraghavan H. , Schrater. P. , Papanikolopoulos N “Switching Kalman Filter-Based Approach for Tracking and Event Detection at Traffic Intersections,” in Proc. of IEEE Conference on Control and Automation 2005 1167 - 1172
Villafane R , Hua Kien A. , Tran Duc 2000 “Knowledge discovery from series of interval events,” Journal of Intelligent Information Systems 15 71 - 89    DOI : 10.1023/A:1008781812242
Wang. X , Ma Xa , Grimson W.E.L. 2009 “Unsupervised activity perception in crowded and complicated scenes using hierarchical Bayesian models,” IEEE Transactions on Pattern Analysis and Machine Intelligence 31 (3) 539 - 555    DOI : 10.1109/TPAMI.2008.87
Winarko E , Roddick J. F. 2007 “Armada - an algorithm for discovering richer relative temporal association rules from interval-based data,” Data and Knowledge Engineering 63 76 - 90    DOI : 10.1016/j.datak.2006.10.009
Wu S.Y. , Chen Y.L. 2007 “Mining non-ambiguous temporal patterns for interval-based events,” IEEE Transactions on Knowledge and Data Engineering 19 742 - 758    DOI : 10.1109/TKDE.2007.190613
Li Xiaokun , Porikli F.M. “A hidden Markov model framework for traffic event detection using video features,” in Proc. of IEEE Conference on Image Processing 2004 1902 - 1907
Yao B , Wang L , Zhu S. “Learning a scene contextual model for tracking and abnormality detection,” in Proc. of IEEE Conference on Computer Vision and Pattern Recognition Workshops 2008 1 - 8
Zou Y , Shi Guangyi , Shi Hang “Image sequences based traffic incident detection for signalled intersections using HMM,” in Proc. of international Conference on Hybrid Intelligent Systems 2009