To manipulate large video contents, effective video indexing and retrieval are required. A large number of video indexing and retrieval algorithms have been presented for framewise user query or video content query whereas a relatively few video sequence matching algorithms have been proposed for video sequence query. In this paper, we propose an efficient algorithm that extracts key frames using color histograms and matches the video sequences using edge features. To effectively match video sequences with a low computational load, we make use of the key frames extracted by the cumulative measure and the distance between key frames, and compare two sets of key frames using the modified Hausdorff distance. Experimental results with real sequence show that the proposed video sequence matching algorithm using edge features yields the higher accuracy and performance than conventional methods such as histogram difference, Euclidean metric, Battachaya distance, and directed divergence methods.
I. INTRODUCTION
To efficiently manage and utilize digital media, various video indexing and retrieval algorithms have been proposed. A large number of video indexing and retrieval methods have been proposed, focusing on framewise query or indexing, whereas a relatively few algorithms have been presented for video sequence matching or video shot matching. In this paper, we propose the efficient algorithm to match the video sequences for video sequence query.
If the video indexing algorithm shows a lot of false or miss shot boundaries, the accuracy can be reduced, where the accuracy is defined using the numbers of false and miss detections
[1]
. In this paper, to improve the accuracy of video sequence matching, we propose the efficient key frame extraction algorithm using the color histograms of consecutive frames and the sequence matching algorithm using the modified Hausdorff distance with edge features, which yields a higher performance than conventional methods.
The key frames extracted from segmented video shots can be used not only for video shot clustering but also for video sequence matching or browsing, where the key frame is defined by the frame that is significantly different from the previous frames
[2]
. Many key frame extraction algorithms have been proposed, in which similar methods used for shot boundary detection were employed with proper similarity measures. The key frame extraction method using set theory employing the semiHausdorff distance and the key frame selection algorithm using skincolor and face detection have been also proposed. In this paper, we propose the efficient key frame extraction algorithm that employs the cumulative measure and the distance between key frames, and compare its performance with that of conventional algorithms.
Video sequence matching using key frames can be performed by evaluating the similarity between data sets of key frames. In this paper, to improve the matching efficiency with the sets of extracted key frames we employ color and edge features. Experimental results with several video sequences show that the proposed methods give better matching accuracy than conventional algorithms. As the performance measure of matching methods, we introduce the accuracy ratio of the average normalized value of the nonmatching shots to that of the matching shot.
The rest of the paper is structured as follows. The distance measures for video indexing are briefly discussed in Section II. The proposed algorithm for video sequence matching is presented in Section III and experimental results are shown in Section IV. Finally conclusions are given in Section V.
II. DISTANCE MEASURES FOR VIDEO INDEXING
The commonly used video indexing methods utilize histogram comparisons, because histograms show less sensitivity to frame changes within a shot and extraction of histograms is computationally efficient compared with the motion based methods. Most common algorithms using histogram comparison include histogram difference, Euclidean metric, Battachaya distance, and directed divergence
[3]
.
 A. Histogram Difference
The histogram difference is defined by
where
H_{t}
(
j
) signifies the histogram in the
j
th bin, 0 ≤
j
≤ 255, with the subscript
t
denoting the
t
th frame and bin signifying the gray level range of the histogram representation.
 B. Euclidean Metric
The Euclidean metric between histograms is defined by
 C. Battachaya Distance
The Battachaya distance with respect to histograms is used to estimate the distance between histogram features, defined by
where
j
represents the bin index of the histogram.
 D. Directed Divergence
The divergence measure is defined by the sum of directed divergences. The directed divergences of histograms are expressed as
III. PROPOSED VIDEO RETRIEVAL ALGORITHM
To match video sequences, we first extract key frames using the cumulative measure and the distance between key frames, then evaluate the similarity between two video sequences by employing the modified Hausdorff distance between two sets of key frames: one extracted from the query sequence and the other from the video sequence to be matched. The similarity between two sets of key frames is computed using the modified Hausdorff distance. The proposed video sequence matching algorithm consists of three steps: key frame extraction, key frame matching, and video sequence matching.
 A. Key Frame Extraction Using the Cumulative Measure and the Distance between Key Frames
In the proposed algorithm, we use the cumulative measure based on the histogram difference
to efficiently extract candidate key frames, where
k
denotes the total number of accumulated frames that can be varied depending on the criteria for key frame extraction. Note that application of the cumulative concept over
k
frames to (1) yields the cumulative measure (5).
The key frames are detected when two criteria are satisfied: if the cumulative value
C
between the current frame and the previous key frame is larger than the given threshold, and the histogram difference (1) between the previous key frame and the current frame is larger than the threshold. T
The key frames extracted within video shots can be used not only for representing contents in video shots but for efficiently matching the video sequences with a very low computational load
[4]
.
 B. Key Frame Matching Using Edge Features
To match video sequences efficiently, we perform edge matching. To extract edge features we employ the MarrHildreth edge detector. Edges in the current frame
I
(
i
,
j
) satisfy
where * denotes the convolution operator. The Gaussian function
G
is defined by
where
and
σ
represents a Gaussian spread parameter, standard deviation. ∇
^{2}
G
signifies the Laplacian of Gaussian
[5]
The edge density and thickness can be controlled by
σ
.
To efficiently match edges of two key frames, we propose a novel approach. The edge matching procedures in
Y
(luminance) component are summarized as follows.

1) Extract the query edge imageEq(i,j) from the query key frame and the matched edge imageEm(i,j) from the matched key frame.

2) Obtain the ‘common edge imageEc(i,j) ’ fromEq(i,j) andEm(i,j) using ‘AND’ operation.

3) Calculate the edge matching rate (EMR) defined by
where NEP represents the number of edge pixels.
The EMR is used to calculate the similarity between key frames. Note that to reduce a computational load edge matching is performed only on key frames rather than on all the frames. Edge matching is applied to video sequence matching efficiently and the experimental results are shown in Section IV.
 C. Video Sequence Matching Using the Modified Hausdorff Distance
For matching between video sequences, we employ the modified Hausdorff distance measure. In this paper, to efficiently evaluate the similarity between two sets of key frames, we use the modified Hausdorff distance
DSR
(
k
) given by
where
R
= {
r
_{1}
, ... ,
r_{K}
} signifies the set of key frames for the sequence to be matched and
S
= {
s
_{1}
, ... ,
s_{N}
} represents the set of key frames for the query sequence, with
K
and
N
denoting the total numbers of elements in sets
S
and
R
, respectively.
DSR
(
k
) represents the modified Hausdorff distance between the
k
th and (
k
+1)th key frames of the sequence to be matched, with
d
(
s
,
r
)(
s
∈
S
and
r
∈
R
) signifying the EMR in (9) obtained from edge matching between two key frames
[6]
.
Let indices
t
and
k
specify the time index and the key frame index of the video sequence to be matched, respectively, 1 ≤
t
≤
T
and 1≤
k
≤
K
<
T
with
T
and
K
representing the total number of the test frames and the total number of its key frames, respectively. Let
f_{kt}
denote the function that maps the key frame index
k
to the time index
t
.
f_{kt}
^{1}
represents the inverse function of
f_{kt}
. The normalized modified Hausdorff distance
NMHD
(
t
) can be defined by
Note that
NMHD
(
t
) is constant for the time index
t
that corresponds to the key frame index interval between
k
anf
k
+1.
The normalized similarity metric (11) represents the dissimilarity between the two sequences: the normalized values for ‘Matching shots’ are small whereas those for ‘Nonmatching shots’ are large. The ratio of the average
NMHD
for ‘Nonmatching shots’ to that for ‘Matching shots’ represents the separation capability. It is noted that the algorithm with a large ratio can match sequences accurately.
IV. SIMULATION RESULTS AND DISCUSSIONS
 A. Key Frame Extraction
To show the effectiveness of the proposed algorithm, we simulate video sequence matching using color test sequence: ‘Animation’ sequence consisting of nine shots within 330 frames with
K
=9 and
T
=330 containing large motions and dynamic scene changes.
To extract the key frames we use two criteria. If both the cumulative value in (5) and the histogram difference value in (1) between the previous key frame and the current frame are larger than threshold values, the candidate frame is extracted as a key frame. Even though the accumulated value is larger than the threshold value, the frame is not regarded as a key frame since the accumulated value can be gradually increased with the histogram difference value between the previous key frame and the current showing the value smaller than the threshold. To extract a key frame, both conditions with respect to (1) and (5) must be satisfied. Once the key frame is extracted, the cumulative value is reset to zero. If the thresholds to extract key frames are large, the number of key frames and the computational load can be reduced.
 B. Video Sequence Matching
To show the performance of video retrieval algorithm five methods are simulated with color video sequences. In experiments for key frame extraction, we apply the cumulative concept to all of form similarity measures.
Table I
and
Fig. 1
show matching results of the color ‘Animation’ sequence using the modified Hausdorff distance.
Performance comparison of video sequence matching using the modified Hausdorff distance.
Performance comparison of video sequence matching using the modified Hausdorff distance.
Performance comparison of video sequence matching as a function of the frame number. (a) Histogram Difference, (b) Euclidean Metric, (c) Battachaya Distance, (d) Directed Divergence, (e) Proposed Method
In experiments for video sequence matching, we assumed that the video sequence to be matched includes the query sequence and the query sequence has similar frame length to same shot within the video sequence to be matched.
In
Table I
and
Fig. 1
, the query sequence, the same as shot 2 in the ‘Animation’ sequence, consists of frames from 49 to 83, and ‘Histogram difference’, ‘Euclidean metric’, ‘Battachaya distance’, and ‘directed divergence’ signify the histogram difference method, the Euclidean metric method, the Battachaya distance method, and the directed divergence method, respectively. Note that the modified Hausdorff distance measure is applied to all methods.
Fig. 1
shows the normalized modified Hausdorff distance value between the set of query key frames and that of the video sequence to be compared as a function of the frame number. The normalized value is obtained by (11), resulting in the normalized value between 0 and 1. In
Table I
, ‘Matching shots’ (‘Nonmatching shots’) represents the average of normalized modified Hausdorff distances (11) between the set of query key frames and the set of color video sequence to be compared over the interval that contains (does not contain) ‘Matching shots’.
As shown in
Table I
and
Fig. 1
, in the proposed method using edge and color features the ratio between ‘Matching shots’ and ‘Nonmatching shots’ is larger than the conventional methods using only color histograms. That is, the algorithm using edge features can reduce the number of false matching, whereas the conventional video sequence matching methods may yield a lot of false matching. The conventional methods show wide variations for ‘Nonmatching shots’. In contrast, the proposed method employing edge features shows small fluctuations for ‘Nonmatching shots’ (see
Fig. 1
).
The proposed video sequence matching also shows large accuracy ratio for both query sequence extracted from sequences compared with that of the conventional methods.
Fig. 2
shows the ratio (
B
/
A
) of the proposed video sequence matching algorithm as a function of the threshold of cumulative value. As shown in
Fig. 2
, the ratio decreases as the threshold increases. That is, to reduce the computational load, the number of key frames can be reduced by increasing the threshold for the cumulative value, however yielding low performance.
Ratio (B/A) of the proposed video sequence matching algorithm as a function of the cumulative measure.
Therefore, in experiments, the threshold is set to 0.1, which satisfies both the performance and the computational load.
Table I
shows that the proposed method using color and edge features can improve the accuracy for color video sequence matching, compared with conventional measures such as the histogram difference, Euclidean metric, Battachaya distance, and directed divergence.
In MPEG7 standardization, any specific video sequence matching method is not described. The proposed method can be applied to MPEG7 standard by using the MPEG7 descriptors for video content management and automatic monitoring system efficiently with low computational complexity
[7]

[10]
.
V. CONCLUSIONS
This paper proposes the efficient video sequence matching method using color and edge features with the modified Hausdorff distance. It gives a higher accuracy and efficiency than conventional methods such as the histogram difference, Euclidean metric, Battachaya distance, and directed divergence methods. The combination of color histograms and edge features improves the accuracy of video sequence matching. Experimental results with real video sequences show that the proposed algorithm can successfully extract key frames and match video sequences efficiently, showing a higher accuracy than the conventional methods. Further research will focus on the extension of the algorithm for various video sequences containing complex scenes.
BIO
Sang Hyun Kim
He received the B.S. and M.S. degrees in electronic and control engineering from Hankuk University of Foreign Studies, in 1997 and 1999, respectively, and the Ph.D. degree in electronic engineering from Sogang University, in 2003. In 2003 and 2004, he worked on the Digital Media Research Laboratory in LG Electronics Inc., as a Senior Research Engineer. In 2004 and 2005, he also worked on the Computing Laboratory at Digital Research Center in Samsung Advanced Institute of Technology, as a Senior Research Member. Since 2005, he has been with the school of convergence & fusion system engineering at Kyungpook National University as an associate professor. His current research interests are video indexing, video coding, and computer vision.
Wen X.
,
Shao L.
,
Fang W.
,
Xue Y.
2015
“Efficient feature selection and classification for vehicle detection,”
IEEE Trans. Circuits and Systems for Video Technology
25
(3)
508 
517
DOI : 10.1109/TCSVT.2014.2358031
Luis G.
,
Tuia D.
,
Moser G.
,
Gusta C.
2015
“Multimodal classification of remote sensing images: A review and future directions,”
Proc. of IEEE
103
(9)
1560 
1584
DOI : 10.1109/JPROC.2015.2449668
Jaffery Z. A.
,
Dubey A. K.
2013
“Architecture of noninvasive real time visual monitoring system for dialtype measuring instrument,”
IEEE Sensors Journal
13
(4)
1236 
1244
DOI : 10.1109/JSEN.2012.2231940
Yang Y.
,
Zha Z.
,
Gao Y.
,
Zhu X.
,
Chua T.
2014
“Exploiting web Images for semantic video indexing via robust samplespecific loss,”
IEEE Trans. Multimedia
16
(6)
1677 
1689
DOI : 10.1109/TMM.2014.2323014
Chasanis V. T.
,
Likas A. C.
,
Galatsanos N. P.
2009
“Scene detection in video using shot clustering and sequence alignment,”
IEEE Trans. Multimedia
11
(1)
89 
100
DOI : 10.1109/TMM.2008.2008924
Geng J.
,
Miao Z.
,
Zhang X.P.
2015
“Efficient heuristic methods for multimodal fusion and concept fusion in video concept detection,”
IEEE Trans. Multimedia
17
(4)
498 
511
DOI : 10.1109/TMM.2015.2398195
Yan H
,
Paynabar K.
,
Shi H.
2015
“Imagebased process monitoring using lowrank tensor decomposition,”
IEEE Trans. Automation Science and Engineering
12
(1)
216 
227
DOI : 10.1109/TASE.2014.2327029
Yin Y.
,
Yu Y.
,
Zimmermann R.
2015
“On generating contentoriented geo features for sensorrich outdoor video search,”
IEEE Trans. Multimedia
17
(10)
1760 
DOI : 10.1109/TMM.2015.2458042
Youm S.
,
Jeon Y.
,
Park S.
,
Zhu W.
2015
“RFIDbased automatic scoring system for physical fitness testing,”
IEEE Systems
9
(9)
326 
334
DOI : 10.1109/JSYST.2013.2279570
Ferdousi S.
,
Dikbiyik F.
,
Habib M. F.
,
Tomatone M.
,
Mukherjee B.
2015
“Disasteraware datacenter placement and dynamic content management in cloud networks,”
IEEE Optical Communication and Networking
7
(7)
681 
694
DOI : 10.1364/JOCN.7.000681