While non-predefined object segmentation (NDOS) distinguishes an arbitrary, self-assumed object from its background, predefined object segmentation (DOS) prespecifies the target object. In this paper, a novel method to segment predefined objects is presented, by globally optimizing an orientation-based objective function that measures the fitness of the object boundary, in a discretized parameter space. A specific object is explicitly described by normalized discrete sets of boundary points and corresponding normal vectors with respect to its plane shape. The orientation factor provides robust distinctness for target objects. By considering the order of transformation elements, and their dependency on the derived oversegmentation outcome, the domain of translations and scales is efficiently discretized. A branch-and-bound algorithm is used to determine the transformation parameters of a shape model corresponding to a target object in an image. Results on the PASCAL dataset show considerable success on images with complex backgrounds and unclear boundaries.
1. Introduction
First of all, it is necessary to differentiate image segmentation and object segmentation. Image segmentation is a classic problem in computer vision. It targets the partitioning of an image into many non-semantic regions, according to certain low-level features, such as color, intensity, gradient, and/or texture. The result of image segmentation does not indicate which regions belong to the foreground or background. Since it relies on low-level features, it rests on many assumptions, e.g. that regions belonging to an object have homogeneous color, and that two adjacent regions differ apparently in color or intensity. On the other hand, object segmentation aims to find regions or boundaries of objects in an image. It explicitly distinguishes the foreground and background. Object segmentation is still a challenging problem, attracting the interest of many researchers.
Non-predefined object segmentation (NDOS) has attracted attention for decades and has been studied extensively. In active contour based methods [1], a contour is initialized around a target object. Minimizing an energy function that represents the curvature and smoothness of the given contour evolutionally deforms it to the boundary of the target object. GrabCut [2] requires initializing a rectangle enclosing the target object. The object is segmented by minimizing an energy, relying on knowledge of the data inside and outside the rectangle. Another interactive object segmentation method is lazy snapping [3]. The user needs to provide two types of markers (i.e. foreground and background) appropriately near the boundary of the target object. The segmentation is achieved by global optimization using graph-cut [4]. Some non-initialization segmentation methods rely on a saliency map [5]. Instead of utilizing a certain prior, K. Fukuchi et al. [6] used high-saliency regions as an automatic prior, to minimize an energy function based on graph-cut. Ming-Ming Cheng et al. [7] presented an outstanding method to find a saliency map based on the global contrast of both feature and spatial information. Furthermore, P. Mehrani et al. [8] combined a saliency map, graph-cut, and learning in their work. All in all, NDOS methods aim to segment certain objects in an image, without knowing their actual shapes. They rely on basic characteristics, such as color continuity and intensity discontinuity (high gradient magnitudes, or edges), to differentiate object and background.
In predefined object segmentation (DOS), the idea of utilizing a prior shape for object detection was first raised by D. H. Ballard [9] in a paper about the generalized Hough transform. In that research, an object is described by a reference point, and a set of vectors presenting discrete points on the boundary of the object, with respect to the reference point.
Different from prior input in NDOS methods, prior input in DOS has a higher-level representation, in terms of its structure. Such priors describe the whole shape of a target object, rather than its internal characteristics. V. Lempitsky et al. [10] proposed a segmentation framework in which priors are exemplars of many target objects. An object is defined by binary images of its plane shapes from various aspects. A. Toshev et al. [11] presented an object descriptor that is based on the holistic nature of an object. According to the descriptor, each boundary pixel is linked to all remaining ones, to form a 'chord' feature that records not only the length, but also the orientation relationships of pairs of boundary pixels. A chordiogram, the histogram of all chords of an object, is used as its descriptor.
S. Abbasi [12] utilized the curvature scale space (CSS) image to represent an object boundary. In this approach, the original boundary (a u-parameterized curve) of an object is smoothed by Gaussian kernels of different variances σ. The number of curvature zero-crossing points of a curve is inversely proportional to σ. The image capturing the relationship between u and σ is the CSS image of the object boundary. In DOS, since a target object is explicitly indicated based on its own nature, the description is more objective than low-level priors and interactive markers.
This paper presents a novel method to detect and segment objects given prior shapes, by globally optimizing an orientation-based objective function in a discretized parameter space. Each prior shape is described by a normalized set of bound vectors (i.e., the length of each bound vector is 1), whose initial points are discrete points on the boundary of an object model, and whose directions are identical to the normal vectors at the corresponding initial points with respect to the object boundary. Initial points can be picked evenly, or randomly with uniform distribution, on an object boundary. The original priors, however, are binary images of objects, which makes them much more intuitive and straightforward to produce.
Fig. 1. The workflow of our method.
The derived sets of bound vectors are then normalized into the unit square, and stored in a database as shape models. The greater the quantity of bound vectors that is chosen, the more accurately the target shape can be described. Furthermore, besides not storing whole images of exemplars, our prior descriptor is easier to apply transformations to than that of [10], which stores binary images of all feasible transformed exemplars. Indeed, we need to store only one normalized set of bound vectors for each aspect of an object.
In a certain aspect, our object description method is similar to [12], which is primarily developed for the detection or identification of object shapes. While we use an ordered set of normal vectors of the object boundary, [12] regarded the curvature of the curve as the difference between adjacent tangent vectors. However, our approach relies on gradient orientation and magnitude, while [12] worked on edge maps, where each curve in the derived edge map is considered as the shape of an object. It is then 'coded' in the form of a CSS image. Consequently, CSS models of exemplars and CSS images are matched to find the closest pair. The problem of such methods is that the edge map plays too large a role, so they rely on the availability of exact edges in the image. However, in complex images, disrupted and merged edges are unavoidable. Also note that [12] targets the identification of the shape, not the segmentation.
Global optimization is utilized to search through a parameter space for the most suitable configuration that makes a certain discrete model approximately fit the target object in the image. The dependency of the transforming elements (i.e. translation, rotation, and scaling) on superpixels generated by [13], [14] is considered, to expeditiously discretize the parameter space. Moreover, we propose an orientation-based objective function to measure the fitness of the transformed discrete model and the target object in the image. The objective function is designed to be compatible with the discretization phase. We utilize the well-known branch-and-bound global optimization algorithm to search for a solution. The consequent transformed model is linked to form a closed curve, which may need refining, by applying the optimization of [15] in a small number of iterations.
The results tested on the PASCAL dataset [16] indicate that our method is robust at segmenting an object on a complex background, or in images with unclear boundaries. In the preliminary version of this paper [17], we proposed a simple, effective orientation-based object description, which is able to express arbitrary objects, together with an orientation-based objective function that measures the fitness of transformed object models and target objects in images. In addition, a robust solution space discretization was proposed.
In this paper, a preprocessing step using a bilateral filter is utilized to remove noise while still preserving edges in images. Moreover, we demonstrate the effectiveness of orientation by comparing segmented results generated with, and without, the orientation factor. The results show that when orientation is not utilized, not only is the maximum objective value less dominant over the others, but an inaccurate transformed model may also be chosen.
The next sections are organized as follows: our method is expressed in section 2; experiments are shown in section 3; and we make some conclusions in section 4.
2. Proposed Method
 2.1 Overview
The overall workflow of the proposed method is described briefly in Fig. 1. Records of object models that describe shapes of objects are stored in a model dataset. The superpixels in Fig. 1(b) are generated by oversegmenting the input image, using [13], [14]. The quantity of superpixels should be small enough to discretize the parameter space efficiently; it should also be large enough to include as much of the boundary of the target object as possible. Those superpixels form an uneven grid, on which the discretization phase depends. Originally, the parameter space (t_x, t_y, λ, θ), which indicates x- and y-translation, scaling, and rotation, is continuous. However, it includes a lot of infeasible configurations, because of the digitization of the image and the characteristics of the object shape. Based on the boundaries of superpixels and the constraint of an aspect ratio (in 2.5), the parameter space is discretized efficiently, by merely considering feasible configurations. A branch-and-bound algorithm is applied to find the most suitable transformation configuration (TC). The resultant transformed model is shown in Fig. 1(c). The tiny cyan circles represent the approximation (i.e. the parameter h) in the objective function (in section 2.4). Finally, the segmentation result is achieved after the discrete points of the consequent model are chained and refined by the level set method [15], in a small number of iterations (Fig. 1(d)).
The following subsections present in detail the preprocessing (2.2), the shape descriptor (2.3), the orientation-based objective function (2.4), the discretization of the parameter space (2.5), and the global optimization method (2.6).
 2.2 Preprocessing by Bilateral Filtering
The object segmentation in section 2.5 relies on the oversegmentation of an input image. Some factors have a negative influence on oversegmentation, namely noise and uncertain edges (i.e. places where color or intensity changes). Moreover, the proposed objective function, in section 2.4, utilizes gradient magnitude. Hence, removing uncertain edges enhances the fitness of the correct transformed model to the target object in the image, compared to others.
The bilateral filter is a local, non-iterative, and simple method, which combines both feature and spatial information in the form of weights [12]. When applying a shift-invariant low-pass domain filter to an image f, we have

h(x) = k_d⁻¹(x) ∑_{ξ∈N} f(ξ) c(ξ − x),

where k_d(x) = ∑_{ξ∈N} c(ξ − x), and N is the neighboring region of x. Range filtering is defined as

h(x) = k_r⁻¹(x) ∑_{ξ∈N} f(ξ) s[f(ξ) − f(x)],

where k_r(x) = ∑_{ξ∈N} s[f(ξ) − f(x)]. Combining the geometric and photometric factors, we have the bilateral filter

h(x) = k⁻¹(x) ∑_{ξ∈N} f(ξ) c(ξ − x) s[f(ξ) − f(x)],

where k(x) = ∑_{ξ∈N} c(ξ − x) s[f(ξ) − f(x)].
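To make the preprocessing concrete, the following is a minimal brute-force sketch of a bilateral filter for a grayscale image. The paper does not specify the closeness and similarity functions or their parameters, so the Gaussian kernels and the values of sigma_d and sigma_r below are illustrative assumptions.

```python
import numpy as np

def bilateral_filter(img, radius=2, sigma_d=2.0, sigma_r=0.1):
    """Brute-force bilateral filter for a 2-D grayscale image in [0, 1].

    c(xi - x) is a Gaussian on spatial distance (closeness) and
    s[f(xi) - f(x)] is a Gaussian on intensity difference (similarity);
    the output at each pixel is normalized by the sum of combined weights.
    """
    h, w = img.shape
    pad = np.pad(img, radius, mode='edge')
    out = np.empty_like(img)
    # Precompute the domain (closeness) kernel over the neighborhood N.
    yy, xx = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    c = np.exp(-(xx**2 + yy**2) / (2.0 * sigma_d**2))
    for i in range(h):
        for j in range(w):
            patch = pad[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            s = np.exp(-(patch - img[i, j])**2 / (2.0 * sigma_r**2))
            weights = c * s
            out[i, j] = (weights * patch).sum() / weights.sum()
    return out
```

With a small sigma_r, pixels across a strong edge receive near-zero similarity weight, which is why the filter removes noise while preserving edges.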
 2.3 Orientation-based Object Descriptor
The term ‘object’ is very general. It can be an arbitrary sort of visible thing, and can be found almost everywhere, in both the foreground and background of natural scene images. Even considering only the foreground of an image, we can recognize many objects existing individually or conjointly.
The scene of a little boy wearing a yellow t-shirt and holding a balloon, for instance, has at least three objects: the whole body of the boy, the yellow t-shirt, and the balloon. While the balloon stays separate, the body and t-shirt overlap each other. The question is, which thing is the target one? To the subjective human mind, the body may be the most attractive one to detect or segment. However, is detecting or segmenting the t-shirt or the balloon wrong, when no clues about the goal object exist? NDOS, together with the implications of the human mind, can in fact cause such ambiguity.
Object segmentation can be seen as either binary image labeling, or boundary finding. Those two views have a mutual relationship, and both result in distinguishing object and background by one or many closed contours defining the plane shape (from a certain aspect) of the object. Shape, therefore, is an important factor in detecting and segmenting an object. Use of the distinctness of colors, or the discontinuity of intensity, is eventually able to identify object shape. Moreover, given this sort of high-level prior, we explicitly define the target object of segmentation. The ambiguity mentioned above can be prevented.
Fig. 2. Estimation of a circle by a square (a), and by an octagon (b).
The inputs used to generate the database of normalized models (sets of bound vectors) are binary images, in which white pixels belong to the object area, and black pixels are in the background. The boundary of the object is then produced from its connected components, using [18]. Not all points on an object boundary are utilized. A set of discrete points is established by sampling points on the boundary evenly, in terms of the quantity of intermediate points. In the simpler case, they can be picked randomly, with uniform distribution. The record corresponding to each boundary point is a bound vector, whose initial point is the boundary point, and whose direction is the normal vector at that point with respect to the boundary of the object. Hence, it is a 4-element vector, with the first two elements for the initial point, and the last two for the related normal vector. The quantity of records is chosen identically for all exemplars, so that the evaluation taking place later is effective. The number of bound vectors is proportional to the captured fineness of the object shape. All those tasks are done in the preprocessing phase, which is separate from the segmentation process.
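As a rough illustration of these preprocessing tasks, the sketch below builds a set of bound vectors from a binary mask. It orders boundary pixels by polar angle around the centroid and uses radial directions as outward normals; this star-shape simplification stands in for the boundary tracing of [18], and the function name and record layout are our own.

```python
import numpy as np

def boundary_bound_vectors(mask, n_points=32):
    """Build a shape model from a binary mask: n_points boundary points,
    each paired with an outward unit normal, normalized to the unit square.

    Assumes the object is star-shaped around its centroid, so the boundary
    can be ordered by polar angle. Returns an (n_points, 4) array of
    (x, y, u, v) records.
    """
    mask = mask.astype(bool)
    # Boundary pixels: object pixels with at least one background 4-neighbor.
    padded = np.pad(mask, 1)
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1]
                & padded[1:-1, :-2] & padded[1:-1, 2:])
    ys, xs = np.nonzero(mask & ~interior)
    cx, cy = xs.mean(), ys.mean()
    # Order boundary pixels by angle around the centroid, sample evenly.
    order = np.argsort(np.arctan2(ys - cy, xs - cx))
    idx = order[np.linspace(0, len(order) - 1, n_points, dtype=int)]
    px, py = xs[idx].astype(float), ys[idx].astype(float)
    # Outward normals: radial direction from the centroid.
    u, v = px - cx, py - cy
    norm = np.hypot(u, v)
    u, v = u / norm, v / norm
    # Normalize initial points into the unit square.
    px = (px - px.min()) / max(np.ptp(px), 1e-9)
    py = (py - py.min()) / max(np.ptp(py), 1e-9)
    return np.stack([px, py, u, v], axis=1)
```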
The use of normal vectors with discrete points provides an efficient and effective shape model for object segmentation. It is simple in terms of computation, and it can express the significant structure of an object using a set of bound vectors. Each bound vector consists of a boundary point and its normal vector. The example in Fig. 2 expresses how bound vectors imply the structure of an object. In order to estimate the original circle, we can use 4 bound vectors, which imply a square, or 8 bound vectors, which imply an octagon. The more bound vectors that are selected, the more accurate the estimation is. In other words, the number of bound vectors is proportional to the matching constraints.
Fig. 3. The description (b) without an orientation feature can mismatch with the red shapes (c) and (d).
The orientation factor fortifies the accuracy of object matching. In other words, random points have a low probability of mismatching to the object model. For instance, let us describe a circle without using orientation, as in Fig. 3(b). Not utilizing an orientation factor results in the mismatching of the model, as in Fig. 3(c) and Fig. 3(d). One of the factors reflecting a good object descriptor is discrimination, i.e. it has to enhance the distinctness between a transformed model that fits the target object in an image and one that does not, under a certain evaluation. In fact, such a factor is illustrated in Fig. 5, under our proposed objective function.
Exemplars are normalized into the unit square, and the lists of derived bound vectors are stored as a prior dataset. Each exemplar corresponds to an object shape from a particular view point. Such a relationship between the transformation of the view point and the set of bound vectors is expressed by the function P:

P(t_x, t_y, λ, θ) : (x_i, y_i, u_i, v_i) ↦ (λ(x_i cos θ − y_i sin θ) + t_x, λ(x_i sin θ + y_i cos θ) + t_y, u_i cos θ − v_i sin θ, u_i sin θ + v_i cos θ),

where t_x, t_y are the x- and y-translation, λ is the scale, θ is the rotating angle, (x_i, y_i) is an initial point, and (u_i, v_i) is its normal vector.
The exemplars can be extended to a 3D object model with the proposed object descriptor. A number of sampled view points are chosen around the 3D model to create plane shapes of that object, by projecting it onto the planes orthogonal to the view directions. This descriptor is followed by a suitable evaluation function, and a simple and effective global optimization framework.
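Applying the map P to a stored model is a similarity transform of the initial points together with a pure rotation of the normals (normals are not scaled, so they stay unit length). A sketch, using the (x, y, u, v) record layout assumed above:

```python
import numpy as np

def transform_model(records, tx, ty, lam, theta):
    """Apply a transformation configuration (tx, ty, lam, theta) to a shape
    model: initial points are scaled, rotated, and translated, while normal
    vectors are only rotated.

    records: (N, 4) array of (x, y, u, v) bound vectors in the unit square.
    """
    x, y, u, v = records.T
    c, s = np.cos(theta), np.sin(theta)
    xs = lam * (c * x - s * y) + tx
    ys = lam * (s * x + c * y) + ty
    us = c * u - s * v
    vs = s * u + c * v
    return np.stack([xs, ys, us, vs], axis=1)
```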
 2.4 Orientation-based Objective Function
To evaluate the fitness of each transformed model to the target object in an image, an objective function is needed. The objective function maps sets of records, each of which consists of an initial point p_i and its normal vector d_{p_i}, into the set of real numbers:

f : (ℝ² × ℝ²)^N → ℝ,

where N is the number of bound vectors. To build the objective function, we rely on the potential boundaries created by [13]. The oversegmentation process generating superpixels not only establishes regions of homogeneous-colored pixels, but also forms potential boundaries of objects (i.e. pixels that are potentially on the boundary of the target object). The objective of f is to find which model, with its appropriate TC, most fits a certain combination of potential boundaries, in terms of position and orientation. f is expressed as

f = ∑_{i=1}^{N} (1 / |B(p_i, h) ∩ PB'|) ∑_{q ∈ B(p_i, h) ∩ PB'} cos(d_{p_i}, d_q), with PB' = {q ∈ PB | G_q > ε_G},

where PB is the set of pixels belonging to potential boundaries, B(p_i, h) is the circle of radius h whose center is p_i, d_q is the gradient orientation at pixel q, N is the quantity of bound vectors, G_q is the gradient magnitude of pixel q, and ε_G is a threshold of gradient magnitude.
The objective function f measures the summation of the averages of the cosine similarity between each bound vector (p_i, d_{p_i}) of the transformed model and the bound vectors whose initial points are in its neighboring circle. In other words, f evaluates the correspondence, in terms of orientation, between the bound vectors and the orientation trend of the pixels in their neighborhoods. Our objective is to find an exemplar, and its corresponding TC, such that the function f is maximized:

(n*, k*) = argmax_{n,k} f(s_n, TC_k),

where n is the index of the exemplar, and k is the index of the TC.
The objective function f aims not only to match point-to-point, as was done in [12] and [20], but also to seek a suitable discrete structure, based on orientation. Therefore, the influence of random noise points on the measurement is restricted. In Fig. 5, we list all feasible transforming configurations (t_x, t_y, λ) (the rotating angle θ is fixed to 0) of the proper model of the red car, and their values computed by f. To show the correlation of those variables, they are split into three 2-combinations. The maximum value of f in each combination stands out, in comparison with the others. In addition, we demonstrate the need for orientation in f, by comparing how dominant the maximum value of f is, in the case of f with orientation, and without orientation. The comparison is exhibited in Fig. 6 and Fig. 7.
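A direct (unoptimized) way to evaluate the orientation measure for one transformed model can be sketched as follows. Here pb_points and pb_normals are hypothetical inputs standing for the potential-boundary pixels already thresholded by the gradient-magnitude threshold, together with their unit gradient directions.

```python
import numpy as np

def objective_f(model, pb_points, pb_normals, h=2.0):
    """Orientation-based objective: for each bound vector (p_i, d_pi),
    average the cosine similarity between d_pi and the gradient orientations
    of potential-boundary pixels inside the h-radius ball around p_i, then
    sum over the N bound vectors.

    model: (N, 4) rows (x, y, u, v); pb_points/pb_normals: (M, 2) arrays.
    """
    total = 0.0
    for px, py, u, v in model:
        d = np.hypot(pb_points[:, 0] - px, pb_points[:, 1] - py)
        near = d <= h
        if near.any():
            # Unit vectors, so the dot product is the cosine similarity.
            cos_sim = pb_normals[near, 0] * u + pb_normals[near, 1] * v
            total += cos_sim.mean()
    return total
```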
 2.5 Parameter Space Discretization
The objective function can be given as follows by combining it with the map P:

f(P(t_x, t_y, λ, θ)(s)),

where s ∈ S is a shape model from the prior dataset. The actual variable of the objective function is the TC (t_x, t_y, λ, θ). Because models capture object shapes in many aspects, it is not necessary to have two separate scaling parameters for the x- and y-coordinates. Therefore, we use only one variable λ, to indicate the scaling of the model.
While the transforming elements are treated individually in [21], the optimal result can be achieved by globally optimizing over the entire parameter space. However, there is a trade-off between them. The potential set of TCs is huge, and it is infeasible to consider all of them. The purpose of this phase is to obtain a feasible set of TCs, which is much smaller than the potential set, based on the invariance of the aspect ratio given a certain rotating angle.

Proposition 1 proves the invariance of the aspect ratio of a model when it is rotated by a fixed angle θ. Due to the result of oversegmentation, a model transformed by an appropriate TC can fit into the superpixel set which forms the target object in the image. Therefore, we only consider the maximum and minimum (x, y) coordinates of superpixels. Let T, R, L, and B be the sets of top points, right points, left points, and bottom points, respectively. One 4-point record is feasible if it contains 4 points t ∈ T, r ∈ R, l ∈ L, and b ∈ B, such that the aspect ratio of the rectangle they determine is equal to a specified aspect ratio C, and t, r, l, and b belong to the corresponding sides of the derived rectangle (Fig. 4(b) and Fig. 4(c)). The feasible set consists of all feasible 4-point records. In order to find 4-point records, proposition 2 first collects all feasible left-right pairs. Then, proposition 3 finds the corresponding top or bottom points. In practice, we merely use 3 points to determine a rectangle (i.e., left-right-top or left-right-bottom).
Fig. 4. (a) 4 optimal points of a superpixel; (b) a feasible 4-point record; (c) an infeasible 4-point record.
 Proposition 1
Given a rotating angle θ, the aspect ratio of the model that is transformed by an arbitrary translation and scaling configuration (t_x, t_y, λ) is constant.
For each rotating angle θ ∈ [0, 2π), translation and scaling are discretized into high-potential sets of values. We suggest utilizing proposition 1, and the outcome of oversegmentation, to discretize the parameter space. Superpixels (patches) generated by [13] form an uneven grid, which serves as the basis of the discretization. We consider the uneven grid to find potential configurations of translation and scaling.
 Definition
Given a set of points rotated by a certain angle θ, let C be the constant aspect ratio of those points. A rectangle in the image is called a potential bounding box (with respect to θ) if its aspect ratio is equal to C.
A list of all potential bounding boxes implies a list of translations, and a list of scaling values (with respect to θ). After being oversegmented by [13], the input image is subdivided into many superpixels (patches), based on color distinctness and a size constraint. The minimum and maximum x- and y-coordinates of the superpixels generate an uneven grid. By choosing a significantly large number of superpixels, the boundaries of all superpixels can approximately cover the boundary of the target object. We let potential bounding boxes snap onto the uneven grid, such that the minimum x-coordinate x_{li} corresponds to the minimum x-coordinate of superpixel i, the maximum x-coordinate x_{rj} corresponds to the maximum x-coordinate of superpixel j, and the minimum y-coordinate y_{tk} corresponds to the minimum y-coordinate of superpixel k. In the following propositions, we temporarily ignore the constraint of image size (i.e. the circumstance in which potential bounding boxes exceed the range of the input image is temporarily accepted). Violating boxes will be eliminated afterward.
Fig. 5. (a) x- and y-translation; (b) x-translation and scale; (c) y-translation and scale.
 Proposition 2
Ignore the constraint of image size. If two points (x_1, y_1), (x_2, y_2) satisfy |y_2 − y_1| ≤ C|x_2 − x_1|, then x_1 and x_2 are the bounds of the x-coordinates of a certain potential bounding box. (C is the aspect ratio of the normalized model, with respect to angle θ.)
Another proposition, following proposition 2, completely shows how to search for potential bounding boxes; as a result, the configurations of translation and scaling corresponding to a rotating angle θ are listed discretely.
 Proposition 3
Ignore the constraint of image size. If (x_1, y_1), (x_2, y_2) satisfy |y_2 − y_1| ≤ C|x_2 − x_1|, then ∀y ∈ [−CΔx + y_M, CΔx + y_m], ∃x_0 : (x_0, y) belongs to the top or bottom of a certain potential bounding box (where Δx = |x_2 − x_1|, y_m = min(y_1, y_2), and y_M = max(y_1, y_2)).
According to the constant aspect ratio (in proposition 1) when given a rotating angle, the potential bounding boxes of transformed models in the image are listed discretely, due to proposition 2 and proposition 3. Since the measuring of orientation in the objective function regards an h-radius circle, the set of y-coordinate values in proposition 3 can be enumerated discretely, with a step size determined by h. Also, it does not require having all partial edges of the target object. The necessary portions are the left, the right, and the top of the target object. For that reason, we choose oversegmentation [13], [14], rather than Canny [18].
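Under these propositions, enumerating the potential bounding boxes (and hence the discrete translations and scales) from the superpixel extremes can be sketched as follows. The variable names and the snapping of only the left, right, and top edges are assumptions consistent with the description above; rotation is handled by repeating this for each discretized θ.

```python
def potential_boxes(sp_xmin, sp_xmax, sp_ymin, C, img_w, img_h):
    """Enumerate potential bounding boxes snapped to the uneven superpixel
    grid: pair each superpixel minimum x-coordinate x_l with each maximum
    x-coordinate x_r (x_r > x_l), fix the height to C * (x_r - x_l) by the
    constant aspect ratio C, and snap the top edge to superpixel minimum
    y-coordinates. Boxes exceeding the image range are eliminated.

    Each surviving box yields a translation (x_l, y_t) and a scale x_r - x_l.
    """
    boxes = []
    for xl in sp_xmin:
        for xr in sp_xmax:
            w = xr - xl
            if w <= 0:
                continue
            height = C * w
            for yt in sp_ymin:
                if xl >= 0 and xr <= img_w and yt >= 0 and yt + height <= img_h:
                    boxes.append((xl, yt, w))
    return boxes
```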
We utilize the branch-and-bound algorithm to find the global optimum in the discretized parameter space. In Fig. 5, based on the color map, in which the values corresponding to the colors increase from left to right, blue areas indicate ignored sets of transforming configurations that are implicitly removed by the propositions and the constraint of image size. It shows that our propositions are effective in discretizing the parameter space, by disregarding non-potential transforming configurations. In the combination of x-translation and scale (Fig. 5(b)), there exist huge sets of ignored configurations, many more than in Fig. 5(a) and Fig. 5(c), because the aspect ratio of the target object is small (its width is larger than its height).
 2.6 Branch-and-Bound for Global Optimization
Global optimization is employed to find the optimal solution of the objective function in the discretized parameter space. Among several techniques for global optimization, such as graph-cut [4], branch-and-bound [22], and space mapping [22], branch-and-bound is chosen because of its efficiency and capability. Branch-and-bound is a specialized discrete programming technique that provides a strategy to avoid a full search through the entire solution space [22]. In the proposed object segmentation scheme, the solution space is a 4D space of transformations (i.e. rotation, scale, and x- and y-translation). Different from [10], our method is built on discrete sets of bound vectors, and it evaluates the fitness based on the matching of the boundary structure (position and orientation), rather than pixel labels.
Branch-and-bound is a well-known algorithm in discrete programming. It has the ability to avoid exhaustively searching the whole solution space, by fathoming subspaces that are not feasible to contain the optimal solution. The two main factors of the branch-and-bound algorithm are a branching (subdivision) strategy, and a bounding function. The discretized parameter space is recursively divided into subspaces, each of which has a bounding value reflecting its priority for being chosen. In our maximization problem, a bounding function g is designed to be an upper bound of the function f, such that

• g(v) ≥ f(v), ∀v;
• ĝ(V) ≥ g(v), ∀v ∈ V;
• ĝ(V) = f(v), if V = {v} is a 'leaf' of the searching tree;

where ĝ(V) is an upper bound of g in subspace V. Thus, g tends to approach f from above, when we branch the discretized parameter space progressively. Subspaces whose bounds are small, in comparison with others at the same level, are eliminated, together with their children. The branching process can be terminated if g is close enough to f. That is how the branch-and-bound algorithm can ease the size of the solution space.
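The search strategy itself is generic. The following best-first sketch assumes, as above, that the bound is tight at leaves (ĝ({v}) = f(v)), so the first leaf popped from the priority queue is a global maximizer; the subspace representation and the bound/split callbacks are placeholders to be supplied by the discretization of section 2.5.

```python
import heapq
import itertools

def branch_and_bound(root, bound, split, is_leaf):
    """Best-first branch and bound for maximization.

    bound(V): upper bound of the objective over subspace V (tight at leaves);
    split(V): subdivide V into child subspaces; is_leaf(V): V is a single TC.
    Subspaces with small bounds sink in the priority queue and are never
    expanded once a leaf with a larger value reaches the top.
    """
    tie = itertools.count()  # tiebreaker so the heap never compares subspaces
    heap = [(-bound(root), next(tie), root)]
    while heap:
        neg_b, _, space = heapq.heappop(heap)
        if is_leaf(space):
            return space, -neg_b  # bound equals f at leaves: global optimum
        for child in split(space):
            heapq.heappush(heap, (-bound(child), next(tie), child))
    return None, float('-inf')
```

A toy usage, maximizing a unimodal function over integer intervals with an exact interval bound:

```python
f = lambda x: 100 - (x - 7)**2
bound = lambda iv: f(min(max(7, iv[0]), iv[1]))
split = lambda iv: [(iv[0], (iv[0] + iv[1]) // 2),
                    ((iv[0] + iv[1]) // 2 + 1, iv[1])]
best, val = branch_and_bound((0, 100), bound, split, lambda iv: iv[0] == iv[1])
```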
The function g is supposed to be more straightforward to compute than f. Consider the objective function f: each average of cosine similarities over a non-empty neighborhood B(p_i, h) ∩ PB' is at most 1, so

f ≤ ∑_{i=1}^{N} ⌊B(p_i, h) ∩ PB' ≠ ∅⌋ = g.

The inequality is due to the property of the cosine function; that g bounds f for every TC is proven in the following proposition.
 Proposition 4
Hence, g is the upper bound of f for all TCs. It counts the quantity of initial points (of the transformed model) that are close to an arbitrary potential boundary. The orientation factor is relaxed in g. At the same level, the subspace with the smaller bound has more chance to be fathomed, and the one with the larger bound has more chance to be picked soon.
The function g can also be viewed as

g = ∑_{i=1}^{N} ⌊B(p_i, h) ∩ PB' ≠ ∅⌋, (9)

where PB' = {q ∈ PB | G_q > ε_G}, and ⌊·⌋ is the predicate function that is 1 if the condition is true, and 0 otherwise. The upper bound ĝ can be calculated by choosing a representative configuration v_0, and loosening the radius of the neighboring area of the boundary points (see [20]).
Even though the orientation-based objective function f is quite straightforward, it takes more time to compute than g. From the view of (9), g simply counts the quantity of non-empty intersections of a small ball B(p_i, h) and PB'. Thus, evaluating g is simpler and faster; in practice, it is about 10 times faster than evaluating f.
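Correspondingly, the bound of equation (9) reduces to a membership count, which can be sketched as follows (pb_points is the same hypothetical thresholded potential-boundary input assumed for f):

```python
import numpy as np

def bound_g(model, pb_points, h=2.0):
    """Upper bound g of equation (9): count the initial points of the
    transformed model whose h-radius ball intersects PB'. Orientation is
    relaxed, so only positions matter, which makes g cheap to evaluate."""
    count = 0
    for px, py, _, _ in model:
        d = np.hypot(pb_points[:, 0] - px, pb_points[:, 1] - py)
        count += bool((d <= h).any())
    return count
```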
The branch-and-bound algorithm is applied for each exemplar; its details can be found in [17].
 2.7 Refinement
After applying the branch-and-bound algorithm to the discretized parameter space, we obtain the model and its corresponding TC that generate the discrete 'skeleton' that best fits the boundary of the target object. A continuous contour is created by orderly chaining the resultant set of points. Because of the approximation in the objective function (i.e. measuring the fitness of orientation in an h-radius circle), the contour is not completely accurate. In order to refine it, we utilize [15] to steadily deform the contour to a high gradient magnitude curve in a small number of iterations. When working on complex background and/or low contrast images, the number of iterations should be very few, so that the structure of the resultant transformed model is preserved.
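Chaining the resultant discrete points into the initial contour for the refinement can be done by simple linear interpolation between consecutive points; a sketch (the level-set evolution of [15] itself is not reproduced here):

```python
import numpy as np

def chain_points(points, samples_per_edge=10):
    """Chain the ordered resultant model points into a closed contour by
    linearly interpolating between consecutive points (wrapping around from
    the last point back to the first)."""
    pts = np.asarray(points, dtype=float)
    closed = np.vstack([pts, pts[:1]])  # close the loop
    contour = []
    for a, b in zip(closed[:-1], closed[1:]):
        t = np.linspace(0.0, 1.0, samples_per_edge, endpoint=False)[:, None]
        contour.append(a + t * (b - a))
    return np.vstack(contour)
```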
3. Experimental Results
The proposed method was implemented in Matlab version 7.11.0 (2010b). The configuration of the testing system is an Intel® Core™ i5-2500 CPU @ 3.30GHz, and 4GB RAM. For evaluation, the PASCAL and ETHZ datasets [16, 24] are used, which provide both sample images and ground truths. Models (exemplars) are generated by extracting edges in ground truth images, then picking a specified number of boundary points, and collecting their normal vectors. The size of a model is chosen to be one third of the longest boundary. The models are then normalized into the unit square, and stored.
Superpixels are generated using the method in [13], and the number of superpixels is set to 200, which is large enough to cover most of the target object boundary. Then, adjacent superpixels that are similar in color (i.e. the difference of the means of two adjacent ones is significantly small, compared to the sum of their standard deviations) are merged together.
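The color-based merging of adjacent superpixels can be sketched with a union-find over neighboring label pairs. The 0.5 factor in the threshold below is an illustrative choice, since the text only states that the difference of means must be significantly small compared to the sum of the standard deviations.

```python
import numpy as np

def merge_superpixels(labels, image):
    """Merge adjacent superpixels whose mean intensities are close relative
    to the sum of their standard deviations.

    labels: (H, W) int array of superpixel ids; image: (H, W) grayscale.
    Returns a relabeled (H, W) array (ids of merged groups are unified).
    """
    ids = np.unique(labels)
    mean = {i: image[labels == i].mean() for i in ids}
    std = {i: image[labels == i].std() for i in ids}
    # Union-find over superpixel ids.
    parent = {i: i for i in ids}
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i
    # Adjacent pairs: labels differing between horizontal/vertical neighbors.
    pairs = set()
    pairs.update(zip(labels[:, :-1].ravel(), labels[:, 1:].ravel()))
    pairs.update(zip(labels[:-1, :].ravel(), labels[1:, :].ravel()))
    for a, b in pairs:
        if a != b and abs(mean[a] - mean[b]) < 0.5 * (std[a] + std[b]):
            parent[find(a)] = find(b)
    return np.vectorize(find)(labels)
```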
The branch-and-bound algorithm is implemented based on best-first search: the subspace having the largest upper bound has the highest chance to be subsequently considered. The radius h = 2. The effectiveness of the refinement step depends on how distinct the boundary between the target object and the background is. We only run from 3 to 8 iterations of [15].
Table 1. Some examples comparing our method to the level set based method [15], GrabCut [2], and the saliency based method [7].
Our method is robust at detecting and segmenting objects, especially rigid ones, because it relies on whole instances of object shape, and solves the problem globally. However, the trade-off is that it needs many models (exemplars), covering many types of objects and aspects. The average runtime for each exemplar is about 11 seconds. The relaxation variable h of the objective function decides how much variation of an object, in terms of its shape in the image compared to the models, the method can handle. Hence, rigid objects require fewer exemplars than non-rigid objects do.
Firstly, we demonstrate results tested on the PASCAL dataset [16]. Table 1 shows improvements of our method, owing to its shape-based property, compared to the level set based method (LS) [15], GrabCut [2] (not including interactive foreground and background editing), and the global contrast based method (GC) [7]. The initial contours of the LS are rectangles inside the target objects. In those examples, the boundary between object and background is not clear; as a result, the LS runs 1410 iterations without converging. The GC is strong at segmenting images in which the contrast between object and background is clear (e.g., the red car and the yellow car in Table 1).
To evaluate the importance of orientation in the proposed objective function, we measure how dominant the maximum of f, which corresponds to the resultant configuration, is in comparison with the other values, in two opposite cases. In the first case, we use the proposed function (i.e., f_{1} = f); the second case is identical, except that the orientation term is removed (f_{2}). Only correct models and appropriate rotation angles of the target objects are considered, so that only translation and scaling vary. Values of f_{1} and f_{2} are computed over a discretized set T of translations and scales. The dominance D_{k} is measured as follows, where k ∈ {1, 2}.
Above – Segmentation using f_{2}; below – Segmentation using f_{1}.
According to Fig. 7 and the tested images, the percentage differences between D_{1} and D_{2} for those cases are 21.36%, 30.15%, 30.34%, and 47.67%. The difference increases when the background is more complicated. The reason is that complex backgrounds contain many noise points (e.g., the silver and the yellow cars in Table 1) that strongly influence f_{2}, which does not include orientation. Thus, f_{1}, with its orientation measure, separates the correct solution from the background more strongly than f_{2} does.
Values of D_{1} and D_{2} for the images in Fig. 6.
Precision and recall of car images in PASCAL 2005 dataset.
Because our approach finds the best TC for a certain model to fit the target object, an improper TC may result in very low precision and recall; hence the large gap between the two major regions in Fig. 8. Among 100 car images of PASCAL 2005, 82% of the images exceed 75% in terms of both precision and recall. The average precision is 75.5%, and the average recall is 73.6%. Precision and recall are computed by the formulas given below.
Bottles and giraffes in the ETHZ dataset segmented by our method. The image on the left of each column shows the transformed model with its discrete points and their h-radius circles; the image on the right shows the final result.
Some examples to compare our method to branchmincut [10], and Chordiogram [11].
The source code of [10] is from the authors' homepage. Its runtime varies from 29 to 102 seconds, depending on image characteristics. The outputs of Chordiogram are from [11].
Run time of branchmincut [10] with images in the ETHZ dataset.
Precision = TP / (TP + FP) and Recall = TP / (TP + FN), where TP is the overlapping area of the transformed model and the target object, FN is the area belonging to the target object but not to the transformed model, and FP is the area belonging to the transformed model but not to the target object.
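Given binary masks for the transformed model and the ground-truth object, these area-based definitions translate directly into code. A small Python sketch, where representing both areas as boolean masks is our assumption:

```python
import numpy as np

def precision_recall(model_mask, object_mask):
    """Area-based precision and recall: TP is the overlap of the transformed
    model and the target object, FP the model area outside the object, and
    FN the object area the model misses."""
    model = np.asarray(model_mask, dtype=bool)
    obj = np.asarray(object_mask, dtype=bool)
    tp = np.logical_and(model, obj).sum()
    fp = np.logical_and(model, ~obj).sum()
    fn = np.logical_and(~model, obj).sum()
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return float(precision), float(recall)
```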
Moreover, our method achieves considerable results when tested on the ETHZ dataset (Fig. 9). The bottle class (a rigid object) and the giraffe class (a non-rigid object) are chosen for evaluation. The branch-and-mincut method [10] inspired our use of global optimization, while the Chordiogram method [11] raised the good idea of relying on superpixels. However, different from [11], which regards superpixels as regions, our method seeks the partial superpixel boundaries that together constitute a whole target object. We compare the results of these three methods in Table 2.
4. Conclusions
A novel method for predefined object segmentation was presented, using global optimization in a discretized parameter space. The parameter space is discretized by exploiting the order of transformation elements together with the derived constraints and superpixels. An orientation-based objective function, capable of measuring the fitness between transformed models and objects, was proposed. The robustness of this orientation-based objective function allows it to overcome situations in which some partial edges are lost, because it recognizes an object as a whole instance. Our method is able to detect and segment objects, especially rigid and multi-color objects, because it relies on the whole instance of an object shape and solves the problem globally.
Acknowledgements
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MEST) (2012047759 and 2013006535). This research was also supported by the MSIP (Ministry of Science, ICT & Future Planning), Korea, under the ITRC (Information Technology Research Center) support program (NIPA2013H0301133005) supervised by the NIPA (National IT Industry Promotion Agency).
BIO
Huy Hoang Nguyen received the B.S. degree in Mathematics and Computer Science from the University of Science, Ho Chi Minh City, Vietnam, in 2009, and the M.S. degree in Electronics and Computer Engineering from Chonnam National University, Korea, in 2013. His research interests include multimedia and image segmentation, document processing, and pattern recognition.
Guee-Sang Lee received the B.S. degree in Electrical Engineering and the M.S. degree in Computer Engineering from Seoul National University, Korea, in 1980 and 1982, respectively. He received the Ph.D. degree in Computer Science from Pennsylvania State University in 1991. He is currently a professor in the Department of Electronics and Computer Engineering, Chonnam National University, Korea. His research interests are mainly in the fields of image processing, computer vision, and video technology.
Soo-Hyung Kim received his B.S. degree in Computer Engineering from Seoul National University in 1986, and his M.S. and Ph.D. degrees in Computer Science from the Korea Advanced Institute of Science and Technology in 1988 and 1993, respectively. From 1990 to 1996, he was a senior member of research staff in the Multimedia Research Center of Samsung Electronics Co., Korea. Since 1997, he has been a professor in the Department of Computer Science, Chonnam National University, Korea. His research interests are pattern recognition, document image processing, medical image processing, and ubiquitous computing.
Hyung Jeong Yang received her B.S., M.S., and Ph.D. from Chonbuk National University, Korea. She is currently an associate professor in the Department of Electronics and Computer Engineering, Chonnam National University, Gwangju, Korea. Her main research interests include multimedia data mining, pattern recognition, artificial intelligence, e-Learning, and e-Design.
Kass M., Witkin A., Terzopoulos D., 1987, "Snakes: Active contour models," in Proc. of ICCV '87, pp. 259-267.
Rother C., Kolmogorov V., Blake A., 2004, "GrabCut: interactive foreground extraction using iterated graph cuts," ACM SIGGRAPH Journal, 23(3), pp. 309-314. DOI: 10.1145/1015706.1015720
Li Y., Sun J., Tang C.K., Shum H.Y., 2004, "Lazy snapping," ACM SIGGRAPH Journal, 23(3), pp. 303-308. DOI: 10.1145/1015706.1015719
Boykov Y., Veksler O., Zabih R., 2001, "Fast Approximate Energy Minimization via Graph Cuts," IEEE PAMI Transaction, 23(11), pp. 1222-1239. DOI: 10.1109/34.969114
Itti L., Koch C., Niebur E., 1998, "A Model of Saliency-Based Visual Attention for Rapid Scene Analysis," IEEE PAMI Transaction, 20(11), pp. 1254-1259.
Fukuchi K., Miyazato K., Kimura A., Takagi S., Yamato J., 2009, "Saliency-based video segmentation with graph cuts and sequentially updated priors," in Proc. of ICME '09, pp. 638-641.
Cheng M.M., Zhang G.X., Mitra N.J., Huang X., Hu S.M., 2011, "Global contrast based salient region detection," in Proc. of CVPR '11, pp. 409-416.
Mehrani P., Veksler O., 2010, "Saliency Segmentation based on Learning and Graph Cut Refinement," in Proc. of BMVC '10, pp. 1-12.
Ballard D.H., 1981, "Generalizing the Hough transform to detect arbitrary shapes," PR Journal, 13(2), pp. 714-725.
Lempitsky V., Blake A., Rother C., 2012, "Branch-and-Mincut: Global Optimization for Image Segmentation with High-Level Priors," MIV Journal, 44(3), pp. 315-329.
Toshev A., Taskar B., Daniilidis K., 2012, "Shape-Based Object Detection via Boundary Structure Segmentation," CV Journal, 99(2), pp. 123-146.
Abbasi S., Mokhtarian F., Kittler J., 1999, "Curvature scale space image in shape similarity retrieval," Multimedia Syst., 7(6), pp. 467-476. DOI: 10.1007/s005300050147
Shi J., Malik J., 2000, "Normalized Cuts and Image Segmentation," PAMI Journal, 22(8), pp. 888-905. DOI: 10.1109/34.868688
Malik J., Belongie S., Leung T., Shi J., 2001, "Contour and Texture Analysis for Image Segmentation," CV Journal, 43(1), pp. 7-27.
Li C., Xu C., Gui C., Fox M.D., 2010, "Distance regularized level set evolution and its application to image segmentation," IP Transaction, 19(12), pp. 3243-3254.
The homepage of PASCAL visual object classes, http://pascallin.ecs.soton.ac.uk/challenges/VOC/
Nguyen N.N., Kim H.R., Cha J.S., Le T.K.V., Lee G.S., 2013, "Global Optimization in Discretized Parameter Space for Predefined Object Segmentation," in Proc. of ICUIMC '13.
Breuel T.M., 2003, "Implementation techniques for geometric branch-and-bound matching methods," CVIU Journal, 90(3), pp. 258-294.
Belongie S., Malik J., Puzicha J., 2002, "Shape Matching and Object Recognition Using Shape Contexts," PAMI Transaction, 24(4), pp. 509-522. DOI: 10.1109/34.993558
Breuel T.M., 1992, "Fast recognition using adaptive subdivisions of transformation space," in Proc. of CVPR '92, pp. 445-451.
Land A.H., Doig A.G., 1960, "An Automatic Method of Solving Discrete Programming Problems," Econometrica Journal, 28(3), pp. 497-520. DOI: 10.2307/1910129
Bakr M.H., Bandler J.W., Madsen K., Søndergaard J., "Review of the space mapping approach to engineering optimization and modeling," OE Journal, 1(3), pp. 241-276.
The homepage of ETHZ dataset, http://www.vision.ee.ethz.ch/datasets/