In this paper, a new algorithm that uses 3D warping to effectively fill the holes produced when creating a virtual-viewpoint image is proposed. A hole is defined as a region that cannot be seen in the reference view when a virtual view is created. In the proposed algorithm, an exemplar-based inpainting algorithm is used to reduce the blurring that conventional algorithms cause in the filled hole regions and to enhance the texture quality of the generated virtual view. The boundary noise that occurs in the initial virtual view obtained by 3D warping is also removed. After 3D warping, we estimate the location of the background relative to the holes and fill the pixels adjacent to the background first, which yields better results because adjacent objects' information is not used. In addition, temporal inconsistency between frames is reduced by expanding the search region to the previous frame when searching for the most similar patch. Experimental results show the superiority of the proposed algorithm over existing algorithms.
1. Introduction
With the development of broadcasting technology, 3DTV, UHDTV and other next-generation services have been able to provide viewers with unprecedented realism and a sense of immersion. The majority of current 3D movies and 3DTV broadcasts are stereoscopic; in other words, they use two images, one for the left eye and one for the right. In this case, because only the left and right stereo images are used, only one viewpoint can be provided. Another drawback of the stereoscopic method is that viewers need to wear 3D glasses. To solve these problems, research on multiview video is in progress. A multiview image is formed from two or more views, which means the scene can be seen from different viewpoints depending on the viewer's location. Not only does multiview broaden the field of view, it also provides convenience by not requiring 3D glasses. However, as the number of viewpoints increases, the number of cameras also increases, which hampers smooth synchronization and calibration between the cameras. Additionally, storing or transmitting the data is difficult because the amount of data increases in proportion to the number of cameras. Therefore, virtual view synthesis algorithms, which use the minimum number of cameras to obtain reference views and then produce the needed viewpoints from those reference views, are being researched.
A depth-based 3D warping algorithm is used as a virtual view generation algorithm. 3D warping uses the depth information and camera parameters of the reference view to calculate the 3D world coordinates of the view. The calculated 3D world coordinates are then reprojected onto the image plane using the camera parameters at the virtual viewpoint. This algorithm allows the generation of intermediate views between the reference views as well as arbitrary virtual views, given a defined virtual viewpoint and its camera parameters [1].
However, the virtual view generated by 3D warping contains occlusion regions that could not be seen in the reference view. Because such a region cannot be referenced, it remains as a common-hole. These common-holes produced by 3D warping need to be filled in order to generate a high-quality image. In most cases, however, it is very difficult to fill the common-holes perfectly because there is not enough information about them. To fill the common-holes, linear interpolation and inpainting algorithms have been proposed [2].
Linear interpolation on a set of common-hole pixels is defined as the concatenation of linear interpolants between each pair of opposite end pixels of the common-hole region [3]. This method has a fast execution speed but fills the common-holes with low quality. Inpainting is an algorithm originally used to restore damaged regions of an image [4]. It uses the color information around the damaged region to estimate the values of the region to be restored as naturally as possible. The VSRS (view synthesis reference software) provided by MPEG uses a pixel-based inpainting algorithm [5, 6].
However, if a large region is restored using a pixel-based inpainting algorithm, the restored region becomes blurred. To solve this problem, inpainting algorithms using texture synthesis have been proposed. Among them, Criminisi proposed an exemplar-based inpainting algorithm that operates at a patch scale and can improve the texture of the image [7]. This algorithm restores the image by prioritizing pixels on the boundary of the hole region and using the patch that contains the highest-priority pixel to find the most similar patch within the image. Y. J. Kim [8] used this algorithm directly to fill common-holes in the virtual view, while Daribo [9] proposed additionally using the depth value in this algorithm. However, most existing algorithms have problems: objects' information is used in the hole-filling process, and, in the case of video, the algorithm is applied independently to each frame, so the filled common-hole regions do not match between frames, which may cause a flickering effect.
To solve these problems, a new common-hole filling algorithm using exemplar-based inpainting is proposed in this paper. In the proposed algorithm, the depth image that corresponds to the color image is used together with the camera parameters to find the world coordinates, which are then reprojected into the virtual viewpoint image. In the virtual view generated by reprojection, boundary noise is created because of the mismatch between the color and depth image boundaries. Therefore, we first remove this boundary noise.
After 3D warping and boundary noise removal, we estimate the location of the background relative to the holes and fill the pixels adjacent to the background first, which yields better results because adjacent objects' information is not used. When searching for the region most similar to the defined patch, the previous frame's original region is included in the search range in addition to the current frame in order to minimize temporal inconsistency between frames. Many experiments have been performed to confirm that the proposed algorithm is more effective than the existing algorithms.
This paper is organized into five parts. Section 2 explains the fundamental concepts of 3D warping and the exemplar-based inpainting algorithm. Section 3 describes the proposed common-hole filling algorithm for the virtual viewpoint image using exemplar-based inpainting. Section 4 presents various experimental results that confirm the performance of the proposed algorithm, and Section 5 concludes the paper.
2. 3D Warping & Exemplar Based Inpainting
 2.1 Virtual view generation by 3D warping
In 3D warping, the world coordinates of a given image are first determined using the intrinsic and extrinsic camera parameters, and then the desired viewpoint image is obtained by reprojection into 2D space using the camera parameters at the virtual viewpoint. The world coordinate and camera coordinate systems are both 3D coordinate systems, and conversion between them can be achieved through rotation and translation operations, as shown in Fig. 1. These two operations are defined as the camera's extrinsic parameters.
Transformation of the world coordinate to the camera coordinate
The conversion of the camera coordinate system to the 2D image coordinate system can be explained through the camera's geometrical structure, as shown in Fig. 2. Fig. 2(a) represents the 3D model of a pinhole camera, and Fig. 2(b) represents its 2D model. Generally, light passing through the pinhole travels in a straight line and forms a reversed image at the −f location on the Z-axis, where f is the focal length of the camera. For convenience, however, the image plane is moved by the focal length in the positive direction of the Z-axis and then analyzed.
Geometrical structure of the pinhole camera; (a) 3D model and (b) 2D model
The projection of an object's 3D camera coordinates onto the image plane can be explained using the similar triangles formed by the focal length and the coordinates of the object, as shown in Fig. 2(b). This conversion process is defined by the intrinsic parameters.
Using the extrinsic and intrinsic parameters of the camera, the 3D coordinates in the world coordinate system can be converted to the 2D coordinates in the image plane as shown in Eq. (1).
where (x, y) represents the 2D coordinates of the image plane, K represents the intrinsic parameter matrix of the camera, R is the rotation matrix of the camera, T is the translation vector of the camera, and (X, Y, Z) represents the 3D coordinates in the world coordinate system. Also, K[R|T] is defined as the projection matrix.
Through the inverse operation of the matrix in Eq. (1), the 2D coordinates can be converted back to the world coordinate system, as shown in Eq. (2). At this point, the disparity information D from Eq. (3) is needed to find the real depth value Z.
where Z(i, j) and D(i, j) are the depth and disparity values at coordinate (i, j) within the image, and MinZ and MaxZ represent the minimum and maximum values of Z, respectively.
To produce a virtual viewpoint image, the intrinsic and extrinsic parameters of a camera located at the virtual viewpoint need to be defined. In general, the intrinsic parameters are determined by the camera's internal structure, so the intrinsic parameters of the reference viewpoint camera can be used as those of the virtual viewpoint camera. The extrinsic parameters are used after being converted to the location of the virtual viewpoint camera. The converted 3D world coordinates and the parameters of the virtual viewpoint camera are applied to Eq. (1) to obtain Eq. (4), which reprojects them to the image coordinates of the virtual viewpoint.
where (x_v, y_v) represents the 2D coordinates of the virtual viewpoint image to be formed, and K_V, R_V, and T_V represent the intrinsic parameter matrix, rotation matrix, and translation vector of the virtual viewpoint camera, respectively [1].
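As a concrete illustration, the warping chain of Eqs. (2)-(4), together with the depth recovery of Eq. (3), can be sketched in Python/NumPy as follows. The function and variable names are ours, and this pinhole-model sketch ignores lens distortion and occlusion handling, which a full implementation would need.

```python
import numpy as np

def warp_to_virtual_view(depth8, K_ref, R_ref, T_ref, K_v, R_v, T_v,
                         min_z, max_z):
    """Map every reference pixel to virtual-view coordinates (Eqs. (2)-(4))."""
    h, w = depth8.shape
    # Eq. (3): recover the real depth Z from the quantized 8-bit disparity D.
    d = depth8.astype(np.float64)
    z = 1.0 / (d / 255.0 * (1.0 / min_z - 1.0 / max_z) + 1.0 / max_z)

    # Homogeneous pixel grid of the reference image.
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=0).reshape(3, -1)

    # Eq. (2): back-project to world coordinates, X = R^-1 (Z K^-1 x - T).
    cam = (np.linalg.inv(K_ref) @ pix) * z.reshape(1, -1)
    world = np.linalg.inv(R_ref) @ (cam - T_ref.reshape(3, 1))

    # Eq. (4): reproject with the virtual camera, x_v ~ K_v (R_v X + T_v).
    proj = K_v @ (R_v @ world + T_v.reshape(3, 1))
    xv = np.round(proj[0] / proj[2]).astype(int).reshape(h, w)
    yv = np.round(proj[1] / proj[2]).astype(int).reshape(h, w)
    return xv, yv, z
```

In practice, several reference pixels may map to the same virtual-view location; these collisions are typically resolved by keeping the pixel closest to the camera, and unmapped locations become the common-holes discussed below.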
Fig. 3 shows an example of a virtual view produced by the 3D warping algorithm. As shown in Fig. 3(b), an occlusion region that cannot be seen at the reference view can be found. This region is the common-hole region.
Generation of a virtual viewpoint image; (a) reference color and depth images and (b) generated virtual viewpoint image
 2.2 Exemplar-based Inpainting
Inpainting is an algorithm generally used for restoring old images or images with damaged regions. Because there is no information about the region to be restored, it is filled with estimates based on the information around it. The earliest inpainting algorithms operated in pixel units. However, a pixel-based inpainting algorithm fills the region pixel by pixel, so as the region grows, the texture quality of the image decreases and blurring may occur. Therefore, an exemplar-based inpainting algorithm using texture synthesis has been proposed. The structure of the exemplar-based inpainting algorithm can be seen in Fig. 4.
Structure of exemplar-based texture synthesis [7]
The original image is divided into the target region Ω with its contour δΩ, and the source region Φ. First, the pixels on the boundary δΩ of the target region are prioritized in order of restoration. A patch of fixed size centered at each such pixel is defined, and the patches are restored in priority order. The region most similar to the patch containing the pertinent pixel is found within the source region (Ψ_p' and Ψ_p''). The most similar patch found is then copied into the patch region (Ψ_p), and the above process is repeated until the target region is completely filled.
 2.2.1 Importance of restoration priority
Because the exemplar-based inpainting algorithm restores the highest-priority patch first, the order of restoration is crucial. Fig. 5 shows different restoration results based on the order of priority.
Importance of restoration priority; (a) original image; (b) filling in counterclockwise order and (c) filling by an edge-driven filling order [7]
In Fig. 5(a), the white region in the original image is the target region and the rest is the source region. Fig. 5(b) shows the result of filling in counterclockwise order, and Fig. 5(c) shows the result of using the edge-driven filling order. These two figures show that the results differ based on the filling order.
 2.2.2 Calculation of restoration priority
The restoration order is defined as the product of a confidence term and a data term, as shown in Eq. (5).
where p is the location of the center pixel of the patch, C(p) is the confidence term, D(p) is the data term, and P(p) is the final value calculated to determine the priority order.
The confidence term represents how much of the source region exists in a selected patch, as in Eq. (6).
where Ψ_p represents the patch centered at pixel p and |Ψ_p| is the area of the patch. q is the location of a pixel within the patch, I is the whole image, and Ω represents the restoration region. If q is in the source region, then C(q) is 1; if q is in the restoration region, C(q) is initialized to 0. C(p) is obtained by summing C(q) over all pixels within the patch and dividing by the patch area.
The data term is the inner product of the isophote at the center pixel of the patch to be restored and the unit vector orthogonal to the boundary there. Fig. 6 shows how to find the data term using Eq. (7).
Calculation of data term [7]
where ∇I_p^⊥ represents the isophote direction (the image gradient at pixel p rotated by 90 degrees), n_p is the unit vector orthogonal to the boundary at point p, and α is a normalization parameter, which is 255 for a gray-level image.
Fig. 7 shows an example of the calculation of filling priority. In Fig. 7, pixel 'a' has a low confidence value because there is little reliable information around it. However, it has a high data term because the isophote and the boundary normal directions are similar. The confidence value of pixel 'b' is greater than that of pixel 'a', but pixel 'b' has a smaller data term than pixel 'a' because it has the same isophote but a different boundary normal. Pixel 'c' has the largest confidence term but a data term of 0 because the gradient there is 0. Consequently, pixel 'b' has the highest filling priority and pixel 'c' has the lowest filling priority [7].
Example of priorities calculation [7]
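The priority computation of Eqs. (5)-(7) can be sketched as below. C is the running confidence map (1 for source pixels, 0 for unfilled ones), gray is a gray-level image, front_normal is the boundary normal n_p at the pixel, and the 9x9 patch size and the function names are our assumptions, not from the paper.

```python
import numpy as np

def confidence(C, p, half=4):
    """Eq. (6): average confidence over the patch centered at p."""
    y, x = p
    patch = C[y - half:y + half + 1, x - half:x + half + 1]
    return patch.sum() / patch.size

def data_term(gray, front_normal, p, alpha=255.0):
    """Eq. (7): |isophote_p . n_p| / alpha; the isophote is the image
    gradient at p rotated by 90 degrees."""
    gy, gx = np.gradient(gray.astype(np.float64))
    y, x = p
    isophote = np.array([-gx[y, x], gy[y, x]])  # gradient rotated 90 degrees
    return abs(isophote @ front_normal) / alpha

def priority(C, gray, front_normal, p):
    """Eq. (5): P(p) = C(p) * D(p) for a boundary pixel p."""
    return confidence(C, p) * data_term(gray, front_normal, p)
```

Note how the two terms interact: a patch deep in well-known territory but on a flat region (like pixel 'c' above) gets a high C(p) but a zero D(p), so its product, and hence its priority, is low.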
 2.2.3 Finding the most similar region
Once the patch with the highest filling priority is determined, the most similar patch is found by a full search over the source region. Eq. (8) shows the process of finding the patch most similar to the one with the highest filling priority.
where Ψ_p represents the patch with the highest filling priority, Ψ_q is a candidate patch centered at pixel q, and Ψ_Q represents the selected patch. SSE(Ψ_p, Ψ_q) is the sum of squared errors, i.e., the sum of the squared differences between corresponding pixels of the two patches. Based on Eq. (8), the patch with the lowest error is selected as the most similar patch; its pixel information is copied into the patch to be restored, and the restored region's information is updated. This process is repeated until the region to be restored is completely filled.
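A minimal full-search sketch of Eq. (8) follows: it scans every candidate patch lying entirely in the source region and keeps the one with the smallest SSE against the target patch, compared over the target's known pixels only. The function name and the `known` mask convention are ours.

```python
import numpy as np

def best_match(img, known, p, half):
    """Return ((top, left), SSE) of the source patch minimizing Eq. (8)."""
    h, w = img.shape
    y, x = p
    tgt = img[y - half:y + half + 1, x - half:x + half + 1].astype(np.float64)
    msk = known[y - half:y + half + 1, x - half:x + half + 1]
    best, best_err = None, np.inf
    size = 2 * half + 1
    for cy in range(h - size + 1):
        for cx in range(w - size + 1):
            if not known[cy:cy + size, cx:cx + size].all():
                continue  # candidate must lie entirely in the source region
            cand = img[cy:cy + size, cx:cx + size].astype(np.float64)
            err = ((cand - tgt)[msk] ** 2).sum()  # SSE over known pixels only
            if err < best_err:
                best, best_err = (cy, cx), err
    return best, best_err
```

The exhaustive double loop is O(H·W) SSE evaluations per filled patch; practical implementations restrict the search window or use faster matching, which is exactly where the proposed algorithm later adds the previous frame to the search range.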
3. Proposed Algorithm
Fig. 8 shows the block diagram of the proposed algorithm. First, the color image and the corresponding depth image are used to produce the color and depth images of the desired virtual view by 3D warping. Then, boundary noise is removed using the mean value of the initially produced virtual viewpoint image. Next, using the locations of the reference and virtual viewpoints, the restoration priority is calculated among the pixels adjacent to the background side of the common-hole boundary. When the patch with the highest priority is found, the most similar patch is found by searching the source regions of the current and previous frames. The found patch is then used to fill the common-hole, and the information is updated. The above process is repeated until all common-holes are filled.
Block diagram of the proposed algorithm
 3.1 Boundary noise removal
Boundary noise occurs because the object boundary in the reference view does not match the corresponding object boundary in the depth image. It appears as an afterimage similar to the object, as shown in Fig. 9.
Examples of boundary noise
In the proposed algorithm, we first find the background region around the common-hole by considering the locations of the reference and virtual views. Then we compute the average of a number of pixels in the horizontal direction starting from the background pixel closest to the hole. Next, we compare this average to the value of the pixel closest to the hole and consider that pixel boundary noise when the absolute difference is larger than a threshold. If the first pixel is determined to be boundary noise, we compare the next pixel in the same way to determine whether the boundary noise continues. Finally, the detected boundary noise is assigned to the common-hole region and removed together with the common-holes.
Fig. 10 shows the result of boundary noise removal using the proposed algorithm.
Result of boundary noise removal using the proposed algorithm: (a) boundary noise image; (b) result of the proposed algorithm
Eq. (9) shows the process of finding the boundary noise pixels.
where b(x, y) represents the candidate boundary noise pixel at coordinate (x, y), Avg_Background is the average of the background pixels, p(x_k, y_k) is the background pixel closest to the hole in the horizontal direction, and th is the threshold for determining a boundary noise pixel.
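The test around Eq. (9) can be sketched for a single image row as below: a pixel next to the hole is flagged as boundary noise when it deviates from the local background average by more than a threshold, and the flag is grown pixel by pixel while consecutive pixels keep exceeding it. The window size n, the threshold th, and the assumption that the background lies to the left of the hole are illustrative choices, not values from the paper.

```python
import numpy as np

def mark_boundary_noise(row, hole_x, n=8, th=30.0):
    """Return the set of columns flagged as boundary noise (grown leftward)."""
    row = row.astype(np.float64)
    # Avg_Background: mean of the n background pixels nearest to the hole.
    bg_avg = row[max(0, hole_x - n):hole_x].mean()
    noisy = set()
    x = hole_x - 1
    # Grow leftward while consecutive pixels keep exceeding the threshold.
    while x >= 0 and abs(row[x] - bg_avg) > th:
        noisy.add(x)
        x -= 1
    return noisy
```

Columns returned here would then be merged into the common-hole mask so that the afterimage pixels are refilled rather than copied into patches.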
If the exemplar-based inpainting algorithm is used without removing this noise, the quality of the filled hole degrades because the patches used to fill the hole include the boundary noise, as shown in Fig. 11(b). Fig. 11(c) shows the image after removing the boundary noise and filling the common-hole with the proposed algorithm. Compared to the image filled without boundary noise removal, the image produced by the proposed algorithm shows better quality [10].
(a) Boundary noise image; (b) hole filling with boundary noise, and (c) hole filling after removing the boundary noise with the proposed algorithm
 3.2 Common-hole filling using modified exemplar-based inpainting
As explained earlier, the quality of the generated image may differ greatly depending on the restoration order when using the exemplar-based inpainting algorithm. In existing methods, every pixel on the boundary of the region to be restored is a candidate for the restoration priority. In this case, however, if a pixel adjacent to an object is given the highest priority, the most similar patch may erroneously be found in the object region, which affects the filling process and consequently decreases the quality of the generated virtual view. Therefore, in the proposed algorithm, only the boundary pixels adjacent to the background are used to determine the restoration priority. Fig. 12 shows the candidate pixels for the restoration priority in the proposed algorithm.
Candidate pixels for the priority order of the restoration: (a) common-hole image; (b) candidate pixels by the existing algorithm, and (c) by the proposed algorithm
With this method, the hole is filled starting from the region close to the background, which prevents the error of filling the common-hole region with object content. Fig. 13(b) and (c) show the results of prioritizing the restoration order with the existing and the proposed algorithm, respectively. It can be seen that the proposed algorithm generates an image of higher quality than the existing algorithm.
(a) Uncovered hole; (b) hole filling by the existing algorithm and (c) by the proposed algorithm
 3.3 Consideration of temporal consistency
Most existing algorithms use the information of a single image to fill the hole. Therefore, the filled region may differ between successive frames. To remove this temporal inconsistency, if a hole appears in the same area in successive frames, the filled holes have to be consistent across the frames. In the proposed algorithm, the source region of the previous frame is included in the search range when searching for the most similar patch.
Eq. (10) shows the process of finding the most similar patch.
where Ψ_Q(t−1) represents the patch at the same location in the synthesized previous frame. If a common-hole is found at the same location in consecutive frames, the identical region in the previous frame is most likely to be the most similar patch; thus, the region from the previous frame is used first in order to maintain temporal consistency.
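One way the previous-frame preference described above could be realized is sketched below: before running the spatial search of Eq. (8) in the current frame, the co-located patch in the synthesized previous frame is tried, and it is reused when it matches the known pixels well enough. The acceptance threshold and function names are our assumptions; the paper's Eq. (10) defines the exact selection rule.

```python
import numpy as np

def pick_patch(cur, prev, known, p, half, spatial_search, accept_th=100.0):
    """Prefer the co-located previous-frame patch for temporal consistency."""
    y, x = p
    tgt = cur[y - half:y + half + 1, x - half:x + half + 1].astype(np.float64)
    msk = known[y - half:y + half + 1, x - half:x + half + 1]
    prev_patch = prev[y - half:y + half + 1,
                      x - half:x + half + 1].astype(np.float64)
    # SSE between the target's known pixels and the co-located previous patch.
    sse_prev = ((prev_patch - tgt)[msk] ** 2).sum()
    if sse_prev <= accept_th:
        return prev_patch  # reuse previous frame's content for consistency
    return spatial_search(cur, known, p, half)  # fall back to Eq. (8) search
```

Because the same patch is reused whenever the co-located region still matches, a hole that persists across frames is filled identically in each, which is what suppresses the flickering described above.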
Fig. 14 shows a comparison of the existing algorithm and the proposed algorithm; the proposed algorithm has better temporal consistency.
Proposed temporal inconsistency compensation algorithm; (a) existing algorithm (0^{th} frame); (b) the proposed algorithm (0^{th} frame); (c) existing algorithm (1^{st} frame) and (d) the proposed algorithm (1^{st} frame)
4. Experimental Results
To test the efficiency of the proposed algorithm, multiview sequences provided by MPEG have been used as test sequences. For performance comparison, the pixel-based inpainting algorithm included in VSRS 3.5 [12], the spiral weighted average algorithm included in VSRS 3.5 alpha [11], and bilateral filtering [6] have been used.
 4.1 Extrapolation case
As shown in Fig. 15, a virtual viewpoint image positioned outside the reference views (the extrapolation view synthesis case) can be generated by the proposed algorithm.
Extrapolation view synthesis case
First, note that all the data for the test sequences in the tables from now on are average values over all the temporal frames of a sequence. The specifications of the test sequences are shown in Table 1.
Spec. of the test sequences.
Fig. 16 shows the virtual views resulting from the respective common-hole filling algorithms, and Fig. 17 shows enlarged details of Fig. 16.
Generated virtual view (“Newspaper” 7^{th} view, 0^{th} frame): (a) 3D warping result; (b) pixel-based inpainting algorithm; (c) VSRS 1; (d) VSRS 2, and (e) the proposed algorithm
Detail of Fig. 16 (“Newspaper” 7^{th} view, 0^{th} frame): (a) 3D warping result; (b) pixel-based inpainting algorithm; (c) VSRS 1; (d) VSRS 2, and (e) the proposed algorithm
As shown in Figs. 16 and 17, the proposed algorithm reduces the blurring effect and preserves the texture of the image more effectively than the existing algorithms.
Objective quality evaluation has been performed as well. The PSNR value was calculated by comparing the original (N+1)^{th} view with the virtual (N+1)^{th} view generated from the reference image at the N^{th} viewpoint.
As shown in Table 2, the results of the proposed algorithm generally show an improvement in PSNR compared to the results of the existing algorithms. The proposed algorithm performs better because the background information is used as much as possible to fill the holes, which improves the texture of the image.
Performance comparison for extrapolation
 4.2 Interpolation case
Fig. 18 shows an interpolation view synthesis case: a virtual viewpoint image positioned between the reference views is generated by the proposed algorithm. All the data for the test sequences in the tables are again average values over all the temporal frames of a sequence; the specifications are shown in Table 3.
Interpolation view synthesis case
Spec. of the test sequences
In Table 4, we also compare the PSNR performance of the proposed algorithm for the interpolation case.
Performance comparison for interpolation
The proposed algorithm also performed better than the existing algorithms for the interpolation case; Table 4 shows that it gave better results in general.
5. Conclusion
In this paper, an algorithm that effectively fills the holes produced when creating a virtual-viewpoint image has been proposed. In the proposed algorithm, after 3D warping, the boundary noise in the generated initial virtual view is removed; then the location of the background relative to the holes is estimated, and the pixels adjacent to the background are filled first, which yields better results because adjacent objects' information is not used. In addition, the inconsistency between two consecutive frames is reduced by expanding the search region to the previous frame when searching for the most similar patch. The superiority of the proposed algorithm over the existing algorithms has been shown through the experiments performed.
Acknowledgements
The present research has been conducted with the research grant of Kwangwoon University in 2014.
BIO
Min Soo Ko received the B.S. and M.S. degrees in electronics engineering from Kwangwoon University, Seoul, Korea, in 2010 and 2012, respectively. He is currently a Ph.D. student at Kwangwoon University. His research interests include stereo matching, 3D image processing and image compression.
Jisang Yoo was born in Seoul, Korea, in 1962. He received the B.S. and M.S. degrees from Seoul National University, Seoul, Korea, in 1985 and 1987, both in electronics engineering, and the Ph.D. degree in electrical engineering from Purdue University, West Lafayette, IN, USA, in 1993. From September 1993 to August 1994, he worked as a senior research engineer at the industrial electronics R&D center of Hyundai Electronics Industries Co., Ltd., Ichon, Korea, in the area of image compression and HDTV. He is currently a professor with the department of electronics engineering, Kwangwoon University, Seoul, Korea. His research interests are in signal and
References
[1] Mori Y., Fukushima N., Yendo T., Fujii T., Tanimoto M., "View generation with 3D warping using depth information for FTV," Signal Processing: Image Communication, vol. 24, no. 1-2, pp. 65-72, 2009. DOI: 10.1016/j.image.2008.10.013
[2] Tauber Z., Li Z.-N., Drew M. S., "Review and preview: disocclusion by inpainting for image-based rendering," IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 37, no. 4, pp. 527-540, 2007. DOI: 10.1109/TSMCC.2006.886967
[3] Park S. H., Song H., Jang E. Y., Hur N., Kim J., Kim J. S., Lee S. H., Yoo J., "Compensation method for occluded-region of arbitrary-view image synthesized from multi-view video," The Journal of Korea Information and Communications Society, vol. 33, no. 12, pp. 1029-1038, 2008.
[4] Bertalmio M., Sapiro G., Caselles V., Ballester C., "Image inpainting," in Proc. of the 27th Conference on Computer Graphics and Interactive Techniques (ACM SIGGRAPH 2000), pp. 417-424, 2000.
[5] ISO/IEC JTC1/SC29/WG11, "Introduction to 3D video," M9784, 2008.
[6] Telea A., "An image inpainting technique based on the fast marching method," Journal of Graphics Tools, vol. 9, no. 1, pp. 25-36, 2004.
[7] Criminisi A., Perez P., Toyama K., "Region filling and object removal by exemplar-based image inpainting," IEEE Trans. on Image Processing, vol. 13, no. 9, pp. 1200-1212, 2004. DOI: 10.1109/TIP.2004.833105
[8] Kim Y. J., Lee S. H., Park J. I., "A high-quality occlusion filling method using image inpainting," Journal of Broadcast Engineering, vol. 15, no. 1, pp. 3-13, 2010. DOI: 10.5909/JBE.2010.15.1.003
[9] Daribo I., Saito H., "A novel inpainting-based layered depth video for 3DTV," IEEE Transactions on Broadcasting, vol. 57, no. 2, pp. 533-541, 2011. DOI: 10.1109/TBC.2011.2125110
[10] Ko M. S., Yoo J., "Virtual view generation by a new hole filling algorithm," The Journal of Electrical Engineering & Technology, vol. 9, no. 3, pp. 1023-1033, 2014. DOI: 10.5370/JEET.2014.9.3.1023
[11] ISO/IEC JTC1/SC29/WG11, "Boundary noise removal and hole filling for VSRS 3.5 alpha," M19992, 2011.
[12] ISO/IEC JTC1/SC29/WG11, "Implementation of hole filling methods for VSRS 3.5 alpha," M20005, 2011.