Disparity Refinement near the Object Boundaries for Virtual-View Quality Enhancement
Journal of Electrical Engineering and Technology. 2015. Sep, 10(5): 2189-2196
Copyright © 2015, The Korean Institute of Electrical Engineers
This is an Open-Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/ licenses/by-nc/3.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
  • Received : August 11, 2014
  • Accepted : June 23, 2015
  • Published : September 01, 2015
About the Authors
Gyu-cheol Lee
Dept. of Electronic Engineering, Kwangwoon University, Korea. (gyucheol0116@gmail.com)
Jisang Yoo
Corresponding Author: Dept. of Electronic Engineering, Kwangwoon University, Korea. (jsyoo@kw.ac.kr)

Abstract
A stereo matching algorithm is usually used to obtain a disparity map from a pair of images. However, the disparity map obtained by stereo matching contains a great deal of noise and many error regions. In this paper, we propose a virtual-view synthesis algorithm that uses disparity refinement in order to improve the quality of the synthesized image. First, the error region is detected by examining the consistency of the left and right disparity maps. Then, the occlusion region is found by applying optical flow to the texture component of the image, which improves the performance of the optical flow. Finally, the refined disparity map is used for the synthesis of the virtual-view image. The experimental results show that the proposed algorithm improves the quality of the generated virtual view.
1. Introduction
In contrast to the box office success of 3D movies, 3D broadcasting, which has been labeled as the next-generation broadcasting, has not yet been able to find its place in the market. According to data from Retrevo, a market analysis agency, 55% of consumers who plan on buying an HDTV feel no need for the 3D function because of the cumbersome task of wearing 3D glasses and the lack of content [1]. Also, the current 3D display method (stereo 3D: S3D) uses only one viewpoint to synthesize the 3D images, so the realistic feeling and vividness of the object lessen when seen from another viewpoint. The alternative to this stereo 3D image is the multi-view display technique, which does not require 3D glasses. This alternative offers more realistic viewing because it provides more viewpoints than the stereo display. Therefore, the viewer can enjoy the 3D image from any perspective [2].
There are many ways to obtain a multi-view image, but the simplest is to install as many cameras as the number of viewpoints needed in order to obtain an image from each view [3]. However, this method lacks practicality because of the difficulty of calibrating the cameras and the high cost of the cameras themselves.
Therefore, other methods of generating the multi-view image have been researched. The first alternative uses a stereo matching algorithm to obtain the depth image from a pair of stereo images [4]. The other alternative involves the use of a depth camera, which allows the color and depth images to be obtained simultaneously [5]. Because stereo matching is relatively robust to the environment, it is easy to obtain the disparity map; however, it takes a long execution time and the resulting depth image lacks accuracy. A depth camera provides depth images with high accuracy but has low resolution and high equipment cost.
In this paper, we propose a method of disparity refinement near the object boundaries for virtual-view quality enhancement. The disparity map is obtained using stereo matching. However, this results in noise near the object boundaries because of discontinuity. In order to improve the quality of the virtual-view image, refinement of the disparity map is necessary. First, the error region is detected by investigating the consistency between the left and right disparity maps [6]. The occlusion region is a region that is visible in the left image but does not exist in the right image [7]. Therefore, errors are prone to happen there because of the difficulty in obtaining disparity information.
This phenomenon creates a larger range of error as the resolution of the image becomes higher, because the arithmetic operations get more complicated [8]. The optical flow algorithm is applied to the texture component of the image, which carries its structural characteristics, in order to extract the occlusion region [9]. The texture component yields good performance when extracting motion information because lighting, noise, and shadow have been almost entirely removed from it.
The motion information of all the pixels is extracted by the Lucas-Kanade method, used here as a dense optical flow algorithm [10]. Using the extracted motion information, the consistency of the left and right images is investigated. The pixels that are not consistent are considered non-existent in the other view and defined as the occlusion region. The error and occlusion regions of the extracted left and right disparity maps are fused to create a new region. The newly labeled regions are filled with appropriate disparity values by using the joint bilateral filter, which preserves object boundaries in the reference image [11]. Finally, the refined disparity map is used to synthesize the virtual-view image by using bidirectional linear interpolation [12].
This paper is organized as follows. Section 2 explains the detection of error regions in the disparity map. Section 3 describes the extraction of the occlusion region using optical flow. The performance of the proposed algorithm is demonstrated through experiments in Section 4. Finally, Section 5 contains the conclusion.
2. Error Region Detection in the Disparity Map
A stereo matching algorithm can only work on the premise that a randomly picked pixel of the left image also exists in the right image. However, depending on the viewpoints of the two cameras, the lighting and the amount of reflected light change, so the same points in the left and right images may have different pixel values. Additionally, if a certain region has identical pixel values, finding the pixel that corresponds to that region becomes difficult. Therefore, the possibility of extracting incorrect information is high. The same error occurs in the occlusion region, which exists in only one of the two images. Thus, in order to improve the quality of the virtual-view image, the disparity map information has to be accurate. This section explains how to detect the error region by investigating the consistency of the disparity maps extracted using stereo matching.
Fig. 1(c) and Fig. 1(d) show the disparity maps obtained using stereo matching. There is a possibility of detecting wrong disparity information because the fluctuation range of the disparity values near the object boundaries is high. As the resolution of the image gets higher, the arithmetic operations get more complicated and the range of error gets wider. The wrong disparity values can be detected by checking the consistency of the left and right disparity maps as shown in Eq. (1) and Eq. (2).
Fig. 1. The disparity maps extracted using stereo matching: (a) Left color image; (b) Right color image; (c) Left disparity map; (d) Right disparity map
xr = xl − dl(xl)      (1)

c(x) = | dl(xl) − dr(xr) |      (2)
where xl and xr are the coordinates in the left and right images, and dl and dr represent the left and right disparity maps, respectively. If c(x) = 0, the disparity value of the corresponding coordinate is consistent. If c(x) ≠ 0, the corresponding coordinate has a wrong disparity value.
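As a concrete illustration, the check of Eq. (1) and Eq. (2) can be sketched as follows in Python with NumPy (the function name and the treatment of pixels that project outside the image are our own assumptions, not part of the paper):

```python
import numpy as np

def consistency_error(d_left, d_right):
    """Left-right consistency check of a disparity-map pair.

    For each pixel xl in the left map, the matching coordinate in
    the right map is xr = xl - dl(xl); the error is
    c(x) = |dl(xl) - dr(xr)|.  c(x) == 0 marks a consistent
    disparity, c(x) != 0 an error-region pixel.
    """
    h, w = d_left.shape
    c = np.zeros((h, w), dtype=int)
    for y in range(h):
        for x in range(w):
            xr = x - int(d_left[y, x])
            if 0 <= xr < w:
                c[y, x] = abs(int(d_left[y, x]) - int(d_right[y, xr]))
            else:
                c[y, x] = 1  # projects outside the frame: cannot verify, treat as error
    return c
```

A pixel whose matching coordinate leaves the frame cannot be verified, so this sketch conservatively marks it as an error.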
Fig. 2 shows the error regions extracted from the disparity maps. It shows that the errors are mostly detected near the object boundaries.
Fig. 2. The error region extraction: (a) The left disparity map; (b) The right disparity map
3. Occlusion Extraction using Optical Flow
Optical flow is a method of tracing the motion between two frames. There is a sparse type, which only traces regions with noticeable features such as object boundaries, and a dense type, which obtains the motion information of every pixel in the image.
In this paper, the Lucas-Kanade method is used as a dense optical flow to trace motion. After investigating the consistency of the motion information extracted from the left and right images, the pixels that do not match are regarded as the occlusion region, which exists in the current image but does not exist in the other. Also, optical flow is applied to the texture component of the image in order to improve the quality of the optical flow. Fig. 3 shows the flow chart of the proposed occlusion extraction.
Fig. 3. Block diagram of the occlusion extraction
- 3.1 Extraction of texture component
Generally, an image can be separated into structure and texture components. The structure component represents the object's appearance, color, etc.; it therefore also contains the pixels and shadows that violate brightness constancy. The texture component, however, represents a measure of characteristics such as smoothness, roughness and regularity. Thus, the performance of the optical flow can be improved by using the texture component of the image.
The separation of the structure component from the intensity image is accomplished using the method of Rudin, Osher and Fatemi [22], which removes noise by exploiting total variation. For the intensity image I(x), structure-texture separation is done by Eq. (3) and Eq. (4).
Is(x) = arg min ∫ ( |∇Is| + (1/2θ)(Is − I)² ) dx      (3)

IT(x) = I(x) − Is(x)      (4)
where I(x) is the intensity image, Is(x) is the structure component and θ is a constant. ∇Is represents the gradient of the structure component. The Is(x) that minimizes Eq. (3) is the solution for the structure component. IT(x) is the texture component, calculated as the difference between the intensity image and its structure component as in Eq. (4). Fig. 4 shows the image after it has been separated into the structure and texture components. In the texture image, it can be seen that the lighting and shadow components have been almost entirely removed.
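The decomposition of Eq. (3) and Eq. (4) can be sketched with a simple gradient-descent solver (the step size, iteration count, smoothed total-variation term and boundary handling below are our own assumptions; the paper does not specify a numerical scheme):

```python
import numpy as np

def structure_texture(I, theta=0.125, tau=0.05, n_iter=200, eps=1e-6):
    """Split an intensity image into structure Is and texture IT = I - Is
    by gradient descent on the ROF energy
        E(Is) = sum |grad Is| + (1/(2*theta)) * (Is - I)^2,
    with the TV term smoothed by a small eps to avoid division by zero."""
    I = I.astype(float)
    Is = I.copy()
    for _ in range(n_iter):
        # forward differences with replicated last row/column
        gx = np.diff(Is, axis=1, append=Is[:, -1:])
        gy = np.diff(Is, axis=0, append=Is[-1:, :])
        mag = np.sqrt(gx ** 2 + gy ** 2 + eps)
        px, py = gx / mag, gy / mag
        # backward-difference divergence of the normalised gradient field
        div = (px - np.roll(px, 1, axis=1)) + (py - np.roll(py, 1, axis=0))
        # descend -dE/dIs = div(p) - (Is - I)/theta
        Is += tau * (div - (Is - I) / theta)
    return Is, I - Is
```

For a flat input image the texture component is zero, as expected: all detail that survives in IT comes from fine-scale variation that the TV term smooths out of Is.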
Fig. 4. Extraction of the texture component
- 3.2 Occlusion detection
We use the Lucas-Kanade optical flow method [10], which can estimate the motion information between two images, in order to determine the occlusion region. After investigating the consistency of the motion information between the left and right images, inconsistent pixels are defined as the occlusion region.
Optical flow assumes brightness constancy, but in actual images the brightness values of the left and right images differ because of camera sensor noise, the objects' respective reflectance, and shadows. For these reasons, the performance of the optical flow suffers. To obtain more accurate motion information, we apply the optical flow to the texture component of the image, thereby improving its performance.
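The left-right motion consistency test described above can be sketched as follows (the function name, the rounding to integer coordinates and the tolerance value are our own assumptions):

```python
import numpy as np

def occlusion_mask(flow_lr, flow_rl, tol=1.0):
    """Label pixels whose left-to-right motion is not confirmed by the
    right-to-left motion as occluded.

    flow_lr, flow_rl: (H, W, 2) arrays of (dx, dy) motion vectors,
    e.g. from a dense Lucas-Kanade estimate on the texture images.
    """
    h, w, _ = flow_lr.shape
    occ = np.zeros((h, w), dtype=bool)
    for y in range(h):
        for x in range(w):
            dx, dy = flow_lr[y, x]
            x2, y2 = int(round(x + dx)), int(round(y + dy))
            if not (0 <= x2 < w and 0 <= y2 < h):
                occ[y, x] = True       # motion leaves the frame: occluded
                continue
            bx, by = flow_rl[y2, x2]
            # the round trip should return (close to) the start point
            if abs(dx + bx) + abs(dy + by) > tol:
                occ[y, x] = True
    return occ
```

A pixel visible in both views makes a closed round trip (forward plus backward motion near zero); a pixel visible in only one view has no true correspondence, so the round trip fails and the pixel is labeled occluded.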
Fig. 5 compares the extraction results of the occlusion region depending on whether the texture component is used. The occlusion region is usually located on one side of an object. In Fig. 5(a) and 5(b), however, the occlusion region detected without the texture separation process appears indiscriminately throughout the image. Fig. 5(c) and 5(d) show the result based on texture separation, which is superior to the result obtained without it.
Fig. 5. Result depending on the use of the texture image: (a) left image, texture not used; (b) right image, texture not used; (c) left image, texture used; (d) right image, texture used
The union of the occlusion region found using the above method and the detected error region of the disparity map is defined as the new error region.
- 3.3 Disparity map refinement
In this paper, the error regions are rectified by using a joint bilateral filter, which fills the holes while preserving the boundaries of the reference image. The joint bilateral filter is defined as in Eq. (5) and Eq. (6).
Dp = (1/Wp) Σq∈s Gσs(||p − q||) Gσr(|Ip − Iq|) Dq      (5)

Wp = Σq∈s Gσs(||p − q||) Gσr(|Ip − Iq|)      (6)
where D is the depth image, I is the intensity image and Dp represents the pixel value generated by applying the joint bilateral filter to D and I. G is the Gaussian function and ||p − q|| is the Euclidean distance between p and q. s is the set of neighboring pixels of p, σs and σr are parameters defining the size of the neighborhood, and Wp is the normalization constant. Fig. 6(a) is the disparity map obtained by using stereo matching. Because of the disparity value errors in the occlusion area and the boundary region, the image exhibits blurring or unclear object shapes. Fig. 6(b) is the disparity map rectified using the proposed method. This image shows that the noise and the error regions near the object boundaries are rectified.
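A direct, unoptimized sketch of Eq. (5) and Eq. (6), applied only to the labeled error pixels, might look like this (the parameter values, window radius, and the choice to exclude other error pixels from the filter support are our own assumptions):

```python
import numpy as np

def joint_bilateral_fill(D, I, mask, sigma_s=3.0, sigma_r=10.0, radius=4):
    """Fill the pixels of depth map D marked in `mask`, weighting each
    neighbour q by spatial distance to p and by intensity similarity
    in the guidance image I, as in Eq. (5)-(6)."""
    out = D.astype(float).copy()
    h, w = D.shape
    for y, x in zip(*np.nonzero(mask)):
        num = den = 0.0
        for qy in range(max(0, y - radius), min(h, y + radius + 1)):
            for qx in range(max(0, x - radius), min(w, x + radius + 1)):
                if mask[qy, qx]:
                    continue            # skip other unreliable pixels
                gs = np.exp(-((qy - y) ** 2 + (qx - x) ** 2) / (2 * sigma_s ** 2))
                gr = np.exp(-((float(I[qy, qx]) - float(I[y, x])) ** 2) / (2 * sigma_r ** 2))
                num += gs * gr * D[qy, qx]
                den += gs * gr
        if den > 0:
            out[y, x] = num / den       # Wp normalisation of Eq. (6)
    return out
```

Because the guidance weights come from the color image, a filled hole inherits the disparity of neighbours on its own side of an intensity edge, which is what preserves the object boundary.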
Fig. 6. Disparity map refinement: (a) before processing; (b) after processing
4. Experimental Results
To evaluate the performance of the proposed algorithm, we used the “Samgye” and “Gyebeck” (MBC drama) sequences with a size of 1920×1080 as test sequences. In order to detect the occlusion region, we used the Lucas-Kanade method as a dense optical flow algorithm with a window size of 5×5. A virtual-view image is simply synthesized by applying bidirectional linear interpolation. The θ value for separating the texture component was set to 0.125 based on the experimental results.
Fig. 7 shows the 1st, 3rd, 5th, 7th, 9th, and 11th view images out of the eleven virtual views synthesized by using bidirectional linear interpolation with “Samgye” as the test sequence.
Fig. 7. Virtual view-point images synthesized by using bidirectional linear interpolation: (a)-(f) Results of the 1st, 3rd, 5th, 7th, 9th and 11th view-points
Fig. 8 shows the virtual views generated by four different algorithms. We compared the performance of the proposed algorithm with the result before processing [12], the error region + JBF algorithm [6], and the occlusion region + JBF algorithm.
Fig. 8. Performance comparison (image quality): (a) before processing; (b) error region + JBF; (c) occlusion region + JBF; (d) the proposed method
As shown in Fig. 8(a), 8(b) and 8(c), especially near the cockscomb area, the quality of the virtual-view images synthesized by the other algorithms is poor. Fig. 8(d) shows that the cockscomb looks more natural after the proposed algorithm has been applied. Comparing Fig. 9(d) with the others also shows that the quality of the virtual-view image improves when the proposed algorithm is used.
Fig. 9. Performance comparison (image quality): (a) before processing; (b) error region + JBF; (c) occlusion region + JBF; (d) the proposed method
Fig. 10 shows the 1st, 3rd, 5th, 7th, 9th, and 11th viewpoint images out of the eleven virtual views synthesized by using bidirectional linear interpolation with the MBC drama “Gyebeck” as the test sequence.
Fig. 10. Virtual view-point images synthesized by using bidirectional linear interpolation: (a)-(f) Results of the 1st, 3rd, 5th, 7th, 9th and 11th view-points
Fig. 11 shows a comparison of the results in specific regions in order to test the performance of the proposed algorithm. When the existing disparity map is used without refinement, the area near the people's ears in Fig. 11(a) shows distortion caused by errors in the disparity map. When the proposed algorithm is used, it can be seen from Fig. 11(d) and 12(d) that the quality of the virtual-view image improves.
Fig. 11. Performance comparison (image quality): (a) before processing; (b) error region + JBF; (c) occlusion region + JBF; (d) the proposed method
Fig. 12. Performance comparison (image quality): (a) before processing; (b) error region + JBF; (c) occlusion region + JBF; (d) the proposed method
Table 1 shows the average PSNR of each method on the Middlebury sequences (Tsukuba, Venus, Teddy and Cones) [23]. The PSNR of the proposed method is greater than that of the other methods, which shows that the disparity map is greatly refined by the proposed algorithm.
Table 1. PSNR of each method on the Middlebury sequences
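For reference, the PSNR reported in Table 1 is the standard measure, which can be computed as follows (a peak value of 255 for 8-bit images is assumed):

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images."""
    mse = np.mean((ref.astype(float) - test.astype(float)) ** 2)
    if mse == 0:
        return float('inf')          # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```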
5. Conclusion
In this paper, we proposed a virtual-view synthesis algorithm using disparity refinement in order to improve the quality of the synthesized image. The disparity map obtained by stereo matching contains a great deal of noise and many error regions. These regions usually exist near the object boundaries and cause the quality of the image to deteriorate when the virtual-view image is generated.
In the proposed algorithm, the error region is detected by investigating the consistency between the left and right disparity maps. The texture component of the image, which represents its structural characteristics, is then separated from the image, and the optical flow algorithm is applied to it in order to extract motion information with high accuracy. After investigating the consistency of the motion information between the left and right images, inconsistent pixels are defined as the occlusion region. The error region is combined with the occlusion region to define a new region, and the joint bilateral filter is applied to this new region in order to acquire appropriate disparity values. Finally, the virtual-view image is generated by applying bidirectional linear interpolation to the refined disparity map. Experimental results show that the quality of the virtual-view images generated using the proposed algorithm is enhanced.
Acknowledgements
This work was supported by the ICT R&D program of MSIP/IITP. [B0101-15-1360, Loudness Based Broadcasting Loudness and stress Assessment of Indoor Environment Noises]
BIO
Gyu-cheol Lee He received the B.S. and M.S. degrees in electronics engineering from Kwangwoon University, Seoul, Korea, in 2013 and 2015, respectively. He is currently a Ph.D. student at Kwangwoon University. His research interests include stereo matching, 3D image processing and object recognition.
Jisang Yoo He was born in Seoul, Korea, in 1962. He received the B.S. and M.S. degrees from Seoul National University, Seoul, Korea, in 1985 and 1987, both in electronics engineering, and the Ph.D. degree in electrical engineering from Purdue University, West Lafayette, IN, USA, in 1993. From September 1993 to August 1994, he worked as a senior research engineer at the industrial electronics R&D center of Hyundai Electronics Industries Co., Ltd., Ichon, Korea, in the area of image compression and HDTV. He is currently a professor with the Department of Electronics Engineering, Kwangwoon University, Seoul, Korea. His research interests are in signal and image processing.
References
Retrevo Corporation, Could low interest in 3DTV hurt the TV business? from
Um G. M. , Cheong G. H. , Cheong W. S. , Hur N. H. 2011 “Technical development and standardization trends of multi-view 3D and free-viewpoint video,” 18 - 23
Yasutaka F. , Ponce J. 2009 “Accurate camera calibration from multi-view stereo and bundle adjustment,” International Journal of Computer Vision 84 257 - 268    DOI : 10.1007/s11263-009-0232-2
Kim T.J. , Yoo J. S. 2009 “Hierarchical stereo matching with color information,” The Journal of Korea Institute of Communications and Information Sciences 34 (3) 279 - 287
Lim J. M. , Um G. M. , Shin H. C. , Lee G. S. , Hur N. H. , Yoo J. S. 2013 “Multi-view image generation using grid-mesh based image domain warping and occlusion region information,” The Journal of Korean Society of Broadcast Engineers 18 (6) 859 - 871
Cho S. Y. , Sun I. S. , Ha J. M. , Jeong H. 2012 “Occlusion detection and filling in disparity map for multiple view synthesis,” 8th International Conference on Computing and Networking Technology (ICCNT) 425 - 432
Zitnick C. , Kanade T. 2000 “A cooperative algorithm for stereo matching and occlusion detection,” IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (7) 675 - 684    DOI : 10.1109/34.865184
Kang Y. S. , Ho S. Y. 2012 “Generation of high-resolution disparity map using multiple cameras and low-resolution depth camera,” The Conference of Korea Institute of Communications and Information Sciences 287 - 288
Aujol J. , Gilboa G. , Chan T. , Osher S. 2006 “Structure-texture image decomposition - modeling, algorithms, and parameter selection,” International Journal of Computer Vision 67 (1) 111 - 136    DOI : 10.1007/s11263-006-4331-z
Lucas B.D. , Kanade T. 1981 “An iterative image registration technique with an application to stereo vision (IJCAI),” 7th International Joint Conference on Artificial Intelligence (IJCAI) 674 - 679
Zhao L. , Wang H. 2009 “Image denoising using trivariate shrinkage filter in the wavelet domain and joint bilateral filter in the spatial Domain,” IEEE Trans. Image Process. 18 (10) 2364 - 2369    DOI : 10.1109/TIP.2009.2026685
Park C. J. , Ko J. H. , Kim E. S. 2004 “A new intermediate view reconstruction scheme based-on stereo image rectification algorithm,” The Journal of Korea Institute of Communications and Information Sciences 29 (5C) 632 - 641
Lee G. C. , Seo Y. H. , Yoo J. S. 2012 “GPGPU-based multiview synthesis using kinect depth image,” The Conference of Korea Institute of Communications and Information Sciences Yongpyong, Korea
Oh K. H. , Lim S. Y. , Hahn H. I. 2011 “Estimating the regularizing parameters for belief propagation based stereo matching algorithm,” The Journal of the Institute of Electronics Engineers of Korea 47 (1) 112 - 119
Wedel A. , Pock T. , Zach C. , Bischof H. , Cremers D. 2008 “An improved algorithm for TV-L1 optical flow,” In Statistical and Geometrical Approaches to Visual Motion Analysis 5604 23 - 45
Ko M. S. , Yoo J. S. 2012 “Boundary noises removal and hole filling algorithm for virtual viewpoint image generation,” J. KICS 37 (8) 679 - 688
Sun W. , Au O. , Xu L. , Li Y. , Hu W. 2012 “Novel temporal domain hole filling based on background modeling for view synthesis,” IEEE International Conference on Image Processing 2012 Florida, USA
Oh K. J. , Yea S. , Ho Y.S. 2009 “Hole filling method using depth based inpainting for view synthesis in free viewpoint television and 3-d video,” Proc. of the 27th conference on Picture Coding Symposium (PCS’09) 233 - 236
Park J. H. , Song C. G. 2012 “Effective shadow removal from aerial image of golf course to extract components,” The Journal of Korean Institute of Information Scientists and Engineers 39 (7) 577 - 582
Lee G. C. , Yoo J. S. 2013 “Real-time virtual-view image synthesis algorithm using Kinect camera,” The Journal of Korea Institute of Communications and Information Sciences 38 (5) 409 - 419
Yang N. E. , Kim Y. G. , Hong R. H. 2012 “Depth hole filling using the depth distribution of neighboring regions of depth holes in the Kinect sensor,” 2012 IEEE International Conference on Signal Processing, Communication and Computing (ICSPCC)
Rudin L. I. , Osher S. , Fatemi E. 1992 “Nonlinear total variation based noise removal algorithms,” Physica D: Nonlinear Phenomena 60 259 - 268    DOI : 10.1016/0167-2789(92)90242-F