Local stereo matching using combined matching cost and adaptive cost aggregation
KSII Transactions on Internet and Information Systems (TIIS). 2015. Jan, 9(1): 224-241
Copyright © 2015, Korean Society For Internet Information
  • Received : August 31, 2014
  • Accepted : December 13, 2014
  • Published : January 31, 2015
About the Authors
Shiping Zhu
Department of Measurement Control and Information Technology, School of Instrumentation Science and Optoelectronics Engineering, Beihang University, Beijing 100191, China
Zheng Li
Department of Measurement Control and Information Technology, School of Instrumentation Science and Optoelectronics Engineering, Beihang University, Beijing 100191, China

Abstract
Multiview plus depth (MVD) videos are widely used in free-viewpoint TV systems. The best-known technique for determining depth information is based on stereo vision. In this paper, we propose a novel local stereo matching algorithm that is radiometrically invariant. The key idea is to use a combined matching cost built from intensity-based and gradient-based similarity measures. In addition, we realize an adaptive cost aggregation scheme by constructing an adaptive support window for each pixel, which addresses the boundary and low-texture problems. In the disparity refinement process, we propose a four-step post-processing technique to handle outliers and occlusions. Moreover, we conduct stereo reconstruction tests to verify the performance of the algorithm more intuitively. Experimental results show that the proposed method is effective and robust against local radiometric distortion. It has an average error of 5.93% on the Middlebury benchmark and is comparable to state-of-the-art local methods.
1. Introduction
In recent years, three-dimensional TV (3DTV) and free-viewpoint TV (FTV) have emerged as promising technologies for the next generation of home and entertainment services. The key point in 3DTV and FTV is calculating the depth information of scenes or objects. Binocular stereovision is a popular technique for building a three-dimensional description of a scene observed from two slightly different viewpoints. By finding corresponding pixels in the reference and target images, depth information can be obtained through disparity. This process is called stereo matching. Stereo matching is a classical and challenging problem in computer vision and has long been an active research focus. Over the last decade, researchers have put forward a large number of algorithms to solve this problem, but because the problem is ill-posed, there is no perfect solution yet. Most stereo matching algorithms establish an energy function and minimize it to estimate disparities, so stereo matching is essentially an optimization problem. It is solved by establishing a reasonable energy function, adding constraints and adopting an optimization algorithm, which is the general approach to ill-posed problems. A thorough survey and taxonomy of dense stereo techniques was provided by Scharstein and Szeliski [1]. They summarized the stereo matching process in four steps: matching cost computation, cost aggregation, disparity computation and disparity refinement. They also divided stereo matching algorithms into local methods and global methods according to the way costs are aggregated. Global methods generally achieve higher accuracy but are less efficient. In contrast, local methods are fast and easy to implement, but it is difficult to choose a proper matching cost function [2] and to construct suitable support windows.
Matching cost is the similarity measure of corresponding points between the left and right images. Most stereo matching algorithms use intensity-based similarity measures. For instance, the sum of absolute differences (SAD), the sum of squared differences (SSD) [1], Adapt Weight [3] and Segment Support [4] all fall into this category. For ideal images, they produce results with high precision, but these methods are very sensitive to radiometric distortion of the images. When the illumination conditions or the exposure time change, their accuracy drops quickly, which makes them difficult to apply to real images. Fortunately, some matching costs are robust to radiometric distortion. The normalized cross-correlation (NCC), Gradient [5] [6] [7], and the Rank and Census transforms [8] [9] are the most commonly used ones.
Local stereo methods need to aggregate the matching costs of single pixels over a support region defined by a window. Inevitably, they run into problems when deciding which window size to use. Small windows do not contain enough information and can lead to noisy results, while large windows contain enough texture information but encompass pixels at different depths near depth discontinuities, resulting in the foreground fattening effect. Fusiello et al. [10] proposed selecting the best window among multiple predefined windows as the support window; Veksler [11] presented a variable window selection method that explores a useful range of window shapes and sizes; Zhang et al. [12] constructed a cross-based adaptive window for every pixel according to the color correlation of adjacent pixels and achieved good results. Qu et al. [13] developed a binary support window by calculating the mean intensity in a predefined fixed window, but this binary support window may have a disconnected structure, which degrades accuracy.
Global stereo methods treat stereo matching as a labeling problem in which the pixels of the reference image are nodes and the estimated disparities are labels. They typically skip the cost aggregation step and define a global energy function that includes a data term and a smoothness term. The former sums pixel-wise matching costs, while the latter favors piece-wise smooth disparity selection. The labeling problem is solved by minimizing the energy function using dynamic programming, graph cuts, or belief propagation. Some recent global stereo matching algorithms can be found in [14] [15] [16] [17].
To address the above matching cost computation and window size selection problems, this paper proposes a stereo matching algorithm based on an improved gradient cost and adaptive cost aggregation. Our main contributions are twofold. First, we improve the gradient matching cost by incorporating the phase information and propose a hybrid cost function which combines the gradient and color matching costs. Second, we develop a four-step disparity refinement method to eliminate mismatches.
The remaining portions of this paper are organized as follows: We first propose our method and describe the algorithm thoroughly in section 2. Section 3 presents the experimental results and we finally conclude our work in section 4.
2. Proposed Method
According to Scharstein and Szeliski’s taxonomy, the stereo matching process can be summarized in the following four steps: matching cost computation, cost aggregation, disparity computation and disparity refinement. We follow this classification to describe our algorithm in detail. The outline of the proposed algorithm is shown in Fig. 1. Given two rectified images, we first calculate the corresponding gradient images, which are the prerequisite for computing the matching cost. Then an adaptive window is constructed for every pixel for use in cost aggregation. After this, the initial disparity maps are obtained using the Winner-Takes-All strategy. Finally, the depth images are produced after disparity refinement.
Fig. 1. Outline of the proposed algorithm.
- 2.1 Matching cost computation
Matching cost is the similarity measure of corresponding points between the left and right images. Different cost functions have different power to discriminate between candidate disparities. As discussed before, gray or color intensity-based matching costs are very sensitive to radiometric distortion and noise, while gradient-based matching costs are more robust to these factors and have been widely used.
The gradient of an image corresponds to the direction along which the gray value of the image changes most remarkably. In other words, the change of image intensity can be described by image gradient. Mathematically, image gradient is defined as the first-order partial derivatives of image intensity with respect to x and y , which are represented as a vector:
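$$G = \nabla I(x, y) = \left( G_x, G_y \right)^T = \left( \frac{\partial I}{\partial x}, \frac{\partial I}{\partial y} \right)^T \qquad (1)$$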
where I(x, y) is the image intensity of an anchor pixel (x, y). In practical applications, G can be calculated by convolving the image with gradient masks. Here we just use the simplest gradient mask:
[Eq. (2): gradient mask]
Thus, we can get the gradient images of both the left and right images: GL = (GLx, GLy)^T and GR = (GRx, GRy)^T. For rectified images, supposing p = (x, y) is a pixel in the left image, then pd = (x - d, y) is the corresponding pixel in the right image with disparity d. Hence, the gradient matching cost function CG can be defined as:
[Eq. (3): gradient matching cost CG(p, d)]
The above cost function only considers the modulus information of the gradient vector. Here, we develop an improved cost function which incorporates the gradient phase, similar to [6] . Using the gradient vector’s two components Gx and Gy , the modulus and the phase are computed as:
$$m = \sqrt{G_x^2 + G_y^2} \qquad (4)$$
$$\varphi = \arctan\left( \frac{G_y}{G_x} \right) \qquad (5)$$
Generally, the modulus m represents the rate of change and the phase φ represents its direction. To show them intuitively, Fig. 2 gives an example of the computed m and φ for the Tsukuba image. The gradient values reflect the image edges or skeleton to some extent, and the figure also shows the differences between m and φ.
Fig. 2. (a) Modulus of gradient; (b) Phase of gradient.
As m and φ provide different information about the neighborhood of a pixel, they have different invariance properties with respect to radiometric distortion. For instance, neither the modulus nor the phase is affected by additive (offset) changes in the input images, while multiplicative variations (gain) affect the modulus but not the phase. It is therefore more appropriate to consider them separately, and our method is based on this idea. To make full use of the gradient information, we combine the modulus and phase linearly with a weight parameter α, forming our new cost function:
[Eq. (6): improved gradient cost, a linear combination of the modulus and phase differences weighted by α]
where mc and φc are the modulus and phase of the gradient operator applied to each color band c ∈ {R, G, B} respectively, and α is the weight of the modulus, with a range of [0, 1]. Considering the π-periodicity of the phase, we employ f to normalize it into a single period:
[Eq. (7): phase normalization function f]
Because we have used a weight parameter α, it is easy to adjust the algorithm’s performance by changing the value of α. This is important as different lighting and exposure time can lead to different degrees of radiometric distortion and noise. From (6), we can see that the larger α is, the bigger effect the modulus will have. On the contrary, the phase will dominate if α is small. According to the radiometric distortion degree, the proper value of α can be set empirically.
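The invariance argument above can be checked numerically with a minimal sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: the central-difference mask, the single gray band instead of three color bands, the folding of the phase into [0, π), the form of `phase_distance` standing in for f, and all function names.

```python
import math

def gradient(img, x, y):
    """Central-difference gradient (Gx, Gy) at an interior pixel (assumed mask)."""
    gx = (img[y][x + 1] - img[y][x - 1]) / 2.0
    gy = (img[y + 1][x] - img[y - 1][x]) / 2.0
    return gx, gy

def modulus_phase(gx, gy):
    """Modulus m and phase phi of a gradient vector; phi is folded modulo pi."""
    m = math.hypot(gx, gy)
    phi = math.atan2(gy, gx) % math.pi
    return m, phi

def phase_distance(phi_a, phi_b):
    """Distance between two pi-periodic phases, in [0, pi/2] (assumed form of f)."""
    delta = abs(phi_a - phi_b) % math.pi
    return min(delta, math.pi - delta)

def gradient_cost(left, right, x, y, d, alpha):
    """alpha-weighted modulus difference plus (1 - alpha)-weighted phase difference."""
    m_l, phi_l = modulus_phase(*gradient(left, x, y))
    m_r, phi_r = modulus_phase(*gradient(right, x - d, y))
    return alpha * abs(m_l - m_r) + (1.0 - alpha) * phase_distance(phi_l, phi_r)

# Hypothetical 3x3 gray patch, plus an offset copy and a gain copy of it.
base = [[10, 12, 15], [11, 20, 30], [13, 25, 40]]
offset = [[v + 50 for v in row] for row in base]
gained = [[v * 2 for v in row] for row in base]
```

With these patches, an offset leaves both modulus and phase unchanged, while a gain of 2 doubles the modulus and leaves the phase unchanged, so a small α makes the cost insensitive to gain.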
As the color intensities of an image directly reflect the brightness of pixels, using the gradient similarity alone may lose many details of the scene. Thus, we propose a combination of the color-based SAD cost and the improved gradient cost. This combination is simple but very effective, as the two costs compensate for one another and yield a more reliable similarity measure. The color-based SAD matching cost can be represented as:
[Eq. (8): color-based SAD matching cost]
Then we use a robust function to normalize the costs into [0, 1]:
[Eq. (9): robust normalization function with controlling parameter λ]
where λ is a controlling parameter. The final integrated matching cost of pixel p corresponding to disparity d is defined as:
[Eq. (10): final integrated matching cost]
In this way, both G(p, d) and C(p, d) are in the range [0, 1], and their contributions to the final cost can be adjusted by setting different values of λc and λG. Proper values of these parameters can be determined empirically.
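As a rough illustration of how two costs with different ranges can be normalized and combined, the sketch below uses the exponential form 1 - exp(-c/λ). That form, the plain sum of the two normalized terms, and the function names are assumptions chosen for illustration; the source does not confirm them here.

```python
import math

def sad_color(pixel_left, pixel_right):
    """Sum of absolute differences over the R, G, B values of two pixels."""
    return sum(abs(a - b) for a, b in zip(pixel_left, pixel_right))

def robust(cost, lam):
    """Map a non-negative cost into [0, 1]; assumed form 1 - exp(-cost / lam)."""
    return 1.0 - math.exp(-cost / lam)

def combined_cost(sad, grad, lam_c, lam_g):
    """Sum of the normalized color and gradient costs (assumed combination)."""
    return robust(sad, lam_c) + robust(grad, lam_g)
```

Under this assumed form, a smaller λ makes the corresponding term saturate sooner, so large outlier costs in that term contribute at most 1 to the total.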
- 2.2 Adaptive window construction
As the discriminative power of a single pixel’s matching cost is weak, we need to aggregate the matching costs of adjacent pixels to improve accuracy. The neighborhood is determined by a local support window, and the pixels in the window are included in the aggregation. It is therefore natural to ask how large the window should be. A fixed window cannot give satisfactory results everywhere, because image regions with different characteristics need different windows. In textureless regions, larger windows are needed to provide enough pixels. In contrast, regions with high texture and depth discontinuities need smaller windows to avoid over-smoothing. To address this problem, Zhang et al. proposed a cross-based adaptive window construction method which alters the window’s shape and size adaptively. Such a cross-based support region is obtained by expanding a cross-shaped skeleton around each pixel p to create four segments, which define two sets of pixels H(p) and V(p) in the horizontal and vertical directions. More details about the method can be found in [12]. In the original implementation, only one threshold for color similarity and one threshold for spatial closeness are used, which cannot satisfy all cases. Motivated by [18], we present a modification of the original cross-based support region approach in this paper.
The key idea of the cross-based support region is to determine an upright cross for every pixel p in the input image, based on color similarity and spatial closeness. As shown in Fig. 3, the pixel-wise adaptive cross consists of two orthogonal line segments intersecting at the anchor pixel p. We use H(p) and V(p) to represent the horizontal and vertical segments respectively. Thus, four arms (left, right, up and down) are constructed for each pixel. By changing the length of the arms adaptively, we can effectively capture an adaptive support region for each pixel.
Fig. 3. Construction process of the adaptive window.
Here, we use enhanced rules to decide each pixel’s arm length. Taking p’s left arm as an example, the arm stops when it reaches an endpoint pixel pi that violates one of the following three rules:
1. Dc(pi, p) < τ1 and Dc(pi, pi + (1, 0)) < τ1;
2. Ds(pi, p) < L1;
3. Dc(pi, p) < τ2, if L2 < Ds(pi, p) < L1.
where Ds(pi, p) is the spatial distance between pi and p; Dc(pi, p) represents the color difference between pi and p [Equation: definition of Dc]; and τ1, τ2 and L1, L2 are the predefined color thresholds and spatial thresholds. Rule 1 restricts the color difference between pi and p as well as between pi and its predecessor pi + (1, 0) on the same arm. This prevents the arm from spanning edges in the image. Rules 2 and 3 provide multiple choices for the arm length. In textureless regions, we use the larger thresholds L1 and τ1 to guarantee enough pixels. But when the arm length exceeds the smaller value L2, Rule 3 comes into play, using a much more stringent threshold τ2 to make sure that the arm extends only in regions with very similar colors.
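The three rules can be sketched for a single image row as follows. This is a minimal sketch under stated assumptions: the color difference Dc is taken as the largest absolute channel difference, the spatial distance Ds is the pixel offset along the arm, and `arm_length` and its parameter names are hypothetical.

```python
def color_diff(a, b):
    """Assumed Dc: largest absolute difference over the color channels."""
    return max(abs(u - v) for u, v in zip(a, b))

def arm_length(row, x, step, tau1, tau2, L1, L2):
    """Length of the arm grown from row[x]; step is -1 (left) or +1 (right)."""
    length = 0
    i = x + step
    while 0 <= i < len(row):
        dist = abs(i - x)   # Ds(pi, p)
        prev = i - step     # predecessor of pi on the same arm
        # Rule 1: pi must be similar to both p and its predecessor.
        if not (color_diff(row[i], row[x]) < tau1
                and color_diff(row[i], row[prev]) < tau1):
            break
        # Rule 2: the arm may not reach the spatial limit L1.
        if not dist < L1:
            break
        # Rule 3: beyond L2, only very similar colors are accepted.
        if L2 < dist < L1 and not color_diff(row[i], row[x]) < tau2:
            break
        length += 1
        i += step
    return length
```

Calling the function with step -1 and +1 on a row, and on the transposed column, gives the four arm lengths of a pixel.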
After the above process, we obtain the end pixels of the four arms. H(p) and V(p) are then the sets of pixels lying between the left and right end pixels and between the up and down end pixels, respectively.
Finally, by applying this approach to every pixel q along V(p), we obtain the local support window U(p):
$$U(p) = \bigcup_{q \in V(p)} H(q)$$
Fig. 4 shows an example of the adaptive local support windows, which approximate local image structures appropriately.
Fig. 4. Example of the adaptive local support windows.
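Building U(p) from per-pixel arm lengths amounts to collecting the horizontal segment of every pixel on the vertical segment of p. The sketch below assumes the arm lengths are already available in a dictionary; the `arms` layout, the convention that "up" means a smaller y, and the function name are illustrative assumptions.

```python
def support_region(arms, p):
    """Return U(p) as a set of (x, y) pixels.

    arms maps (x, y) to the arm lengths (left, right, up, down) of that pixel.
    """
    px, py = p
    _, _, up, down = arms[p]
    region = set()
    for qy in range(py - up, py + down + 1):         # q runs over V(p)
        left, right, _, _ = arms[(px, qy)]
        for qx in range(px - left, px + right + 1):  # pixels of H(q)
            region.add((qx, qy))
    return region
```

Because each row of the region is a single contiguous run, the region adapts its width row by row, which is what lets it follow local image structures.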
- 2.3 Cost aggregation
Traditional local algorithms only take the reference image’s support region into account. In contrast, we symmetrically consider the support regions of both the target and reference images. For two corresponding pixels p = (x, y) and pd = (x - d, y) in the reference and target images, we obtain two local support regions U(p) and U′(pd), and combine them to define the union support region Ud(p):
[Equation: definition of the union support region Ud(p)]
Once the support region is available, the aggregated matching cost of p, denoted here E(p, d), is computed as follows:
$$E(p, d) = \frac{1}{N} \sum_{q \in U_d(p)} e(q, d)$$
where N is the total number of pixels in the support region Ud(p), and e(q, d) is the raw per-pixel matching cost at disparity d. Finally, we employ the Winner-Takes-All (WTA) strategy to select the disparity with the lowest aggregated cost in the disparity range:
$$d_0(p) = \arg\min_{d \in [0, d_{max}]} E(p, d)$$
where d ∈ [0, dmax] represents the disparity range, and d0(p) is chosen as the initial disparity of p.
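A minimal sketch of the aggregation and WTA steps follows. How the two regions are combined is an assumption here: a pixel q of U(p) is kept only if its counterpart shifted by d lies in U′(pd). The function names and the callable `raw_cost` are likewise hypothetical.

```python
def aggregate(raw_cost, region_ref, region_tgt, d):
    """Mean raw cost e(q, d) over the combined support region for disparity d.

    region_ref is U(p) in the reference image, region_tgt is U'(pd) in the
    target image; both are sets of (x, y) pixels.
    """
    combined = [(x, y) for (x, y) in region_ref if (x - d, y) in region_tgt]
    if not combined:
        return float("inf")  # no shared support: this disparity cannot win
    return sum(raw_cost(q, d) for q in combined) / len(combined)

def winner_takes_all(costs):
    """Disparity index with the lowest aggregated cost."""
    return min(range(len(costs)), key=costs.__getitem__)
```

In use, `aggregate` would be evaluated for every d in the disparity range and the resulting list passed to `winner_takes_all` to obtain the initial disparity of p.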
- 2.4 Disparity refinement
The disparity maps obtained after the previous three processes still contain some mismatches and unreliable values. For further refinement, post-processing steps are required. Our post-processing consists of four steps:
First, we apply a 5×5 median filter to both dL and dR, the left and right disparity maps respectively, to remove isolated outliers.
Second, we apply a common and reliable tool, the left-right consistency check. A pixel p is considered valid if the constraint dL(p) = dR(p - (dL(p), 0)) holds. Otherwise, p is marked invalid and needs to be handled. Furthermore, the invalid disparities can be divided into two classes: occlusions and mismatches. We employ Hirschmüller’s approach to decide whether an invalid point is an occlusion or a mismatch [19].
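The consistency constraint can be sketched for integer disparity maps stored as nested lists. Treating pixels whose match falls outside the image as invalid is an assumption of this sketch, as is the function name; the occlusion/mismatch classification of [19] is not included.

```python
def left_right_check(d_left, d_right):
    """Return a mask that is True where dL(x, y) == dR(x - dL(x, y), y)."""
    height, width = len(d_left), len(d_left[0])
    valid = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            xr = x - d_left[y][x]  # column of the matching pixel in the right map
            if 0 <= xr < width:
                valid[y][x] = d_left[y][x] == d_right[y][xr]
    return valid
```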
Third, we present a disparity refinement method based on the local disparity histogram to recover the invalid disparities. For a pixel p in the disparity image, we build a local disparity histogram φp(d) over the neighborhood of p, counting how many times each disparity occurs. The histogram therefore has dmax + 1 bins, one per disparity. We do not need to search for a new neighborhood region; instead, we reuse the local support region U(p) of pixel p, so this step adds little computational cost. Let H(i) be the count of the i-th bin, i = 0 to dmax. We calculate d* as the disparity with the maximum normalized histogram value:
$$h(d) = \frac{H(d)}{\sum_{i=0}^{d_{max}} H(i)}$$
$$d^* = \arg\max_{d \in [0, d_{max}]} h(d)$$
Statistically, this disparity value is the locally optimal one, and h(d*) represents its confidence level. The initial disparity d0(p) of pixel p is replaced by the new value d* if h(d*) is greater than τh; otherwise, it is left unchanged:
$$d(p) = \begin{cases} d^*, & h(d^*) > \tau_h \\ d_0(p), & \text{otherwise} \end{cases}$$
where τh ∈ [0, 1] is a confidence threshold. This step is repeated until no disparity in the map is updated.
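One voting pass for a single pixel might look like the sketch below. It assumes the caller has already collected the integer disparities found in the support region of p; which pixels contribute to that list, and the function name, are assumptions of this sketch.

```python
def histogram_vote(region_disparities, d0, d_max, tau_h):
    """Return d* if its normalized histogram value exceeds tau_h, else d0."""
    hist = [0] * (d_max + 1)
    for d in region_disparities:
        hist[d] += 1
    total = sum(hist)
    if total == 0:
        return d0  # nothing to vote with: keep the initial disparity
    d_star = max(range(d_max + 1), key=hist.__getitem__)
    confidence = hist[d_star] / total  # h(d*)
    return d_star if confidence > tau_h else d0
```

Repeating this pass over the whole map until no value changes corresponds to the iteration described above.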
Finally, as some invalid disparities may remain unchanged after step 3, there are still invalid points that need to be filled. We therefore introduce an interpolation strategy which treats occlusion and mismatch points differently. Interpolation is performed by propagating valid disparities into neighboring invalid areas. For an invalid pixel p, we find the nearest valid pixels along 8 directions and store their disparities dpi. The final disparity of p is given by:
$$d(p) = \begin{cases} \operatorname{seclow}_i \, d_{p_i}, & \text{if } p \text{ is occluded} \\ \operatorname{med}_i \, d_{p_i}, & \text{if } p \text{ is mismatched} \end{cases}$$
If p is occluded, we select the second lowest value (seclow dpi) to avoid a preference for either foreground or background. If p is mismatched, the median (med dpi) is used, which maintains discontinuities in cases where the mismatched area is located at a boundary. Experiments show that this yields better results.
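The interpolation step can be sketched as a search along 8 rays followed by a selection rule. Using the lower median for an even number of candidates, falling back to the single candidate when only one ray finds a valid pixel, and the function names are all assumptions of this sketch.

```python
import statistics

DIRECTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1),
              (-1, -1), (1, -1), (-1, 1), (1, 1)]

def nearest_valid(disp, valid, x, y):
    """Disparities of the nearest valid pixel along each of the 8 directions."""
    height, width = len(disp), len(disp[0])
    found = []
    for dx, dy in DIRECTIONS:
        cx, cy = x + dx, y + dy
        while 0 <= cx < width and 0 <= cy < height:
            if valid[cy][cx]:
                found.append(disp[cy][cx])
                break
            cx, cy = cx + dx, cy + dy
    return found

def fill_invalid(candidates, occluded):
    """Second lowest candidate for occlusions, (lower) median for mismatches.

    candidates must be non-empty.
    """
    ordered = sorted(candidates)
    if occluded:
        return ordered[1] if len(ordered) > 1 else ordered[0]
    return statistics.median_low(ordered)
```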
3. Experimental Results and Discussions
- 3.1 Accuracy of the proposed algorithm
This section presents experimental results obtained with our C++ implementation of the algorithm. To verify the performance of the proposed method, our experiments are based on the rectified stereo images from the Middlebury stereo benchmark [20]. It offers 4 pairs of stereo images: Tsukuba, Venus, Teddy and Cones, with sizes of 384×288, 434×383, 450×375 and 450×375 respectively. Their disparity ranges are also given: 0-15, 0-19, 0-59 and 0-59 pixels respectively. By comparing the results with the ground truth disparity images, we can quantify the errors and make an objective evaluation. The parameters of the algorithm are set as in Table 1 and are kept constant unless stated otherwise.
Table 1. Parameter settings for all experiments
Fig. 5 shows the experimental results of our method on all four stereo pairs of the Middlebury stereo database. The leftmost column contains the original left images of the four stereo pairs. The ground truth disparity images are shown in the second column, our estimated disparity images are displayed in the third column, and the fourth column gives the error maps computed against the ground truth. In the error maps, the white regions denote correctly calculated disparity values, which do not differ by more than 1 pixel from the ground truth. If the estimated disparity differs by more than 1 pixel from the ground truth value, it is marked as an error and displayed in black or gray, where black represents errors in the non-occluded regions and gray represents errors in the occluded regions. Table 2 lists the objective evaluation of our method and other methods with the error threshold δd = 1 pixel, which means that bad pixels are those whose absolute disparity errors are above 1 pixel. Columns Nonocc, All and Disc give the percentage of bad pixels in non-occluded regions, over all pixels and in regions near depth discontinuities, respectively.
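The bad-pixel percentage described above can be computed as in the following sketch. The `mask` argument, which selects the evaluated region (for example non-occluded pixels), and the function name are illustrative assumptions.

```python
def bad_pixel_rate(estimate, truth, mask, threshold=1.0):
    """Percentage of masked pixels whose absolute disparity error exceeds threshold."""
    bad = 0
    total = 0
    for row_e, row_t, row_m in zip(estimate, truth, mask):
        for e, t, m in zip(row_e, row_t, row_m):
            if m:
                total += 1
                if abs(e - t) > threshold:
                    bad += 1
    return 100.0 * bad / total if total else 0.0
```

Passing different masks to the same function gives the three kinds of figures reported per image pair.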
Fig. 5. Experimental results on the Middlebury datasets. From left to right in each row: the original left images, the ground truth disparity maps, the disparity maps produced by our algorithm and the error maps.
Table 2. Objective evaluation of matching results.
In terms of overall performance, the proposed method achieves satisfactory results. Our algorithm correctly estimates the disparities of both textureless and textured surfaces. For instance, the large uniform surfaces in the Venus and Teddy stereo pairs are successfully recovered while the disparity edges are well preserved. In the quantitative comparison, the proposed method outperforms many classical global and local methods, such as Enhanced BP [21], GC+occ [22], SemiGlob [18] and AdaptWeight [3]. Although the NonLocalFilter [2] and P-linearS [23] methods have lower average errors than ours, they do not consider image amplitude distortion and, being intensity-based algorithms, are sensitive to radiometric differences. In the next subsection, we demonstrate our method’s robustness to radiometric distortion.
To clarify the contribution of our improved gradient matching cost, we conduct a quantitative comparison between the proposed method and the traditional method that uses only modulus information. In addition, to eliminate interference and show the effect of our four-step disparity refinement method, we use the results without disparity refinement. For simplicity, we only present the errors of the estimated disparities in non-occluded regions in Table 3. It is clear that our proposed matching cost improves the results considerably. Also, compared with the results after disparity refinement in Table 2, the effectiveness of our refinement method is evident, as the error percentages of the disparity maps without refinement are much higher in non-occluded areas.
Table 3. Comparison of the proposed matching cost with traditional gradient cost
- 3.2 Sensitivity to radiometric distortion
To test the sensitivity of stereo algorithms to radiometric differences, Hirschmüller and Scharstein [20] created 6 datasets: Art, Books, Dolls, Moebius, Laundry and Reindeer, which are shown in Fig. 6 together with their ground truth disparity maps. We also present the disparity maps produced by the proposed method. Each dataset is taken using three different exposures and under three different configurations of the light sources. Thus, there are 9 different image combinations that exhibit significant radiometric differences. To demonstrate the performance of the proposed method under radiometric distortion, we keep the right image unchanged and alter the exposure and lighting conditions of the left image, so that the two factors can be considered separately. We show the experimental results for “Reindeer” as an example in Fig. 7. The quality of the produced disparity maps is very stable throughout the experiments, which shows the strong robustness of the proposed method.
Fig. 6. More experimental results without radiometric difference. From top to bottom: the Art, Books, Dolls, Moebius, Laundry, and Reindeer stereo datasets. From left to right: the original color images, the ground truth and the disparity maps produced by the proposed method.
Fig. 7. Experimental results of the Reindeer pairs by the proposed method with radiometric difference. The first row shows the left images under three different exposures and the second row shows the corresponding disparity maps. The third row shows the left images under three different light conditions, with the corresponding disparity maps shown in the last row.
As the sensitivity to radiometric distortion is mainly determined by the similarity measure or matching cost, we test three different matching costs, including our proposed one. To isolate the effect of the matching cost, all three compared methods use the adaptive window based cost aggregation, which excludes the influence of the aggregation scheme. The resulting curves are shown in Fig. 8. The experiments cover all 3×3 combinations of exposure and light changes, which are denoted 1/1 to 3/3. The error rates are averaged over all 6 datasets. As the plots show, in every exposure and lighting configuration the proposed method has the best performance, while the SAD method is the worst. All three methods perform better when the two images share the same exposure and lighting configuration than when the configurations differ. The SAD method is very sensitive to radiometric distortion, as its error percentage rises dramatically when the left and right images are under different configurations. The gradient method is much better but still not satisfactory. The proposed method is very robust to radiometric distortion, as its error rates stay at a low level and vary little when the exposure and lighting conditions differ. This is because SAD is an intensity-based similarity measure and depends on pixels’ color or gray intensities, which are highly sensitive to radiometric differences. In contrast, the proposed method utilizes the gradient information and uses a new matching cost function that integrates the gradient modulus and phase. Hence, our method is not sensitive to color variance and remains robust to radiometric distortion.
Fig. 8. Performance comparison under 3×3 left/right image combinations that differ in exposure and lighting conditions. (a) Different lightings; (b) Different exposures.
- 3.3 Stereo scene reconstruction
There are many applications for stereo matching, and three-dimensional (3D) scene reconstruction is an important one. By re-projecting an image pixel into 3D space using its depth information, we can reconstruct a complete 3D object model from the 2D images. The quality of scene reconstruction depends to a large extent on the accuracy of the acquired depth map. To illustrate the quality of the derived matching results, we present reconstructed views of the previous test images in Fig. 9, which give a further impression of the accuracy and detail of the computed depth information. The reconstruction results show that our estimated depth maps are adequate for 3D reconstruction tasks.
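Re-projection of a pixel into 3D can be sketched with the standard pinhole relation for a rectified stereo pair, Z = f·B/d. The source does not give its camera model, so the focal length `focal` (in pixels), the baseline `baseline`, the principal point (`cx`, `cy`) and the function name are all assumptions for illustration.

```python
def reproject(x, y, disparity, focal, baseline, cx, cy):
    """Return (X, Y, Z) for pixel (x, y), or None when the disparity is not positive."""
    if disparity <= 0:
        return None  # zero disparity corresponds to a point at infinity
    z = focal * baseline / disparity
    return ((x - cx) * z / focal, (y - cy) * z / focal, z)
```

Because depth is inversely proportional to disparity, a one-pixel disparity error matters far more for distant points than for near ones, which is why depth-map accuracy dominates reconstruction quality.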
Fig. 9. 3D scene reconstruction results by using the produced disparity images.
4. Conclusion
This paper presents a novel stereo matching method based on a combined cost function and adaptive window cost aggregation. The improved cost function integrates both the modulus and phase components of the gradient vector and then combines them with the SAD cost, which improves accuracy. To address the window size selection problem, we introduce an adaptive window solution. The algorithm constructs an adaptive support region for every pixel according to local color similarity and spatial closeness, so every pixel gets a suitable support region for aggregation. In addition, this support region is reused in the later disparity refinement step. We use a four-step refinement process consisting of median filtering, left-right consistency checking, recovery of invalid pixels and hole filling. We evaluate our algorithm on the stereo pairs from the Middlebury database. The proposed algorithm matches textureless and textured surfaces equally well while preserving depth discontinuities. The experimental comparisons demonstrate that the proposed method outperforms many local and global methods. Furthermore, the proposed algorithm handles radiometric differences well, showing strong robustness to radiometric distortion of the input images.
Though the proposed method achieves good performance, some aspects can still be improved. Reducing redundancy in the disparity search range, a more sophisticated disparity refinement process and a parallel implementation of the proposed method will be considered in future research.
BIO
Shiping Zhu received the B.Sc. and M.Sc. degrees from Xi’an University of Technology, Xi’an, China, in 1991 and 1994, and the Ph.D. degree from Harbin Institute of Technology, Harbin, China, in 1997. From 1997 to 1999, he was a Postdoctoral Fellow with Beihang University, Beijing, China. From 2000 to 2002, he was a Postdoctoral Fellow with the Brain and Cognition Research Center, Université Paul Sabatier, Toulouse, France. From 2002 to 2004, he was a Postdoctoral Fellow with the Department of Computer Science and Department of Electrical and Computer Engineering, Université de Sherbrooke, Sherbrooke, QC, Canada. Since 2005, he has been an associate professor with the Department of Measurement Control and Information Technology, School of Instrumentation Science and Optoelectronics Engineering, Beihang University, Beijing, China. (E-mail: spzhu@163.com)
Zheng Li received the B.Sc. degree in Measurement and Control Technology and Instrumentation from China University of Geosciences, Wuhan, China in 2012, and he is currently pursuing the M.Sc. degree in Instrumentation Science and Technology at Beihang University, Beijing, China. His research interests include stereo vision, view synthesis and image processing. (E-mail: lizheng900911@163.com)
References
Scharstein Daniel , Szeliski Richard 2002 “A taxonomy and evaluation of dense two-frame stereo correspondence algorithms,” International Journal of Computer Vision 47 (1) 7 - 42    DOI : 10.1023/A:1014573219977
Yang Qinxiong “A non-local cost aggregation method for stereo matching,” in Proc. of IEEE International Conference on Computer Vision and Pattern Recognition June 16-21, 2012 1402 - 1409
Yoon Kuk-Jin , Kweon In-So 2006 ‘Locally adaptive support weight approach for visual correspondence search,” IEEE Transactions on Pattern Analysis and Machine Intelligence 28 (4) 924 - 931
Tombari Federico , Mattoccia Stefano , Di Stefano Luigi “Segmentation based adaptive support for accurate stereo correspondence,” in Proc. of the 2nd Pacific Rim conference on Advances in image and video technology December 17, 2007 no. 4872 427 - 438
Scharstein Daniel 1997 View synthesis using stereo vision Ph.D. thesis
De-Maeztu Leonardo , Villanueva Arantxa , Cabeza Rafael 2011 “Stereo matching using gradient similarity and locally adaptive support-weight,” Pattern Recognition Letters 32 (13) 1643 - 1651    DOI : 10.1016/j.patrec.2011.06.027
Zhou Xiaozhou , Boulanger Pierre “Radiometric invariant stereo matching based on relative gradients,” in Proc. of IEEE International Conference on Image Processing September 30-October 3, 2012 2989 - 2992
Zabih Ramin , Woodfill John “Non-parametric local transforms for computing visual correspondence,” in Proc. of European Conference on Computer Vision May 2-6, 1994 151 - 158
Humenberger Martin , Zinner Christian , Weber Michael , Kubinger Wilfried , Vincze Markus 2010 “A fast stereo matching algorithm suitable for embedded real-time systems,” Computer Vision and Image Understanding 114 (11) 1180 - 1202    DOI : 10.1016/j.cviu.2010.03.012
Fusiello Andrea , Roberto Vito , Trucco Emanuele “Efficient stereo with multiple windowing,” in Proc. of IEEE International Conference on Computer Vision and Pattern Recognition June 17-19, 1997 858 - 863
Veksler Olga “Fast variable window for stereo correspondence using integral image,” in Proc. of IEEE International Conference on Computer Vision and Pattern Recognition June 18-20, 2003 556 - 561
Zhang Kang , Lu Jiangbo , Lafruit Gauthier 2009 “Cross-based local stereo matching using orthogonal integral images,” IEEE Transactions on Circuits and Systems for Video Technology 19 (7) 1073 - 1079    DOI : 10.1109/TCSVT.2009.2020478
Qu Yufu , Jiang Ji Xiang , Deng Xiangjin 2014 “Robust local stereo matching under varying radiometric conditions,” IET Computer Vision 8 (4) 263 - 276    DOI : 10.1049/iet-cvi.2013.0117
Besse Frederic , Rother Carsten , Fitzgibbon Andrew , Kautz Jan 2014 “PMBP: PatchMatch belief propagation for correspondence field estimation,” International Journal of Computer Vision 110 (1) 2 - 13    DOI : 10.1007/s11263-013-0653-9
Wang Liang , Yang Ruigang “Global stereo matching leveraged by sparse ground control points,” in Proc. of IEEE International Conference on Computer Vision and Pattern Recognition June 20-25, 2011 3033 - 3040
Barzigar Nafise , Roozgard Aminmohammad , Cheng Samuel , Verma Pramode 2013 “SCoBeP: Dense image registration using sparse coding and belief propagation,” Journal of Visual Communication and Image Representation 24 (2) 137 - 147    DOI : 10.1016/j.jvcir.2012.08.002
Papadakis Nicolas , Caselles Vicent 2010 “Multi-label depth estimation for graph cuts stereo problems,” Journal of Mathematical Imaging and Vision 38 (1) 70 - 82    DOI : 10.1007/s10851-010-0212-8
Mei Xing , Sun Xun , Zhou Mingcai , Jiao Shaohui , Wang Haitao , Zhang Xiaopeng “On building an accurate stereo matching system on graphics hardware,” in Proc. of IEEE International Conference on Computer Vision Workshops November 6-13, 2011 467 - 474
Hirschmüller Heiko 2008 “Stereo processing by semiglobal matching and mutual information,” IEEE Transactions on Pattern Analysis and Machine Intelligence 30 (2) 328 - 341    DOI : 10.1109/TPAMI.2007.1166
Scharstein Daniel , Szeliski Richard “The Middlebury stereo vision page,” http://vision.Middlebury.edu/stereo/
Larsen E. Scott , Mordohai Philippos , Pollefeys Marc , Fuchs Henry “Temporally consistent reconstruction from multiple video streams using enhanced belief propagation,” in Proc. of IEEE International Conference on Computer Vision October 14-21, 2007 1 - 8
Kolmogorov Vladimir , Zabih Ramin “Computing visual correspondence with occlusions using graph cuts,” in Proc. of IEEE International Conference on Computer Vision July 7-14, 2001 508 - 515
De-Maeztu Leonardo , Mattoccia Stefano , Villanueva Arantxa , Cabeza Rafael “Linear stereo matching,” in Proc. of IEEE International Conference on Computer Vision November 6-13, 2011 1708 - 1715