Distributed Coding Scheme for Multi-view Video through Efficient Side Information Generation
Journal of Electrical Engineering and Technology. 2014. Sep, 9(5): 1762-1773
Copyright © 2014, The Korean Institute of Electrical Engineers
  • Received : September 23, 2013
  • Accepted : April 28, 2014
  • Published : September 01, 2014
About the Authors
Jihwan Yoo
LG Electronics Inc. Seoul, Korea. (uzhwan@naver.com)
Min Soo Ko
Dept. of Electronic Engineering, Kwangwoon University, Korea. (kmsqwet@kw.ac.kr)
Soon Chul Kwon
Dept. of Information & Contents, Kwangwoon University, Korea. (ksc0226@kw.ac.kr)
Young-Ho Seo
College of Liberal Arts, Kwangwoon University, Korea. (yhseo@kw.ac.kr)
Dong-Wook Kim
Dept. of Electronic Materials Engineering, Kwangwoon University, Korea. (dwkim@kw.ac.kr)
Jisang Yoo
Corresponding Author: Dept. of Electronic Engineering, Kwangwoon University, Korea. (jsyoo@kw.ac.kr)

Abstract
In this paper, a distributed image coding scheme for multi-view video based on efficient generation of side information is proposed. A distributed video coding technique generates side information at the decoder as an estimate of the original image and corrects the errors in it with a channel coding technique. Therefore, the more accurate the generated side information is, the better the performance of distributed video coding. The proposed technique applies distributed video coding to multi-view video. It generates side information by selectively using both 3-dimensional warping based on the depth map with spatially adjacent frames and motion-compensated temporal interpolation with temporally adjacent frames. In this scheme, the difference between the adjacent frames, the magnitudes of the motion vectors of the adjacent blocks, and the edge information are used as the selection criteria. In the experiments, the side information generated by the proposed technique showed an average peak signal-to-noise ratio 0.97dB higher than that generated by motion-compensated temporal interpolation or 3-dimensional warping alone. Analysis of the rate-distortion curves showed that the proposed scheme reduces the bit-rate by 8.01% on average at the same peak signal-to-noise ratio, compared to previous work.
1. Introduction
In the last few decades, broadcasting technologies have grown from black-and-white TV to HDTV. Especially in recent years, almost all countries in the world have been replacing traditional analog broadcasting with digital. Despite such progress, TV viewers have been demanding more realistic broadcasting, and accordingly research institutes (R&D centers) and companies around the world have been developing 3-dimensional television (3DTV) and ultra high definition television (UHDTV) to broadcast more realistic image content. Under ISO/IEC, the Moving Picture Experts Group (MPEG) has worked on standardizing a 3D image coding scheme under the name 3DAV since 2001 [1]. In particular, this group standardized multi-view video coding (MVC) in July 2008 [2]. Because MVC refers to both spatially and temporally adjacent images, it has the drawback that encoding is very complex. Distributed video coding (DVC) encodes a video by dividing its frames into two kinds: key frames and Wyner-Ziv frames. The key frames are encoded with an intra coding technique, while for the Wyner-Ziv frames only the parity bits are transmitted. Therefore, a DVC scheme can reduce the encoding complexity considerably. Meanwhile, the decoder generates side information similar to the Wyner-Ziv frames from the received key frames. The error between the original Wyner-Ziv image and the generated side information is corrected by a channel coding technique with the transmitted parity bits. Research on DVC has mainly been led by Girod at Stanford University [3 - 5], Ramchandran at the University of California, Berkeley [6], and the DISCOVER collaborative research group in Europe [7]. Distributed multi-view video coding (DMVC) [8 - 10] generates side information by a Motion-compensated Temporal Interpolation (MCTI) technique with temporally adjacent images or by spatial prediction from spatially adjacent view images at the same instant.
Also, a technique that mixes the above two schemes has been proposed in which the differences in pixel values and the magnitudes of motion vectors are used to obtain more accurate side information [8] .
However, the method in [8] failed to show better performance than when only MCTI was used. The one in [9] employed a Homography-compensated Inter-view Interpolation (HCII) technique that uses the homography of images to generate side information from the adjacent view images. But because of inaccurate point correspondences in the homography estimation and the limited interpolation performance of warping, its gain in peak signal-to-noise ratio (PSNR) over MCTI was only about 0.2~0.5dB. Also, [10] used the reference image(s) in various ways with HCII, such as the left image, the right image, and their average, but it could not overcome the inherent limitation of HCII.
To solve those problems, we propose a DMVC scheme in this paper. It selectively uses both 3D warping [11] and MCTI according to the characteristics of the target and adjacent images. In selecting the technique, it considers the intensity difference between the previous and the next time-adjacent frames, the magnitudes of the motion vectors of the current block and adjacent blocks, edge information obtained from the depth map, and the intensity of the residual signals obtained by motion compensation. Here, we assume that the depth map (or image) is given, as it is for the two multi-view videos Breakdancers and Ballet provided by MPEG [12], which we take as test multi-view videos. A depth map is an image or image channel that contains information relating to the distance of the surfaces of scene objects from a viewpoint [13]. Also, many depth cameras that provide depth information are publicly available these days [14]. With that information, the proposed method finds the characteristics of the block to be reconstructed and selects an appropriate reconstruction technique to generate more accurate side information.
First, the existing methods of multi-view video coding are briefly introduced in the next section. Then, the proposed method, which addresses the problems of the existing methods, is explained in Section 3. The performance of the proposed method is evaluated experimentally in Section 4, and this paper is concluded in Section 5 on the basis of the experimental results.
Fig. 1 shows the current reference structure for MVC. This scheme uses a prediction structure with hierarchical B pictures for each view. Additionally, inter-view prediction is applied to every second view: S1, S3 and S5. When the total number of views is even, the prediction structure of the last view (S7) is similar to those of even views. While B pictures in the even views do not use any inter-view references, B pictures in the last view use one inter-view reference. To allow random access, each GOP (S0/T0, S0/T8) starts with an I-frame. Fig. 1 also shows that if the total length of the sequence is not an integer multiple of the GOP length, a shortened tail GOP can be used at the end of the sequence. In this figure, the GOP length is 8 [15].
Fig. 1. Prediction structure for multi-view video coding
2. Existing Multi-view Video Coding Methods
- 2.1 Multi-view video coding
Fig. 1 shows a basic prediction frame structure for multiview video coding, MVC, which has been standardized by MPEG. Its coding structure has a mixture of two schemes to maximize coding efficiency: a hierarchical B picture structure for frame sequence in the time axis and a view prediction structure for frames in the view point axis. Accordingly it has very high coding complexity compared to other techniques. Thus, to reduce the complexity, some have tried to discard the motion estimation step by using the characteristic that the motions in the adjacent view images are very similar to the current image [16] . But it could not reduce the complexity that much.
- 2.2 Distributed video coding
As mentioned before, a distributed video coding technique is used to transfer some of the encoding complexity to the decoder. A conventional video coding technique removes the correlation between the image X to be coded and the side information Y as much as possible in the encoder, as shown in Fig. 2(a). Here, the side information means the predicted images obtained by intra-prediction or inter-prediction, which are accessible at both the encoding and decoding sides. In distributed video coding, by contrast, the correlation is removed by the decoder. In this case, unlike in (a), the side information is generated by the decoder and is accessible only there. Note that the performance of a distributed video coding technique can in theory be the same as that of the conventional coding technique, according to the two theorems from [17] and [18].
Fig. 2. Approaches to generating side information: (a) conventional video coding; (b) distributed video coding
In [18], Slepian and Wolf showed that removing the correlation at the decoder side can achieve the same performance as removing it at the encoder side. Fig. 3 shows the achievable bit-rate regions for Slepian-Wolf coding. In the figure, RX and RY are the bit-rates of X and Y, respectively, and H(X) and H(Y) are the entropies of X and Y, respectively. If X and Y are statistically independent, the achievable bit-rates have the relationship of (1) and they correspond to Region A.
Fig. 3. Achievable bit-rate regions for Slepian-Wolf distributed compression
But if the two pieces of data are statistically correlated rather than independent, the relationship of the bit-rates is changed as in (2). In this case, the bit-rate region is extended to region B.
That is, as the correlation between X and Y increases, the number of bits that must be generated decreases, which means the compression ratio increases.
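For reference, the standard Slepian-Wolf bounds can be written as follows. We assume that relations (1) and (2) correspond to these standard forms; H(X|Y), H(Y|X), and H(X,Y) denote the conditional and joint entropies, which are symbols introduced here and not defined in the text above.

```latex
% (1) statistically independent X and Y (Region A)
R_X \ge H(X), \qquad R_Y \ge H(Y)

% (2) statistically correlated X and Y with joint decoding (Regions A and B)
R_X \ge H(X \mid Y), \qquad R_Y \ge H(Y \mid X), \qquad R_X + R_Y \ge H(X, Y)
```

Since H(X|Y) is at most H(X), with equality when X and Y are independent, the second set of bounds admits lower individual rates than the first whenever the two sources are correlated.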
Wyner and Ziv extended Slepian-Wolf coding theory to lossy compression by adding a quantization step [18]. In general, the rate-distortion (R-D) function of conventional coding, RX|Y(d), and that of distributed coding, R*(d), satisfy RX|Y(d) ≤ R*(d), where d denotes the distortion. But it was proved that if the data have some special characteristics, such as Gaussian distributions, the relationship becomes RX|Y(d) = R*(d).
Fig. 4 shows a typical structure of distributed coding by the Wyner-Ziv method. First, the original image frames are divided into two kinds: key frames and Wyner-Ziv frames. A group of pictures (GOP) consists of several key frames and more than one Wyner-Ziv frame [3]. A key frame is coded by one of the existing intra coding techniques such as JPEG or H.264/AVC. A Wyner-Ziv image, by contrast, is quantized and then coded with a channel code such as a turbo code [19] or a Low Density Parity Check (LDPC) code [20]. Here, only the parity bits are stored in the buffer for the Wyner-Ziv image. Before quantization, a transform to the frequency domain can be applied. In general, inserting a transform step removes much of the spatial redundancy, which in turn increases the coding efficiency.
Fig. 4. Structure of distributed video coding
Meanwhile, the decoder generates side information similar to the Wyner-Ziv image, which is the target image to reconstruct. The decoder regards the error between the side information and the Wyner-Ziv image as virtual channel noise and tries to correct it with the parity bits, using a strong channel coding technique. The decoder keeps requesting additional parity information from the encoder until the reconstruction succeeds. Accordingly, the performance of a distributed coding technique depends entirely on the accuracy of the side information and the performance of the channel coding technique.
- 2.3 Distributed multi-view video coding
In a multi-view video environment, side information can be generated more accurately because the spatially adjacent images (the images from different viewpoints at the same time) as well as the temporally adjacent images (the images at different times from the same viewpoint) can be used. The general techniques for generating side information in Distributed Multi-view Video Coding (DMVC) are as follows.
The first one is an MCTI technique that uses the motion vectors between temporally-adjacent images. That is, it generates side information of the k th frame with the previous ( k −1) th and the next ( k +1) th images, as shown in Fig. 5(a) . For this, the motion vector between the ( k −1) th and ( k +1) th frame is obtained by motion estimation. Then the side information is interpolated from the adjacent frames with half of the motion vector. But because there might be some empty or overlapped regions between the interpolated blocks, the motion vector in the point A is shifted to the point B. After that the final motion vector is found by motion estimations from the ( k −1) th and ( k +1) th frames as shown in Fig. 5(b) . Finally, each pixel value in the side information is interpolated with the average value of the corresponding pixels in the ( k −1) th and ( k +1) th frames.
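The interpolation step described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name `mcti_block` and its arguments are hypothetical, the motion vector is assumed to be already estimated between frames k-1 and k+1 and to have even components, and the displaced blocks are assumed to stay inside the frame.

```python
import numpy as np

def mcti_block(prev, nxt, top, left, size, mv):
    """Interpolate one block of frame k from frames k-1 and k+1.

    mv = (dy, dx) is the motion from frame k-1 to frame k+1 of the content
    that passes through the block at (top, left) in frame k.
    Assumption: both components are even, so half the vector is an integer.
    """
    hy, hx = mv[0] // 2, mv[1] // 2
    # The content lies half a motion vector "behind" in frame k-1 ...
    p = prev[top - hy: top - hy + size, left - hx: left - hx + size]
    # ... and half a motion vector "ahead" in frame k+1.
    n = nxt[top + hy: top + hy + size, left + hx: left + hx + size]
    # Each pixel is the average of the two motion-compensated pixels.
    return (p.astype(np.float64) + n.astype(np.float64)) / 2.0
```

Averaging the two compensated blocks follows the last sentence of the paragraph; the shift of the vector from point A to point B and the bidirectional refinement of Fig. 5(b) are not modeled here.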
Fig. 5. Side information generation by MCTI: (a) selection of motion vectors, (b) bidirectional motion estimation
Another method to generate side information is Homography Compensated Inter-view Interpolation (HCII) that uses spatially adjacent images (adjacent view images) with the same time stamp. It extracts the corresponding disparities from the view-adjacent images to generate the side information as shown in Fig. 6 .
Fig. 6. Side information generation by HCII
Finally, there is a side information generation method that mixes the above two methods, MCTI and HCII. Because the motion vector for a region having a large motion is possibly incorrect, the corresponding interpolated block by MCTI might show a large error. Therefore a region whose motion vector is larger than a pre-defined value is interpolated by HCII rather than MCTI.
In a conventional multi-view video coding technique, the temporally adjacent images are more likely to be selected as reference images than the spatially adjacent images because the temporal correlation is higher than the spatial correlation. In a DMVC technique, however, the distance between the key frames that are used to predict the motions might be so large that the temporal correlation could be too small to be useful. To get more accurate side information, a mixed or fused method that compensates for these problems is needed.
3. The proposed Distributed Multi-view Coding Method
The DMVC method proposed in this paper uses both 3D warping and MCTI according to the characteristics of the target and the adjacent image blocks. It assumes that the necessary depth map and the camera parameters are transmitted and available at the decoder side. In 3D warping, a camera matrix denotes a projective mapping from world coordinates to pixel coordinates. Generally two kinds of matrices are required in 3D warping: the intrinsic and extrinsic matrices. The intrinsic matrix contains five intrinsic parameters, which cover quantities such as the focal length and the image format. The extrinsic parameters define the position of the camera center and the camera's heading in world coordinates [21]. Of course, transmitting depth information increases the bit-rate. But it is more effective than extracting depth information by a stereo matching technique at the decoder side, and MPEG is also standardizing the encoding of depth maps. So we assume that the necessary depth information is provided, not generated at either the encoder or the decoder, because it is quite common to provide depth information and it is not hard to obtain with a publicly available depth camera.
Fig. 7 shows a brief block diagram of the proposed DMVC method. As can be seen in the figure, it uses the correlations with both time-adjacent images (MCTI) and space-adjacent images (3D warping) to generate side information on the basis of the usual DMVC structure.
Fig. 7. Block diagram of the proposed distributed multi-view video coding method
The frame structure to which the proposed method is applied is shown in Fig. 8. It consists of two kinds of frames: a key frame (I) that is intra-coded and a Wyner-Ziv frame (WZ) that is channel-coded. The key frames and the Wyner-Ziv frames are placed alternately along both the time axis and the viewpoint axis. Thus, a Wyner-Ziv frame can be reconstructed with the four adjacent key frames. In this paper, we propose a method to mix or selectively use the MCTI technique and the 3D warping technique. Fig. 9 shows the proposed procedure for selecting between the two techniques when generating the side information for a Wyner-Ziv frame. Each condition in this procedure is explained in the following subsections.
Fig. 8. Frame structure to apply the proposed method
Fig. 9. The technique selection scheme to generate side information
- 3.1 Generating side information by 3D warping
The procedure to generate side information by 3D warping with the space-adjacent frames is shown in Fig. 10 . First, a Sobel mask is applied to the depth map of the right image (refer to Fig. 11(a) ) to extract the edge information ( Fig. 11(b) ). In general the edge information extracted with this method does not exactly correspond to the actual edge information of the image. Thus, it is highly possible that the side information by 3D warping has errors in the edge regions. To avoid this error, we remove the regions of the original image corresponding to the edge regions.
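The edge extraction step can be sketched with NumPy alone, as below. This is an illustrative sketch under assumptions: the function name `sobel_edge_mask` and the threshold value are hypothetical (the text does not state how the Sobel response is binarized), and the image border is handled by replicating edge pixels.

```python
import numpy as np

def sobel_edge_mask(depth, threshold=32.0):
    """Return a boolean mask that is True where the depth map has an edge.

    threshold is an assumed value on the Sobel gradient magnitude.
    """
    d = np.pad(depth.astype(np.float64), 1, mode="edge")
    # Horizontal Sobel response: weighted right column minus left column.
    gx = (d[:-2, 2:] + 2 * d[1:-1, 2:] + d[2:, 2:]
          - d[:-2, :-2] - 2 * d[1:-1, :-2] - d[2:, :-2])
    # Vertical Sobel response: weighted bottom row minus top row.
    gy = (d[2:, :-2] + 2 * d[2:, 1:-1] + d[2:, 2:]
          - d[:-2, :-2] - 2 * d[:-2, 1:-1] - d[:-2, 2:])
    return np.hypot(gx, gy) > threshold
```

Pixels of the source view where the mask is True would then be excluded before warping, which corresponds to removing the edge regions as described above.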
Fig. 10. Generation of side information by 3D warping
The next step is to convert the 2D coordinates of the image into the corresponding 3D coordinates with the depth map and the camera parameters. Equation (3) shows the relationship between the 2D coordinates and the corresponding 3D coordinates, which follows from the geometry of a pin-hole camera.
Here, ( x, y ) is the 2D coordinate in the image, while ( X, Y, Z ) is the 3D coordinate in the real world. K is the 3x3 matrix of intrinsic camera parameters. R is the 3x3 rotation matrix, while T is the 3x1 translation vector. [ R|T ] stands for the concatenation of R and T . The depth map stores depth information as 8-bit gray values, with gray level 0 specifying the furthest value and gray level 255 the nearest value [ 22 , 23 ]. The real depth value Z( i, j ) which corresponds to the pixel ( x, y ) is transformed into the 8-bit gray value P( i, j ) as in (4).
Here, P( i, j ) is the depth value at ( i, j ) in the depth map, and MinZ and MaxZ are the minimum and maximum depth values, respectively. The symbol ⎣ α ⎦ means the largest integer smaller than or equal to α . Solving (4) for the real depth value Z( i, j ) gives (5).
The converted 3D coordinate is then projected onto the 2D plane of the target Wyner-Ziv image by using (3), as shown in Fig. 11(c). The camera parameters used at this step are those of the Wyner-Ziv view. But as can be seen in Fig. 11(c), the projected image has some holes and occluded regions. The holes occur during the coordinate conversion process, and the occluded regions are the ones that were hidden before the viewpoint change and become visible after it. The holes can be filled by interpolating the adjacent images and the occluded regions can be filled with the data in the left image. Finally, the image obtained after filling the holes and the occluded regions is filtered with a median filter to get the final side information, as in Fig. 11(d).
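The coordinate conversions of this subsection can be sketched in Python as follows. This is a hedged illustration rather than the authors' code: it assumes the commonly used inverse-depth quantization for the 8-bit depth map (255 nearest, 0 furthest, with the floor operation) and the pin-hole model s[x, y, 1]^T = K(RX + T), with Z taken as the depth in camera coordinates. All function names are hypothetical, and the camera convention of a particular dataset may differ.

```python
import numpy as np

def depth_to_gray(z, min_z, max_z):
    """Quantize real depth to 8 bits (assumed inverse-depth form of (4))."""
    z = np.asarray(z, dtype=np.float64)
    ratio = (1.0 / z - 1.0 / max_z) / (1.0 / min_z - 1.0 / max_z)
    return np.floor(255.0 * np.clip(ratio, 0.0, 1.0)).astype(np.uint8)

def gray_to_depth(p, min_z, max_z):
    """Recover real depth from the 8-bit value (assumed form of (5))."""
    p = np.asarray(p, dtype=np.float64)
    return 1.0 / (p / 255.0 * (1.0 / min_z - 1.0 / max_z) + 1.0 / max_z)

def backproject(x, y, z, K, R, T):
    """Pixel (x, y) with camera-space depth z -> 3D world point."""
    cam = z * (np.linalg.inv(K) @ np.array([x, y, 1.0]))
    return R.T @ (cam - T)

def project(X, K, R, T):
    """3D world point -> pixel coordinates in the view with (K, R, T)."""
    uvw = K @ (R @ X + T)
    return uvw[0] / uvw[2], uvw[1] / uvw[2]
```

Warping a pixel from the key-frame view to the Wyner-Ziv view then amounts to calling `backproject` with the source camera parameters and `project` with the target ones; hole filling, occlusion handling, and median filtering are outside this sketch.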
Fig. 11. Example of side information generation by 3D warping: (a) depth map, (b) edge of depth map, (c) projected image, (d) generated side information
- 3.2 Generation of side information with low error
Because the previous multi-view video coding techniques can access both the time-adjacent frames and the space-adjacent frames as reference images, they calculate the rate-distortion (R-D) cost of each prediction mode for each reference image and select the one with the minimum value. However, this is impossible for a Wyner-Ziv frame in DMVC because only the parity bits are sent, without the image information. Therefore, our method chooses how to generate side information from the difference between the previous and next frames of the current frame, the edge information of the depth map, and the intensity of the residual signal generated with the motion vector and the compensated block.
In general, side information by 3D warping is likely to have errors in the boundary regions, as in Fig. 12(a), and side information by MCTI in the regions with large motions, as in Fig. 12(b). Thus, more accurate side information can be obtained by using MCTI in the boundary regions and 3D warping in the regions having large motions. When MCTI is used in the boundary regions, its inaccuracy for large motions still applies. Thus, if the motion vectors of the neighboring macro-blocks are greater than a pre-defined threshold value (Th2, refer to Fig. 9) or the signal intensity of the residual block by motion compensation is greater than another pre-defined threshold value (Th3, refer to Fig. 9), 3D warping is used.
Fig. 12. Errors in the generated side information: (a) boundary regions by 3D warping, (b) regions with large motions by MCTI
However, the motion estimation itself is not always correct, in that a motionless region such as a boundary region may have errors resulting from MCTI, as can be seen in Fig. 13(a). Thus, the proposed method first compares the difference between the two blocks time-adjacent to the current block with a pre-defined threshold Th1 (refer to Fig. 9). If the difference is less than Th1, the block is regarded as a boundary region without motion and the side information for the block is generated as the average of the two blocks. An example is shown in Fig. 13, where the errors caused by MCTI in (a) are corrected by the above process in (b).
Fig. 13. Processing background error: (a) before process, (b) after process
When a motion vector is to be used, the motion vectors of the neighboring macro-blocks are considered together. For example, Fig. 14(a) shows a case where the center block has a small motion even though the neighboring blocks have large ones. In this case it is highly likely that the estimated motion for the center block is not correct. If the neighboring blocks are not considered, this can result in inaccurate side information, as shown in Fig. 14(c). To reduce these kinds of errors, 3D warping rather than MCTI is used in such a case, or vice versa, as shown in Fig. 14(b) (refer to Fig. 9). Fig. 14(d) shows the improved result after applying this scheme.
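The per-block decision described in this section can be sketched as below. This is a simplified reading of the text, not a reproduction of Fig. 9: the order of the tests, the use of mean absolute differences as the "difference" and "residual intensity" measures, and the function name `choose_technique` are assumptions, and the depth-edge test is left out because its exact place in the procedure is not spelled out in the text.

```python
import numpy as np

def choose_technique(prev_blk, next_blk, neighbour_mvs, residual_blk,
                     th1, th2, th3):
    """Pick how to build the side information for one block.

    prev_blk, next_blk : blocks from the two time-adjacent key frames
    neighbour_mvs      : iterable of (dy, dx) vectors of neighbouring blocks
    residual_blk       : motion-compensated residual for the block
    th1, th2, th3      : thresholds Th1, Th2, Th3 (values are empirical)
    """
    # Th1: nearly identical time-adjacent blocks are treated as motionless.
    frame_diff = np.mean(np.abs(prev_blk.astype(np.float64) - next_blk))
    if frame_diff < th1:
        return "average"
    # Th2 / Th3: large neighbouring motion or a strong residual means the
    # motion estimate is unreliable, so fall back to 3D warping.
    largest_mv = max((np.hypot(dy, dx) for dy, dx in neighbour_mvs),
                     default=0.0)
    residual = np.mean(np.abs(residual_blk))
    if largest_mv > th2 or residual > th3:
        return "3d_warping"
    return "mcti"
```

Returning a label rather than the pixels keeps the decision separate from the two generators, so the same function can drive any MCTI or 3D warping implementation.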
Fig. 14. Improvement for the inaccurate estimated motion region: (a) inaccurate motion region before process, (b) inaccurate motion region after process, (c) residual image before process, (d) residual image after process
4. Experiments and Results
We performed experiments to evaluate the performance of the proposed method. The multi-view video sequences used in the experiments were the “Breakdancers” and “Ballet” videos provided by Microsoft Research, the “Poznan Street” video provided by Poznan University of Technology, and the “Dancer” video provided by Nokia Inc. The sequences have 8 or 3 viewpoints with depth maps. Each sequence consists of 100 frames for each viewpoint. The resolution and the frame rate of the former two sequences were XGA (1,024x768) and 15fps, respectively, while those of the latter two sequences were full HD (1,920x1,080) and 25fps, respectively.
In the experiments, the transform unit for DCT was 8×8, and the quantization factors were obtained by multiplying the quantization table of JPEG [24] by the scaling factors (Q = 0.5, 1, 2, 4). As the channel code, the Low-Density Parity-Check Accumulate (LDPCA) code proposed by Varodayan et al. [25] was used. LDPCA has the advantage of having an adaptive bit-rate, while the Low-Density Parity Check (LDPC) code has a fixed bit-rate. After testing the coding efficiency on various videos with various thresholds, we empirically chose the threshold values Th1, Th2, and Th3 that gave the best coding performance. All experiments were run with the same threshold values, which were not changed from image to image.
Fig. 15 shows examples of the generated side information for Breakdancers ((a), (b), (c)) and Ballet ((d), (e), (f)) by MCTI ((a), (d)), by 3D warping ((b), (e)) and by the proposed method ((c), (f)). Fig. 16 shows, in the same order, the difference (residual) images obtained by subtracting the generated images in Fig. 15 from the originals. It is obvious that the regions with large motions in Fig. 15(a) and (d) show large errors, while in Fig. 15(b) and (e) large errors appear in the boundary regions. As can be seen in Fig. 15(c) and (f) or Fig. 16(c) and (f), the proposed method greatly reduces those errors. The experimental results are summarized in Table 1, where the PSNR values are the averages over all the frames (30 frames per sequence) of all the viewpoints in the two test video sequences. As can be seen in the table, the proposed method, which uses both 3D warping and MCTI appropriately, showed 0.14~1.8dB higher PSNR values than the cases using only 3D warping or only MCTI. The PSNR values differ considerably with the test sequence and the applied technique because the two sequences have quite different image characteristics. The results from applying MCTI to the Breakdancers video sequence show relatively low quality because it has relatively large motions. Meanwhile, in the Ballet sequence the distance between the camera and the objects is small, so the occluded regions are relatively large. Thus Ballet shows relatively low quality when 3D warping is used.
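For readers who want to reproduce this kind of comparison, the PSNR between an original frame and its side information can be computed as in the following sketch. It uses the standard definition for 8-bit images; the function name `psnr` and the choice of a peak value of 255 are assumptions, since the text does not state how PSNR was computed (for example, on luminance only or on all channels).

```python
import numpy as np

def psnr(original, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB between two same-sized images."""
    diff = original.astype(np.float64) - reconstructed.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        # Identical images: PSNR is unbounded.
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)
```

Averaging this value over the Wyner-Ziv frames of every viewpoint would give per-sequence figures comparable in form to those summarized in Table 1.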
Fig. 15. Generated side information examples for Breakdancers by: (a) MCTI, (b) 3D warping, (c) proposed method, and for Ballet by: (d) MCTI, (e) 3D warping, (f) proposed method
Fig. 16. The difference images between the Fig. 15 images and the original images. For Breakdancers by: (a) MCTI, (b) 3D warping, (c) proposed method, and for Ballet by: (d) MCTI, (e) 3D warping, (f) proposed method
Table 1. Average PSNR value comparison
Fig. 17 shows the R-D curves for the Breakdancers, Ballet, Poznan Street, and Dancer sequences resulting from applying various techniques to generate side information. As the graphs of Fig. 17 show, HCII generally has low performance, and its performance is especially low in the case of large motion. In the case of small motion, as in the Ballet sequence, 3D warping-based techniques show better performance, but they perform worse in the case of large motion.
Fig. 17. R-D curves for: (a) Breakdancers, (b) Ballet, (c) Poznan Street, (d) Dancer
After the average of the bitrates for each PSNR was calculated, the average bitrate of the proposed algorithm was compared with those of the other methods. In the case of the Breakdancers sequence in Fig. 17(a) , each PSNR is 41.5dB, 40.1dB, 38.4dB, and 36dB and the ratios of bitrate reduction between the proposed and the others are 7.91% (MCTI), 49% (HCII), 36.71% (3D warping), and 7.08% (MCTI+HCII). In the case of the Ballet sequence in Fig. 17(b) , each PSNR is 42.7dB, 41.1dB, 39dB, and 36.3dB and the ratios of bitrate reduction are 7.91% (MCTI), 49% (HCII), 36.71% (3D warping), and 7.08% (MCTI+HCII). In the case of the Poznan Street sequence in Fig. 17(c) , each PSNR is 41.3dB, 40.8dB, 38.7dB, and 36.3dB and the ratios of bitrate reduction are 34.25% (MCTI), 9% (HCII), 38.44% (3D warping), and 29.47% (MCTI+HCII). In the case of the Dancer sequence in Fig. 17(d) , each PSNR is 42.1dB, 40.5dB, 38.3dB, and 36.2dB and the ratios of bitrate reduction are 33.13% (MCTI), 8.21% (HCII), 36.43% (3D warping), and 28.83% (MCTI+HCII).
Table 2 shows the average bitrates of the depth maps. The depth maps were compressed under the same conditions as in Table 1. Since the resolution of Poznan Street and Dancer is full HD, they have somewhat higher bitrates than the other sequences. The total average bitrate, which includes the resulting bitrate of the depth map, is shown in Table 3. Because the resolution of Breakdancers and Ballet is XGA (1,024x768) while that of Poznan Street and Dancer is full HD (1,920x1,080), their average bitrates differ from each other. When the average bitrates are compared between the proposed algorithm and the others, they are reduced to 27.64%, 24.24%, 27.89%, and 25.02%, respectively.
Table 2. Average bitrate of depth map
5. Conclusion
In this paper we proposed a more efficient distributed multi-view video coding method based on depth information. That is, we proposed a method to generate more accurate side information to increase the performance of distributed multi-view video coding. In this scheme, 3D warping and MCTI are selectively used with three threshold parameters according to the characteristics of the image. The three threshold parameters determine which technique is applied, based on the difference between the previous and the next time-adjacent frames, the edge information extracted from the depth map, the magnitude of the motion vector, and the residual signal value generated by motion compensation. Also, we considered the motion vectors of the neighboring macro-blocks to reduce the errors resulting from incorrect motion estimation.
Acknowledgements
This work was supported by the IT R&D program of MSIP/KEIT. [10039199, A Study on Core Technologies of Perceptual Quality based Scalable 3D Video Codecs].
BIO
Jihwan Yoo He received B.S. and M.S. degrees in electronics engineering from Kwangwoon University in 2005 and 2010, respectively. He is currently a research engineer at LG Electronics. His research interests include video coding, free-view point system and GPGPU.
Min Soo Ko He received the B.S. and M.S. degrees in electronics engineering from Kwangwoon University, Seoul, Korea in 2010 and 2012, respectively. He is currently a PhD student at Kwangwoon University. His research interests include stereo matching and 3D image processing and image compression.
Soon Chul Kwon He received the B.S. degree in industrial engineering from Hanyang University, Seoul, Korea in 2002. He received the M.S. degree in digital contents in 2008 and Ph.D degree in information display in 2012 from Kwangwoon University. From september 2005 to may 2009, he worked as an assistant manager of network engineering at LG Telecom, Seoul, Korea. He is currently an assistant professor with the department of media contents, graduation school of information and contents, Kwangwoon University, Seoul, Korea. His research interests are in 3D display and holography.
Young-Ho Seo He received his M.S. and Ph.D. degrees in 2000 and 2004 from the Dept. of Electronic Materials Engineering of Kwangwoon University in Seoul, Korea. He was a researcher at the Korea Electrotechnology Research Institute (KERI) from 2003 to 2004. He was a research professor in the Dept. of Electronic and Information Engineering at Yuhan College in Buchon, Korea, and an assistant professor in the Dept. of Information and Communication Engineering at Hansung University in Seoul, Korea. He is now an associate professor in the College of Liberal Arts at Kwangwoon University in Seoul, Korea and a director of the research institute at TYJ Inc. His research interests include realistic media, digital holography, SoC design, and contents security.
Dong-Wook Kim He received his M.S. degree in 1985 from the Dept. of Electronic Engineering of Hanyang University in Seoul, Korea and his Ph.D. degree in 1991 from the Dept. of Electrical Engineering of Georgia Institute of Technology in GA, U.S.A. He is a professor in the Dept. of Electronic Materials Engineering at Kwangwoon University in Seoul, Korea.
Jisang Yoo He received the B.S. and M.S. degrees from Seoul National University, Seoul, Korea in 1985 and 1987, respectively, both in electronics engineering, and the Ph.D. degree in electrical engineering from Purdue University, West Lafayette, IN, USA, in 1993. From September 1993 to August 1994, he worked as a senior research engineer at the industrial electronics R&D center of Hyundai Electronics Industries Co., Ltd., Ichon, Korea, in the area of image compression and HDTV. He is currently a professor with the Department of Electronics Engineering, Kwangwoon University, Seoul, Korea. His research interests are in signal and image processing, nonlinear digital filtering, and computer vision. He is now leading a 3DTV broadcast trial project in Korea.