Multiple description (MD) coding is a promising alternative for the robust transmission of information over error-prone channels. Lattice vector quantization (LVQ) is an important MD technique for designing an MD image coder. However, unlike the traditional 2D texture image, the 3D depth image has its own special characteristics, which should be taken into account for efficient compression. In this paper, an optimized MDLVQ scheme is proposed in view of the characteristics of the 3D depth image. First, owing to the sparsity of the depth image, the image blocks are classified into edge blocks and smooth blocks, which are encoded in different modes. Furthermore, according to the boundary content of the edge blocks, the step size of LVQ is regulated adaptively for each block. Experimental results validate the effectiveness of the proposed scheme, which shows better rate-distortion performance than the conventional MDLVQ.
1. Introduction
In recent years, 3D image compression has attracted significant research attention, especially the depth image in the 3D image data format. A depth image represents the distance between a camera and the objects in the scene. Depth images are often treated as grayscale image sequences, which are similar to the luminance component of texture videos. However, unlike the texture image, the depth image has its own special characteristics. First, the depth image signal is much sparser than that of the texture video. It contains no texture but has sharp object boundaries, because the gray levels are nearly the same in most regions within an object but change abruptly across boundaries. Furthermore, the depth image is not directly used for display, but it plays an important role in virtual view synthesis. The distortion of depth data, especially around object boundaries, seriously degrades the quality of the rendered virtual views [1]. Therefore, determining how to exploit depth image characteristics for efficient compression is an essential part of 3D systems.
Multiple description (MD) coding is a technique that has emerged as a promising approach to enhance the fault tolerance of a video delivery system [2]. In 1993, the first work on MD coding was introduced in [3]. MD coding can also be used in watermarking. In [4], the ownership of an image that has little perceptual distortion can be identified by image watermarking. For many applications, an MD coder generates multiple descriptions, and the packets of each description are routed over the same or multiple partial paths. Any description can be used to decode the media stream. If only one description is received, a low-quality image can be reconstructed, and its quality is measured by the side distortion. The more descriptions are received, the better the quality of the reconstructed image. In a simple two-channel architecture, the distortion with both descriptions received is called the central distortion. Quantization-based MD coding schemes mainly include multiple description scalar quantization (MDSQ) [5] and multiple description lattice vector quantization (MDLVQ) [6]. MDSQ combined with the wavelet transform is first presented in [5], in which the input stream is split into two descriptions by two scalar quantizers. However, owing to the symmetrical structure of the lattice, the absence of any need to design a codebook, and the low storage space required, MDLVQ simplifies the computation of conventional vector quantization. In [7], the performance of the MDLVQ image encoder was enhanced by using more effective algorithms. The definition of lattice vector quantization (LVQ), the specific algorithm, and the optimized scheme are all introduced in [7], which presents an effective MD image coding scheme based on MDLVQ for wavelet-transformed images. Another LVQ-based MD coding work is presented in [8], in which the authors apply an asymmetric MDLVQ scheme to a wide range of distortion profiles. In [9], a new MD coinciding lattice vector quantizer (MDCLVQ) is presented. The design of the quantizer is based on coinciding 2D hexagonal sublattices, which are geometrically similar sublattices with the same index but generated by different generator matrices. However, no MDLVQ scheme designed specifically for the depth image has been reported yet.
In this paper, considering the special characteristics of depth information, an optimized MDLVQ on depth map is proposed. Compared with the basic MDLVQ, the proposed scheme for depth image has some improvements. Given that the depth image contains no texture but only object edges, the blocks can be classified into two classes, edge blocks and smooth blocks, which can then be compressed using different modes. Furthermore, instead of the fixed step size of LVQ over the whole map, the step size can be adaptively assigned to different blocks according to the boundary contents of the edge blocks.
The rest of this paper is organized as follows. In Section 2, an overview of the conventional MDLVQ coding scheme is presented for general natural images. In Section 3, the optimization of MDLVQ encoding and decoding is proposed for the depth image in detail. The performance of the proposed scheme is examined against the other coders in Section 4. Conclusions are presented in Section 5.
2. Basic MDLVQ Image Coding
Fig. 1 illustrates the framework of the MDLVQ scheme for image coding. Here, two balanced channels are considered; that is, the bit rate and side distortion produced by the two side decoders are approximately the same for the two channels. A step-by-step description is given as follows.
Block diagram of the basic MDLVQ scheme.
Step 1: Block splitting and transformation
First, a given input image is decomposed into blocks of the same size. In [7], the whole image is decomposed into subbands (subband 1, subband 2, ..., subband m, denoted by S_{i}, i = 1, 2, ..., m) by DWT. In this paper, the DCT is applied to each block. Similar to [5], small high-frequency DCT coefficients are set to zero, because the high-frequency information in a natural image is not particularly important after the DCT.
Step 2: Vector organization
In the basic MDLVQ, the LVQ is based on the A_{2} lattice, which is a 2D lattice. To achieve high compression efficiency, the correlations between the 2D vectors should be exploited appropriately. In [7], the wavelet coefficients in each subband have different directional correlations. Therefore, organizing the coefficients in every subband according to their directional correlations is more efficient. For example, HL is scanned to form vectors along the vertical direction, LH is scanned in the horizontal direction, and HH is scanned in a zigzag way. In addition, a spiral scan is applied in the LL subband because of the strong correlation among neighboring coefficients. However, the correlations among DCT coefficients differ from those among DWT coefficients, because most nonzero coefficients are concentrated in the upper left corner of the 2D matrix after DCT quantization. Each block can thus be scanned in a zigzag manner.
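The zigzag scan and the grouping of consecutive coefficients into 2D vectors can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the common JPEG-style zigzag order, and the function names `zigzag_order` and `block_to_vectors` as well as the zero padding for odd-length scans are hypothetical choices.

```python
def zigzag_order(n):
    """Return the (row, col) positions of an n x n block in zigzag order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col identifies one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        diag = [(r, s - r) for r in rows]  # top-right to bottom-left
        # Even diagonals are traversed bottom-left to top-right.
        order.extend(reversed(diag) if s % 2 == 0 else diag)
    return order


def block_to_vectors(block):
    """Scan a square block in zigzag order and pair coefficients into 2D vectors."""
    n = len(block)
    scanned = [block[r][c] for r, c in zigzag_order(n)]
    if len(scanned) % 2:  # assumption: pad with one zero if the count is odd
        scanned.append(0)
    return [(scanned[i], scanned[i + 1]) for i in range(0, len(scanned), 2)]


# Toy 4 x 4 "coefficient" block whose entries are simply their raster index.
demo_block = [[4 * r + c for c in range(4)] for r in range(4)]
demo_vectors = block_to_vectors(demo_block)
```

Because the zigzag scan visits low-frequency positions first, the leading vectors carry the coefficients near the upper left corner, where most nonzero values lie.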
Step 3: Lattice vector quantization (LVQ)
Here, we utilize the LVQ based on an A_{2} lattice, which is equivalent to the hexagonal lattice [10]. The hexagonal lattice can be spanned by the vectors (1, 0) and (-1/2, √3/2), and the generator matrix is as follows.
$$U = \begin{bmatrix} 1 & 0 \\ -1/2 & \sqrt{3}/2 \end{bmatrix} \qquad (1)$$
In each block, every two coefficients form a 2D vector according to the scanning order. A lattice vector quantizer is then applied to these 2D vectors, producing a quantized symbol λ, λ ∈ A_{2}. The process is similar to scalar quantization, in that the lattice vector quantizer can quantize at different accuracies by adjusting the “step size.” Unlike conventional vector quantization, LVQ does not require a computation-intensive nearest-neighbor search over a codebook based on squared distance calculation. Therefore, the complexity of LVQ on A_{2} is very low.
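A minimal sketch of quantizing one 2D vector to a scaled A_{2} lattice is given below. It assumes the basis (1, 0) and (-1/2, √3/2) and a scalar step size `delta`; the function name `quantize_a2` is hypothetical. Instead of a codebook search, the sketch expresses the input in lattice coordinates and compares only the four corners of the enclosing fundamental parallelogram, which is sufficient for the hexagonal lattice.

```python
import math

SQRT3_2 = math.sqrt(3) / 2
B1 = (1.0, 0.0)        # first basis vector of A2
B2 = (-0.5, SQRT3_2)   # second basis vector of A2 (assumed orientation)


def quantize_a2(x, y, delta=1.0):
    """Return ((k1, k2), (px, py)): integer coordinates and nearest point of delta*A2."""
    # Solve (x, y) / delta = u1 * B1 + u2 * B2 for real-valued (u1, u2).
    u2 = (y / delta) / SQRT3_2
    u1 = (x / delta) + 0.5 * u2
    f1, f2 = math.floor(u1), math.floor(u2)
    best = None
    # Only the four corners of the enclosing parallelogram need to be compared.
    for k1 in (f1, f1 + 1):
        for k2 in (f2, f2 + 1):
            px = delta * (k1 * B1[0] + k2 * B2[0])
            py = delta * (k1 * B1[1] + k2 * B2[1])
            d2 = (x - px) ** 2 + (y - py) ** 2
            if best is None or d2 < best[0]:
                best = (d2, (k1, k2), (px, py))
    return best[1], best[2]


coords_unit, point_unit = quantize_a2(0.5, 0.8)                # step size 1
coords_coarse, point_coarse = quantize_a2(1.0, 1.6, delta=2.0)  # step size 2
```

Doubling `delta` doubles the spacing of the lattice points, so the same integer coordinates describe a coarser reconstruction, which mirrors the role of the step size in scalar quantization.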
Step 4: Index assignment
After LVQ, a quantized point λ is mapped to two sublattice points that serve as the two descriptions; this mapping is called index assignment. Here, a labeling function [11] maps λ ∈ Λ to a pair (λ_{1}', λ_{2}') ∈ Λ' × Λ', where Λ' is a sublattice of Λ with index N. The index N determines how coarse the sublattice is and thereby controls the amount of redundancy in the MDLVQ encoder [6].
Fig. 2 is an example of an A_{2} sublattice with index N = 13. In the case of N = 13, we can obtain a labeling function as in Table 1, where each fine lattice point λ is mapped to a unique label (λ_{1}', λ_{2}'), with λ_{1}' and λ_{2}' being the two sublattice points as close to λ as possible. Note that the mapping scheme shown in the table is slightly different from the index assignment developed by Servetto, Vaishampayan, and Sloane [6] (known as the SVS technique). In our proposed scheme, λ_{1}' is always closer to λ, and thus λ_{1}' is denoted as the near sublattice point and λ_{2}' as the far sublattice point. To balance the reconstruction quality obtained from either single description, λ_{1}' and λ_{2}' are transmitted alternately over the two channels.
Example of A_{2} with index 13: Fine lattice points are labeled by small letters, and sublattice points are labeled by capital letters.
Index assignment for V_{0}(0) in the hexagonal lattice with N = 13
As a simple example, if we have a quantized sequence of fine lattice points {λ(1), λ(2), ..., λ(8)} = {a, a, a, b, b, b, i, i}, then the two sequences of sublattice points obtained with the labeling function in Table 1 are {λ_{1}'(1), λ_{1}'(2), ..., λ_{1}'(8)} = {O, O, O, O, O, O, D, D} and {λ_{2}'(1), λ_{2}'(2), ..., λ_{2}'(8)} = {A, A, A, B, B, B, B, B}. Based on the alternating transmission scheme, the sequence {λ_{1}'(1), λ_{2}'(2), λ_{1}'(3), ..., λ_{2}'(8)} = {O, A, O, B, O, B, D, B} is transmitted over channel 1 and {λ_{2}'(1), λ_{1}'(2), λ_{2}'(3), ..., λ_{1}'(8)} = {A, O, A, O, B, O, B, D} over channel 2.
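The alternating transmission in this example can be traced with a short Python sketch. The `LABELS` dictionary below contains only the three label pairs that appear in the example (a, b, and i), not the full labeling function, and the choice that channel 1 carries the near point for odd-numbered vectors is an assumption; the function names are hypothetical.

```python
# Partial labeling function taken from the worked example: fine point -> (near, far).
LABELS = {"a": ("O", "A"), "b": ("O", "B"), "i": ("D", "B")}


def split_descriptions(fine_points, labels=LABELS):
    """Map fine lattice points to two descriptions, alternating near and far points."""
    ch1, ch2 = [], []
    for k, lam in enumerate(fine_points):
        near, far = labels[lam]
        if k % 2 == 0:   # 1st, 3rd, ... vector: near point goes to channel 1
            ch1.append(near)
            ch2.append(far)
        else:            # 2nd, 4th, ... vector: near point goes to channel 2
            ch1.append(far)
            ch2.append(near)
    return ch1, ch2


def central_decode(ch1, ch2, labels=LABELS):
    """Recover the fine lattice points when both descriptions are received."""
    inverse = {pair: lam for lam, pair in labels.items()}
    decoded = []
    for k, (s1, s2) in enumerate(zip(ch1, ch2)):
        pair = (s1, s2) if k % 2 == 0 else (s2, s1)  # undo the alternation
        decoded.append(inverse[pair])
    return decoded


fine = list("aaabbbii")
channel1, channel2 = split_descriptions(fine)
recovered = central_decode(channel1, channel2)
```

Each channel receives near and far points in equal numbers, which is what keeps the two side reconstructions balanced.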
Step 5: Center decoder and side decoder
At the receiver, if both channels are working properly, the two descriptions are processed by the central decoder after arithmetic decoding, and the sequence of fine lattice points {λ} can be reconstructed with the central distortion. If either description is lost, side decoding is performed, predicting the lost information where necessary from the correlation between neighboring vectors and the alternating transmission scheme mentioned above. However, this results in a larger distortion than that obtained by the central decoder.
3. Improved MDLVQ for depth image
 3.1 Overview
Based on the basic MDLVQ image coding in Fig. 1, an optimized MDLVQ scheme, shown in Fig. 3, is proposed in view of the characteristics of the 3D depth image. First, the depth image is sparse, and each gray value represents the distance between the corresponding pixel point and the camera. Therefore, regions that contain edges in the color image often appear smooth in the depth map. Moreover, the distortion of the smooth areas in the depth map has a low impact on the quality of the synthesized virtual viewpoint image. The depth image is therefore classified into edge blocks and smooth blocks, which are encoded in different modes. The specific classification results and the two coding modes are presented in more detail below. Second, because the depth image contains no texture but only object edges, encoding the edge information is particularly important. Here, we regulate the step size of MDLVQ adaptively for each block according to the boundary content.
Block diagram of the improved scheme.
 3.2 Smooth block encoding and decoding
After block splitting, the mean value of each block is calculated first. If every pixel value in the block equals the mean value, the block is called a smooth block; otherwise, it is called an edge block.
After classification, the smooth blocks are marked with the flag bit “0.” Only the flag bit and the DC component of each smooth block are then compressed with arithmetic coding. The information of the smooth blocks is transmitted together with the two descriptions produced by index assignment. At the decoder, if the flag bit “0” is received, the block is first decoded as all zeros and then reconstructed by adding its DC component. As a result, block classification largely reduces the bit rate and also saves coding time. In addition, the more smooth blocks an image contains, the greater the reduction in bit rate.
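The classification rule and the smooth-block mode can be sketched in Python as follows. This is an illustrative sketch only: the block mean stands in for the DC component, the flag value 1 for edge blocks is an assumption (only the flag “0” is specified above), arithmetic coding is omitted, and all function names are hypothetical.

```python
def classify_block(block):
    """Return ("smooth", mean) if every pixel equals the block mean, else ("edge", mean)."""
    pixels = [p for row in block for p in row]
    mean = sum(pixels) / len(pixels)
    kind = "smooth" if all(p == mean for p in pixels) else "edge"
    return kind, mean


def encode_block(block):
    """Smooth blocks keep only a flag and a DC value; edge blocks are passed on."""
    kind, mean = classify_block(block)
    if kind == "smooth":
        return {"flag": 0, "dc": mean}
    # Assumed flag value; the edge block would continue to DCT and MDLVQ (not shown).
    return {"flag": 1, "block": block}


def decode_smooth(symbol, size):
    """Rebuild a smooth block: an all-zero block plus its DC component."""
    return [[0 + symbol["dc"] for _ in range(size)] for _ in range(size)]


smooth_block = [[128] * 4 for _ in range(4)]
edge_block = [[128, 128, 128, 128],
              [128, 128, 128, 40],
              [128, 128, 40, 40],
              [128, 40, 40, 40]]
smooth_symbol = encode_block(smooth_block)
edge_symbol = encode_block(edge_block)
```

A smooth block is reconstructed without loss from two small values, which is why a high share of smooth blocks translates directly into bit-rate savings.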
The results of block classification are shown in Table 2. For the Pantomime sequence, 77.3% of the blocks in the left view and 78.8% in the right view are smooth blocks. For the Kendo sequence, smooth blocks account for 69.1% in the left view and 61.2% in the right view. Although the proportion of smooth blocks for the Balloons sequence is smaller, it is still 20.3% and 32.1% in the left and right views, respectively.
Block classification results
Fig. 4 shows the subjective block classification of the Pantomime sequence. Here, the smooth blocks are tagged in green. The figure clearly shows that most areas in both the left and right views of the depth image are smooth.
Subjective block classification of the Pantomime sequence: (a) left view, (b) right view.
 3.3 Edge block encoding and decoding
In the basic MDLVQ image encoding [7], two important factors affect the quality of the reconstructed image and the bit rate. The first is the area of the hexagonal lattice (in Step 3), i.e., the quantization “volume size” used in LVQ, and the other is the choice of sublattice index (in Step 4).
As the depth image contains no texture but only object edges, a quantizer designed according to the boundary content is essential. The edges of a depth image are sharper than those of an ordinary image, so they can be extracted easily with an edge detection operator. The Canny operator detects weak and strong edges with two different thresholds; it suppresses noise and yields accurate edges. Therefore, to achieve accurate edge detection, the Canny operator is selected to extract the edges of the depth image before LVQ, as shown in Fig. 3.
The lattice A_{2} is the space spanned by the two vectors (1, 0) and (-1/2, √3/2), and thus the area of the hexagonal lattice is determined by these two vectors. However, we can retain the shape of the hexagonal lattice and change its area by multiplying the generator matrix U by a factor δ (δ ∈ R, δ > 0). If a block contains more edge information, a smaller value of δ is used for it. The parameter δ in LVQ is similar to the step size in scalar quantization (SQ). The central distortion D_{0} and its associated bit rate can be adjusted by changing δ.
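The effect of δ on the lattice area, and one possible way of tying δ to the edge content of a block, can be sketched as follows. The area formula follows from scaling the generator matrix, whose determinant is √3/2. The linear mapping in `step_size_for_block`, its bounds `delta_min` and `delta_max`, and the binary `edge_map` input (for example, the output of an edge detector) are purely hypothetical; the rule actually used to derive δ from the edge information is not specified here.

```python
import math


def cell_area(delta):
    """Area of the fundamental cell of delta * A2, i.e. delta^2 * sqrt(3) / 2."""
    return delta * delta * math.sqrt(3) / 2


def step_size_for_block(edge_map, delta_min=1.0, delta_max=8.0):
    """Hypothetical rule: more edge pixels in the block give a smaller delta.

    Interpolates linearly from delta_max (no edge pixels) to delta_min
    (every pixel is an edge pixel).
    """
    pixels = [e for row in edge_map for e in row]
    edge_fraction = sum(1 for e in pixels if e) / len(pixels)
    return delta_max - (delta_max - delta_min) * edge_fraction


# Binary edge map of a 4 x 4 block with 4 edge pixels out of 16.
demo_edge_map = [[0, 0, 0, 1],
                 [0, 0, 1, 0],
                 [0, 1, 0, 0],
                 [1, 0, 0, 0]]
demo_delta = step_size_for_block(demo_edge_map)
```

Scaling the lattice by δ multiplies the cell area by δ squared, so halving δ for an edge-rich block quarters the quantization cell.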
Finding the optimal parameters δ and N is necessary to strike the best tradeoff among central distortion, side distortion, and their associated bit rates. By analogy between MDLVQ and MDSQ, the parameters δ and N in MDLVQ encoding can be optimized in a way similar to the optimization method for MDSQ encoding in [5]. Therefore, the MD design problem can be formulated as yielding optimal performance under constraints on the side distortion and the bit rate. To facilitate the description, consider the following definitions.
Let I denote an image, and M = {m_{1}, m_{2}, ..., m_{i}} denote the edge blocks after block classification.
Let δ_{mi} refer to the magnification factor of the lattice area (i.e., the quantization “volume size”) used for the corresponding block, with δ_{m} = {δ_{mj} | j = 1, 2, ..., i} denoting the set of these factors. N_{m} = {N_{mj} | j = 1, 2, ..., i} represents the set of index numbers used in the labeling function for the different edge blocks.
Let D_{0}(M, δ_{m}, N_{m}), D_{1}(M, δ_{m}, N_{m}), and D_{2}(M, δ_{m}, N_{m}) denote the mean squared errors (MSE) from the central decoder and the two side decoders for the input image.
Let R_{1}(M, δ_{m}, N_{m}) and R_{2}(M, δ_{m}, N_{m}) denote the number of bits required to encode each description of I using the given central quantizer and index assignments.
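Then, our goal is to find a pair (δ_{m}, N_{m}) to solve
$$\min_{\delta_m, N_m} D_0(M, \delta_m, N_m) \qquad (2)$$
subject to
$$\text{Condition 1: } R_1(M, \delta_m, N_m) = R_2(M, \delta_m, N_m) \le R_{budget} \qquad (3)$$
$$\text{Condition 2: } D_1(M, \delta_m, N_m) = D_2(M, \delta_m, N_m) \le D_{budget} \qquad (4)$$
where the user-specified parameters are R_{budget} (the available bit rate to encode each description) and D_{budget} (the maximum distortion acceptable for single-channel reconstructions).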
Next, we present an algorithm to find parameters that solve (2)-(4) under the constraints on the bit rate per channel and the side distortion. Here, δ_{m} and N_{m} are adjusted to minimize the central distortion.
The basic idea is to take advantage of the monotonicity of both R and D as functions of δ_{m}. First, after initialization, the smallest feasible δ_{m} is searched to minimize D_{0} subject to Condition 1. Second, according to Condition 2, N_{m} is updated sequentially from low indices to high ones. The updated N_{m} then affects R_{1}(M, δ_{m}, N_{m}) and R_{2}(M, δ_{m}, N_{m}) in Condition 1 and, in turn, δ_{m} is updated to reduce D_{0} further. The two steps are iterated to update δ_{m} and N_{m} until D_{0} changes little. A pseudocode description of the proposed algorithm is presented below.
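The two-step iteration described in the text can be illustrated with the following Python sketch. It is a simplified stand-in rather than the authors' pseudocode: a single scalar step size and a single index are shared by all blocks, the candidate grids are illustrative, and the rate and distortion models (`toy_rate`, `toy_d0`, `toy_d1`) are toy assumptions chosen only to be monotonic, not measured encoder behavior.

```python
import math


def optimize_delta_and_index(deltas, indices, rate, d0, d1, r_budget, d_budget, tol=1e-6):
    """Return (delta, index, central distortion), or None if nothing is feasible."""
    best = None
    for n in indices:  # Step 2: raise the index from low to high
        # Step 1: smallest step size whose per-description rate fits the budget
        # (rate is assumed to decrease and D0 to increase with delta).
        fitting = [d for d in sorted(deltas) if rate(d, n) <= r_budget]
        if not fitting:
            continue
        delta = fitting[0]
        if d1(delta, n) > d_budget:  # side-distortion budget violated: stop
            break
        current = d0(delta, n)
        if best is not None and best[2] - current <= tol:
            break  # the central distortion has stopped improving
        best = (delta, n, current)
    return best


# Toy models (assumptions): a higher index lowers the per-description rate
# but raises the side distortion; delta alone sets the central distortion.
def toy_rate(delta, n):
    return 6 - math.log2(delta) - 0.5 * math.log2(n)


def toy_d0(delta, n):
    return delta ** 2 / 12


def toy_d1(delta, n):
    return delta ** 2 * n / 12


result = optimize_delta_and_index(
    deltas=[1, 2, 4, 8, 16], indices=[7, 13, 19],
    rate=toy_rate, d0=toy_d0, d1=toy_d1,
    r_budget=3.5, d_budget=100.0,
)
```

With these toy models, moving from index 7 to index 13 frees enough rate to halve the step size, while index 19 brings no further reduction on this coarse grid, so the search stops there.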
4. Experimental Results and Analysis
To highlight the performance of the proposed scheme, the experiments are carried out on three standard depth image sequences: the Balloons sequence (1024 x 768), the Kendo sequence (1024 x 768), and the Pantomime sequence (1280 x 960). The proposed scheme is compared not only against the conventional MDLVQ but also, under the same experimental setup, against the MDLVQ scheme in [7], which operates in the wavelet domain. To show that the results are general, four groups of data from each sequence are selected for comparison. Following the usual MDC quality assessment, we compare both the rate versus central distortion performance when the two descriptions are received correctly and the rate versus side distortion performance when only one description is received. The comparison for depth images is presented in Fig. 5, where the horizontal axis represents the bit rate and the vertical axis represents the PSNR values. Here, one view is chosen for each of the three sequences: the 1st view of Balloons and Kendo, and the 37th view of Pantomime.
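For reference, PSNR can be computed from the MSE between an original and a reconstructed image as in the sketch below. A peak value of 255 (8-bit depth maps) is assumed, and the helper names `mse` and `psnr` are hypothetical.

```python
import math


def mse(ref, test):
    """Mean squared error between two images given as lists of rows."""
    diffs = [(a - b) ** 2 for row_a, row_b in zip(ref, test) for a, b in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)


def psnr(ref, test, peak=255):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(ref, test)
    return math.inf if err == 0 else 10 * math.log10(peak * peak / err)


original = [[100, 100], [100, 100]]
decoded = [[100, 100], [100, 110]]   # one pixel off by 10, so MSE = 25
demo_psnr = psnr(original, decoded)
```

Because PSNR is logarithmic in the MSE, a gain of 3 dB corresponds to roughly halving the MSE.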
Objective quality comparison for the depth image sequences Balloons, Kendo, and Pantomime. (a), (b), and (c): rate side distortion performance; (d), (e), and (f): rate central distortion performance.
Given that the depth map is not directly used for display, the objective and subjective qualities of the rendered virtual views should be taken into account. For the objective evaluation, the virtual viewpoint image is synthesized from two original camera views. For example, for the tested sequences Balloons and Kendo, the depth and texture from the 1st and 3rd views are used to synthesize the texture of the 2nd view, while for the Pantomime sequence, the depth and texture from the 37th and 39th views generate the texture of the 38th view. A comparison of the synthesized images is presented in Fig. 6.
Objective quality comparison for the synthesized virtual viewpoint sequences of Balloons, Kendo, and Pantomime. (a), (b), and (c): rate side distortion performance; (d), (e), and (f): rate central distortion performance.
The figures show that the PSNR values of the images reconstructed by the proposed scheme improve significantly in both the single and central channels. Fig. 5 shows the objective quality comparison for the three tested sequences. In the single channel, compared with the basic MDLVQ, the proposed scheme achieves improvements of 3.8-6.4 dB for “Balloons”, 3.8-5.5 dB for “Kendo”, and 3.3-5.8 dB for “Pantomime”. Compared with the reference scheme in [7], the proposed scheme obtains gains of 0.6-0.8 dB for “Balloons”, 0.9-1.4 dB for “Kendo”, and 6.0-6.5 dB for “Pantomime”. In the central channel, compared with the basic MDLVQ, the proposed scheme achieves improvements of 4.5-7.8 dB for “Balloons”, 4.9-5.7 dB for “Kendo”, and 5.0-8.0 dB for “Pantomime”. Compared with the reference scheme in [7], the proposed scheme obtains gains of 3.6-4.5 dB for “Balloons”, 3.3-4.4 dB for “Kendo”, and 4.7-5.0 dB for “Pantomime”.
As for the synthesized virtual viewpoint sequences, Fig. 6 shows clearly that the proposed scheme outperforms the two schemes chosen for comparison. In the single channel, the proposed scheme gains around 1.4-3.9 dB for “Balloons”, 3.2-7.1 dB for “Kendo”, and 2.1-3.4 dB for “Pantomime” over the basic MDLVQ, and around 2.6-2.8 dB for “Balloons”, 2.7-2.8 dB for “Kendo”, and 7.6-7.8 dB for “Pantomime” over the reference scheme in [7]. In the central channel, the proposed scheme gains around 0.8-2.8 dB for “Balloons”, 4.5-7.0 dB for “Kendo”, and 5.2-7.9 dB for “Pantomime” over the basic MDLVQ, and around 3.5-4.9 dB for “Balloons”, 3.2-4.5 dB for “Kendo”, and 4.5-4.9 dB for “Pantomime” over the reference scheme in [7].
Furthermore, the advantages of the proposed scheme can be seen more clearly in Fig. 7, which presents the subjective quality of the synthesized virtual viewpoint of Balloons, especially in the parts marked by red rectangles.
Subjective quality comparison of the synthesized virtual viewpoint for the Balloons sequence. (a), (c), and (e): the proposed scheme; (b), (d), and (f): basic MDLVQ.
5. Conclusion
An LVQ-based MD depth image coding scheme was developed in this study. An effective optimization scheme for LVQ encoding was incorporated into the proposed system to achieve better rate and central/side distortion performance. By treating smooth blocks separately and assigning the step size adaptively, the proposed MDLVQ demonstrates superior rate-distortion performance and a considerably lower bit rate. The PSNR of the image reconstructed by the proposed scheme improves significantly at the same bit rate. Thus, the proposed scheme is a worthy choice for depth map coding.
BIO
Huiwen Zhang received her B.S. degree from Hebei Normal University, China, in 2013. She is currently a Master student in Beijing Jiaotong University, China. Her research interests are 3D image compression and transmission.
Huihui Bai received her B.S. degree from Beijing Jiaotong University, China, in 2001, and her Ph.D. degree from Beijing Jiaotong University, China, in 2008. She is currently an associate professor in Beijing Jiaotong University. She has been engaged in R&D work in video coding technologies and standards, such as HEVC, 3D video compression, multiple description video coding (MDC), and distributed video coding (DVC).
Meiqin Liu received her B.S. degree from Changsha University of Science and Technology, China, in 2004, and her M.E. degree from Beijing Jiaotong University, China, in 2007. She is currently a lecturer in Beijing Jiaotong University. Her research interests are depth image coding and transmission.
Yao Zhao received his Ph.D. degree from Beijing Jiaotong University, China, in 1996. He worked as a postdoctoral researcher at Delft University of Technology in the Netherlands from 2001 to 2002. His research interests include image and video coding, digital watermarking, and content-based video retrieval. He has published more than 130 papers in IEEE Transactions on CSVT, Electronics Letters, Signal Processing: Image Communication, and so on. He has finished 25 national projects sponsored by NSFC, the 973 program, and the 863 program.
[1] Zhu C., Zhao Y., Yu L., Tanimoto M., “3D-TV System with Depth-Image-Based Rendering: Architectures, Techniques and Challenges,” 2012.
[2] Wang Y., Reibman A. R., Lin S., “Multiple description coding for video delivery,” Proceedings of the IEEE, vol. 93, no. 1, pp. 57-69, 2005. DOI: 10.1109/JPROC.2004.839618
[3] Vaishampayan V., “Design of multiple description scalar quantizers,” IEEE Trans. on Information Theory, vol. 39, no. 3, pp. 821-834, 1993. DOI: 10.1109/18.256491
[4] Hsia Y., Liao J., “Multiple-description iterative coding image watermarking,” Digital Signal Processing, vol. 20, no. 4, pp. 1183-1195, 2010. DOI: 10.1016/j.dsp.2009.12.011
[5] Servetto S., Ramchandran K., Vaishampayan V., Nahrstedt K., “Multiple description wavelet based image coding,” IEEE Trans. on Image Processing, vol. 9, no. 5, pp. 813-826, 2000. DOI: 10.1109/83.841528
[6] Servetto S., Vaishampayan V., Sloane N., “Multiple description lattice vector quantization,” in Proc. of IEEE Data Compression Conf., Snowbird, UT, pp. 13-22, 1999.
[7] Bai H., Zhu C., Zhao Y., “Optimized multiple description lattice vector quantization for wavelet image coding,” IEEE Trans. on Circuits and Systems for Video Technology, vol. 17, no. 7, pp. 912-917, 2007. DOI: 10.1109/TCSVT.2007.898646
[8] Diggavi S., Sloane N., Vaishampayan V., “Asymmetric multiple description lattice vector quantizers,” IEEE Trans. on Information Theory, vol. 48, no. 1, pp. 174-191, 2002. DOI: 10.1109/18.971747
[9] Akhtarkavan E., Salleh M., “Multiple descriptions coinciding lattice vector quantizer for wavelet image coding,” IEEE Trans. on Image Processing, vol. 21, no. 2, pp. 653-661, 2012. DOI: 10.1109/TIP.2011.2164419
[10] Conway J., Sloane N., Sphere Packings, Lattices and Groups, 3rd ed., Springer-Verlag, New York, pp. 108-117, 1998.
[11] Vaishampayan V., Sloane N., Servetto S., “Multiple description vector quantization with lattice codebooks: design and analysis,” IEEE Trans. on Information Theory, vol. 47, no. 5, pp. 1718-1734, 2001. DOI: 10.1109/18.930913