Fast Encoder Design for Multi-view Video
KSII Transactions on Internet and Information Systems (TIIS). 2014. Jul, 8(7): 2464-2479
Copyright © 2014, Korean Society For Internet Information
  • Received : February 11, 2014
  • Accepted : May 10, 2014
  • Published : July 28, 2014
About the Authors
Fan Zhao
Department of Information Science, Xi’an University of Technology, Xi’an 710048, China
Kaiyang Liao
Department of Information Science, Xi’an University of Technology, Xi’an 710048, China
Erhu Zhang
Department of Information Science, Xi’an University of Technology, Xi’an 710048, China
Fangying Qu
Department of Mathematics, Northwest University, Xi’an 710127, China

Abstract
Multi-view video coding is an international encoding standard that attains good performance by fully utilizing temporal and inter-view correlations. However, it suffers from high computational complexity. This paper presents a fast encoder design to reduce the level of complexity. First, when the temporal correlation of a group of pictures is sufficiently strong, macroblock-based inter-view prediction is not employed for the non-anchor pictures of B-views. Second, when the disparity between two adjacent views is above some threshold, frame-based inter-view prediction is disabled. Third, inter-view prediction is not performed on boundary macroblocks in the auxiliary views, because the references for these blocks may not exist in neighboring views. Fourth, finer partitions of inter-view prediction are cancelled for macroblocks in static image areas. Finally, when estimating the disparity of a macroblock, the search range is adjusted according to the mode size distribution of the neighboring view. Compared with reference software, these techniques produce an average time reduction of 83.65%, while the bit-rate increase and peak signal-to-noise ratio loss are less than 0.54% and 0.05dB, respectively.
1. Introduction
As a result of the greatly enhanced viewing experience and high interactivity they offer, 3D video and free viewpoint television are attracting attention in various industries and research institutes [1-3]. Multiple synchronized cameras are usually used to capture the same scene from different viewpoints to form a 3D system, and the resulting multi-view video brings not only a whole new 3D impression, but also a large amount of data for storage and transmission. Encoding all the views efficiently is a crucial issue for future multi-view applications. Multi-view video coding (MVC) is a key technique for distributing multi-view video content through networks with limited bandwidth, and was developed to improve coding efficiency. In 2005, MPEG and ITU-T started the MVC standardization process [4-5]. Although the standardization activity for MVC (the multi-view extension of H.264/AVC) is largely complete, there is still considerable room for reducing its complexity.
By fully utilizing the redundancy among temporally successive frames and adjacent views, many practical schemes for multi-view video compression have been proposed. Their common feature is the flexible use of prediction structures. Typical prediction structures include the simulcast scheme, the CIP scheme, and the HHI (Heinrich Hertz Institute) scheme [6-10]. The simulcast scheme encodes each view independently using hierarchical B-pictures and is supported by H.264/AVC. Its performance is generally lower, because it does not exploit inter-view correlations. The CIP scheme employs inter-view prediction only for the first picture of every GOP (Group of Pictures), whereas the other pictures are encoded using the same prediction technique as in the simulcast scheme; thus, its coding performance improves only to a very limited extent. The HHI scheme, proposed by Fraunhofer HHI, exhibits the best compression performance by fully utilizing prediction along both the time axis and the view direction [11]. Its prediction structure is shown in Fig. 1 [6, 11, 15], where Sn denotes the nth individual view sequence and Tn denotes the nth time instant. All the pictures in Fig. 1 make up an MVC prediction unit, in which hierarchical B-pictures are used within each view, as in H.264, while in the view direction only I-, P-, and B-pictures are employed. In H.264, the prediction unit is applied to each GOP, whereas in MVC, the prediction unit operates on a GoGOP (group of GOPs). In an MVC prediction unit, views are named after their starting picture. For example, views starting with picture I0 are called I-views, like S0; views starting with picture B0 are called B-views, such as S3 and S5; and views starting with picture P0 are called P-views, such as S2, S4, S6, and S7. Although the HHI prediction structure exhibits the best coding performance, its complexity is too high.
Thus, algorithms for practical applications must speed up the prediction process, especially in the view direction. This is because there are more reference frames in the MVC scheme, and predicting B-views is more complex than for other views.
Fig. 1. JMVM reference prediction structure.
MVC improves coding efficiency by utilizing motion- and disparity-compensated prediction (M/DCP). However, the complexity of inter-frame prediction is very high, especially when rate-distortion optimization is used. In [12-13], temporal prediction is applied first, and inter-view prediction is skipped if the sum of absolute differences of the temporal prediction is small. A fast strategy for deciding between MCP and DCP was designed in [14], where the GOP size of each view is set to four. This short GOP size hinders the full utilization of temporal correlation. At the same time, the motion vectors (MVs) must be stored, and the locations of static macroblocks (MBs) of the base view and anchor pictures are needed, so the search for co-located MBs takes a considerable amount of time. Reference [15] investigated the influence of removing inter-view prediction from the higher temporal decomposition levels, but did not implement the algorithm properly. Based on an analysis of the contribution of inter-view prediction to the coding gain at different temporal layers, a simplified prediction structure was proposed in [16], in which inter-view prediction was disabled if the temporal dependency dominates the current view. This rests on the assumption that inter-view prediction contributes less at higher temporal layers. However, inter-view estimation can work well at higher temporal decomposition levels, and the experimental results in [17] indicate that, under fast camera motion, inter-view prediction is more efficient than temporal prediction for a significant number of blocks.
Most previous complexity reduction techniques for fast MVC focus on the selection of mode size, prediction direction, reference frames, and search range. Experiments have shown that most MBs in regions with homogeneous motion or relatively static backgrounds select the large mode size (16×16) for motion estimation (ME), while only MBs in regions with complex motion need disparity estimation (DE) and small mode size ME. To reduce the computational complexity of MVC, [18] utilized the spatial property of the motion field to limit the candidate prediction modes to a small subset. Zhang et al. noted that there is a high probability that MBs with smaller partition sizes will eventually select the same reference frame and prediction direction as those with larger partition sizes [19]. Thus, they proposed an algorithm for B-pictures in which MBs with smaller mode sizes follow the decisions of MBs with larger mode sizes when selecting the best prediction direction and reference frame. The approach in [20] increases the overall speed of MVC by reducing the search range for both ME and DE in regions with homogeneous motion. Inter-view SKIP mode correlation has been exploited by Shen et al. [21]. Reference [22] reports a low-complexity DE technique that chooses the previous disparity vector (PDV) instead of the median prediction vector as the search center for DE; by combining the MVC methods in [19-21] with PDV-DE, a novel complexity reduction technique can be derived. The correlation of rate-distortion (RD) costs of the SKIP mode among views is studied and used to reduce the computational complexity of MVC in [23]. Khattak et al. used the correlation between the RD costs of the SKIP mode in neighboring views and Bayesian decision theory to reduce the number of candidate coding modes for a given MB [24].
A hybrid optimal stopping model has been developed to solve the mode decision problem, whereby the predicted mode probability and the estimated coding time are jointly investigated with inter-view correlations [25]. Yeh et al. proposed a fast mode decision algorithm to avoid the high computational complexity of MVC [26]. The minimum and maximum values of the RD cost in the previously encoded view are used to compute a threshold for each mode in the current view; using these thresholds, the mode decision process becomes more efficient. A fast mode decision process has also been adopted to reduce the complexity of HEVC (High Efficiency Video Coding) [30].
The frame at instant T6 selects its references from the pictures at T0 and T12, which span a GOP. The distribution of the encoding modes for MBs at T6 therefore reflects the temporal correlation within a given GOP, and our first contribution uses this fact to develop a method of MB-based inter-view prediction skipping. Inter-view correlations are high within a GOP (or a half-GOP). According to the encoding order, the pictures at instants T0 and T12 are always processed before those between them (i.e., T1 to T11) for any of the views. Thus, our second contribution is to use the inter-view dependencies at instant T0 or T12 to judge whether frame-based inter-view prediction can be skipped at the other instants. Because of the arrangement of the multi-view cameras, boundary MBs may not find their references in the neighboring views. Our third contribution is inter-view prediction skipping for these boundary MBs. Because finer partitions bring little coding gain in static image areas, such as backgrounds, our fourth contribution skips them in inter-view prediction for such MBs. Finally, exploiting the correlation in the mode size distribution of MBs in adjacent views, we propose to dynamically adjust the disparity search range for the current MB.
The paper is organized as follows. The framework of our flexible prediction selection method is proposed in Section 2. Experimental results are presented in Section 3, and some conclusions are drawn in Section 4.
2. Flexible Prediction Selections in MVC
For convenience, we refer to a picture by both its view and temporal indices. Si/Tn (0 ≤ i < I, 0 ≤ n ≤ N) represents the picture of the ith view at instant Tn, where I and N represent the number of views and the GOP length in an MVC unit, respectively. For simplicity, a prediction structure with three views is shown in Fig. 2. Pictures Si/T0 and Si/T12 are called anchor pictures, and the others are non-anchor pictures. The flexible prediction selections are detailed as follows.
Fig. 2. JMVM reference prediction structure with three views.
- 2.1 MB-based inter-view prediction skipping for non-anchor pictures of B-views
For B-view pictures, the reference selection method for anchor pictures is the same as in JMVM. The MB-based prediction of non-anchor pictures of B-views proposed here differs from the method employed in JMVM. For MBs of B-views at all instants except T0, T6, and T12, inter-view prediction is not employed if the temporal correlation of the encoding GOP unit is sufficiently strong. The temporal correlation is obtained by analyzing the prediction modes of the MBs at instant T6, and is derived before coding the current picture. First, we use the flag flag_skip to indicate whether inter-view prediction should be used: a value of 0 means inter-view prediction is employed, and 1 means its use will be reconsidered. Two parameters are introduced to determine the value of flag_skip after coding picture S1/T6. One is the ratio of intra-coded MBs in the picture at instant T6, labeled Ratio_Intra_T6. The other is the ratio between the number of MBs of picture S1/T6 whose references are at S1/T0 and the number whose references are at S1/T12, labeled Ratio_For_Bac. A certain degree of temporal correlation exists in the current GOP if Ratio_Intra_T6 is lower than the predefined threshold thresh_Intra. In such cases, which half of this GOP exhibits the stronger temporal correlation depends on Ratio_For_Bac. If Ratio_For_Bac is greater than a predefined thresh_Upper, the period from T0 to T6 exhibits more temporal correlation than the period from T6 to T12. If Ratio_For_Bac is lower than a predefined thresh_Lower, the period from T6 to T12 is more temporally correlated than the period from T0 to T6. If Ratio_For_Bac is between thresh_Lower and thresh_Upper, the temporal correlation of the period from T0 to T6 is assumed to be very similar to that of T6 to T12. The flag_skip value is determined by combining these two parameters, as shown in Fig. 3.
The coding performance of temporal prediction is better than that of inter-view prediction for static regions [12]. Hence, when flag_skip = 1, the MV of the corresponding MB is used to further determine whether inter-view prediction is employed. In other words, when predicting the non-anchor pictures of B-views, the two temporal reference pictures are always selected, but the two inter-view reference pictures, say S0/Tn and S2/Tn, are chosen as follows. First, a prediction is made using the forward reference picture in the temporal direction. If the MV is (0, 0), inter-view prediction using S0/Tn as a reference is skipped; otherwise, it is performed. Second, a prediction is made using the backward reference picture in the temporal direction. If the MV is (0, 0), inter-view prediction using S2/Tn as a reference is skipped; otherwise, it is performed.
Fig. 3. Pseudo-code for determining flag_skip.
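The decision logic described above can be sketched as follows. This is a minimal illustration, not the paper's exact pseudo-code (Fig. 3 is given only as an image): the threshold values follow Section 2.1, while the precise mapping of the two ratios onto each half-GOP is an assumption.

```python
# Sketch of the flag_skip decision of Section 2.1 (cf. Fig. 3).
# Thresholds follow the text; the per-half-GOP mapping is an assumption.
THRESH_INTRA = 0.2   # upper bound on the intra-coded MB ratio at T6
THRESH_LOWER = 0.8   # lower bound on Ratio_For_Bac
THRESH_UPPER = 1.25  # upper bound on Ratio_For_Bac

def flag_skip(ratio_intra_t6, ratio_for_bac, first_half):
    """Return 1 if skipping inter-view prediction should be reconsidered
    for this half-GOP, 0 if inter-view prediction is always employed.
    first_half -- True for pictures between T0 and T6."""
    if ratio_intra_t6 >= THRESH_INTRA:
        return 0  # weak temporal correlation: keep inter-view prediction
    if ratio_for_bac > THRESH_UPPER:
        # T0..T6 is more temporally correlated than T6..T12
        return 1 if first_half else 0
    if ratio_for_bac < THRESH_LOWER:
        # T6..T12 is more temporally correlated than T0..T6
        return 0 if first_half else 1
    return 1  # both halves similarly correlated
```

When flag_skip returns 1, the MV check of the following paragraph decides per reference picture whether inter-view prediction is actually skipped.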
We conducted statistical tests on four multi-view video (MVV) sequences, "Ballroom", "Exit", "Rena", and "Race1", using a JMVM encoder to examine Ratio_Intra_T6 and Ratio_For_Bac. The statistical results for the two parameter distributions at instant T6 are shown in Table 1. On average, Ratio_Intra_T6 was below 0.1 in 58.75% of cases, within [0.1, 0.2] in 36.25% of cases, and above 0.2 in only 5% of cases; thus, Ratio_Intra_T6 is below 0.2 about 95% of the time. Hence, we set thresh_Intra to 0.2. As Ratio_For_Bac falls within [0.8, 1.25] in more than 50% of cases, we set thresh_Lower and thresh_Upper to 0.8 and 1.25, respectively.
Table 1. Statistical analysis of parameter distributions
The proposed scheme is realized in the MVC reference software JMVM 8.0 [27]. The features of the six test datasets are listed in Table 2. The experimental results are presented in Table 3, where "DPSNR", "Dbitrate (%)", and "Dtime (%)" represent the peak signal-to-noise ratio (PSNR) change, percentage bitrate change, and percentage change in total coding time, respectively, between our method and the reference software. From Table 3, we can see that the average time reduction is 25.41%. The average PSNR loss is 0.03 dB, and the increase in bitrate is about 0.29% on average. The proposed algorithm exhibits a consistent speedup for all the sequences, with a minimum gain of 21.38% for the "Ballroom" sequence and a maximum gain of 29.04% for "Rena". These data show that the algorithm and parameter thresholds are feasible and effective for all video sequences.
Table 2. Features of the test datasets
Table 3. Experimental results: difference between our method and the reference software JMVM
- 2.2 Frame-based inter-view prediction skipping when large differences exist between adjacent views
As can be seen from Fig. 1, the pictures at instants T0 and T12 are always processed before T1-T11 for any of the views according to the encoding order. Hence, we can use the statistical dependencies of inter-view pictures at instants T0 and T12 to estimate those at other instants. To simplify the description, we take the basic framework with S0, S1, and S2 as an example, as shown in Fig. 2. Let RNum_s0_s1(T0) denote the number of MBs coded in intra mode at S1/T0 under forward DE (that is, estimating S1 from S0), and let RNum_s2_s1(T0) be the number of MBs coded in intra mode under backward DE (that is, estimating S1 from S2). RNum_s0_s1(T12) and RNum_s2_s1(T12) denote similar quantities at T12. When the GOP is long, the inter-view similarity measured at the anchors may not hold over the whole GOP. Thus, we separate the GOP into two halves and consider the inter-view similarities independently. Because the inter-view similarities do not change significantly when the time interval of each half is short, the following hypothesis is put forward:
[Equations (1)-(4), which formalize this hypothesis, appear only as images in the original article and are not reproduced here.]
On the assumption of high inter-view similarities within a half-GOP unit, we propose the frame-based inter-view prediction skipping scheme in Fig. 4. That is, if the proportion of intra-coded MBs exceeds Cor_Thr at T0 and T12, a large difference is considered to exist between the two adjacent views, and frame-based inter-view prediction is disabled accordingly.
Fig. 4. Frame-based inter-view prediction skipping scheme.
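The rule above can be sketched for one prediction direction (say, forward DE from S0 to S1) as follows. Since Fig. 4 is given only as an image, the exact bookkeeping is an assumption; the value of Cor_Thr follows Section 2.3's experimental setting.

```python
# Sketch of the frame-based inter-view prediction skipping rule of
# Section 2.2 (cf. Fig. 4). The intra-coded MB counts produced by DE at
# the two anchor instants T0 and T12 (e.g. RNum_s0_s1(T0) and
# RNum_s0_s1(T12) for forward DE) are compared against Cor_Thr.
COR_THR = 0.25  # intra-MB ratio above which the views differ too much

def disable_interview(rnum_intra_t0, rnum_intra_t12, total_mbs):
    """Disable frame-based inter-view prediction from this neighbor view
    for all non-anchor instants when the intra-coded MB ratio exceeds
    Cor_Thr at both anchor pictures."""
    return (rnum_intra_t0 / total_mbs > COR_THR and
            rnum_intra_t12 / total_mbs > COR_THR)
```

The same check is applied independently for the backward direction (RNum_s2_s1), so each inter-view reference can be dropped separately.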
- 2.3 Inter-view prediction skipping for boundary MBs
MVVs are captured by various multiple camera arrangements, such as 1D and 2D arrays, 1D arcs, and so on. The arrangement of the multiple cameras means that some border MBs cannot find reference images in the adjacent views, regardless of whether the arrangement is regular. This is illustrated in Fig. 5 , which shows the current view picture and its inter-view reference picture as well as the predicted picture. The white blocks at the border of the predicted picture show that matching MBs cannot be found in the inter-view reference picture. Omitting the inter-view prediction for these MBs would undoubtedly reduce the computational load.
Fig. 5. (a) Current view, (b) inter-view reference picture, and (c) predicted picture.
Similar to temporal prediction, DE is conducted by minimizing the Lagrange cost function, which is the sum of the distortion D and the rate R, weighted by the Lagrange factor λ. For each MB Bi of the current view, the DE algorithm chooses a disparity vector di = (di_x, di_y) within a search range in the adjacent view so as to minimize J:

J(di) = D(Bi, B′i) + λ·R(di)
Here, the distortion D is calculated as the sum of the squared errors between the current block Bi and the previously reconstructed reference block B′i in the adjacent view:

D = Σ_{(x,y)∈Bi} [Bi(x, y) − B′i(x + di_x, y + di_y)]²
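A minimal full-search sketch of this cost minimization is given below. The SSE distortion follows the definition above; the rate term R(di) is approximated by the vector magnitude purely for illustration (the reference software uses the actual bits needed to code the vector), and only non-negative displacements are searched to keep the sketch short.

```python
# Minimal full-search disparity estimation minimizing
# J(d) = D(B, B'(d)) + lambda * R(d), with D the sum of squared errors.
import itertools

def sse(block, ref, dx, dy):
    """Sum of squared errors between block and the displaced reference."""
    h, w = len(block), len(block[0])
    return sum((block[y][x] - ref[y + dy][x + dx]) ** 2
               for y in range(h) for x in range(w))

def disparity_estimate(block, ref, search_range, lam=1.0):
    """Choose the disparity vector (dx, dy) minimizing the Lagrange cost J."""
    best = None
    for dx, dy in itertools.product(range(search_range + 1), repeat=2):
        try:
            d = sse(block, ref, dx, dy)
        except IndexError:
            continue  # displaced block falls outside the reference picture
        j = d + lam * (abs(dx) + abs(dy))  # crude rate model (assumption)
        if best is None or j < best[0]:
            best = (j, (dx, dy))
    return best[1]
```

For a 2×2 block that reappears at offset (1, 1) in the reference, the search returns (1, 1), since the zero-SSE match dominates the small rate penalty.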
Intra mode is normally chosen when its Lagrange cost is the minimum, even though the current MB may be able to find a reference in adjacent views. Therefore, MBs with intra prediction modes can be classified into two categories: those that could not find their references because of large displacements between the cameras, and those whose cost function is simply smaller than that of any of the inter modes. A texture descriptor Text_D can be introduced to identify MBs of the first type:
[The definition of Text_D appears only as an equation image in the original article and is not reproduced here.]
The inter-view prediction of a border MB is skipped if the prediction is in intra mode and the corresponding prediction distortion is larger than the texture descriptor. The decision condition is added to the distortion function as follows:
[This decision condition appears only as an equation image in the original article and is not reproduced here.]
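The boundary-MB skip test can be sketched as below. The three-MB border width follows the experimental setup of this section; since Text_D is defined only in an equation image, it is treated here as a precomputed input rather than derived.

```python
# Sketch of the boundary-MB inter-view prediction skip of Section 2.3.
# A border MB (within three MBs of a frame edge, per the experimental
# setup) skips inter-view prediction when intra mode was chosen and its
# prediction distortion exceeds the texture descriptor Text_D.
BORDER_WIDTH = 3  # MBs from the frame edge counted as "border"

def is_border_mb(mb_x, mb_y, mbs_per_row, mbs_per_col):
    """True if the MB at (mb_x, mb_y) lies in the frame's border band."""
    return (mb_x < BORDER_WIDTH or mb_y < BORDER_WIDTH or
            mb_x >= mbs_per_row - BORDER_WIDTH or
            mb_y >= mbs_per_col - BORDER_WIDTH)

def skip_interview(mb_x, mb_y, mbs_per_row, mbs_per_col,
                   best_mode, distortion, text_d):
    """Skip inter-view prediction for intra-coded, high-distortion border MBs."""
    return (is_border_mb(mb_x, mb_y, mbs_per_row, mbs_per_col)
            and best_mode == "intra" and distortion > text_d)
```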
In comparison with the reference software JMVM, the experimental results given by the schemes proposed in Sections 2.2 and 2.3 are listed in Table 4. Here, Cor_Thr is set to 0.25. The selection of the border MBs depends on the positional relation of the cameras, which can be derived from the camera parameters; the border MBs are those within three MBs of the frame edge. The experimental results show an average runtime saving of 34.39%, with only a 0.04 dB reduction in PSNR and a 0.48% increase in bit rate. An even greater time reduction is obtained for sequences containing fast motion, such as Race1.
Table 4. Experimental results given by the proposed schemes compared with the reference software JMVM
- 2.4 Skipping finer partitions in inter-view prediction for MBs in static image areas
The block sizes supported by motion compensation in H.264 vary from 16×16 down to 4×4 luminance samples, with many options in between. In general, a larger partition size (such as 16×16, 16×8, or 8×16) is appropriate for homogeneous areas of the frame, while a smaller partition size (such as 8×8, 8×4, 4×8, or 4×4) is likely to benefit areas with rich textures. The MB partitioning method also applies to disparity compensation. Note that inter-view prediction gains little in static image areas, such as the background. Hence, only the large partition sizes are expected in static image areas for inter-view prediction, and unnecessary DE with finer partitions can be cancelled. MBs in static image areas are identified by the following method. First, we change the prediction order by placing the intra-mode options at the top of the queue, followed by the inter-mode options. Second, the optimal intra mode is recorded as Mintra_ when all the intra-mode options have been evaluated. Third, the current inter-view mode is labeled Minter_ after inter-view predictions with large partitions have been performed. Finally, whether the finer partitions are used for inter-view prediction depends on:
[This condition appears only as an equation image in the original article and is not reproduced here.]
Thus, for scenes with large areas of static background, as many finer partitions as possible are cancelled.
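The reordered decision process can be sketched as follows. Because the paper's actual condition is given only as an equation image, the simple cost comparison between the best intra mode (Mintra_) and the best large-partition inter-view mode (Minter_) used here is an illustrative assumption.

```python
# Sketch of the finer-partition skipping of Section 2.4. Intra modes are
# evaluated first, then the large inter-view partitions (16x16, 16x8,
# 8x16); finer partitions are searched only when the large-partition
# inter-view cost is competitive, i.e. the MB does not look like static
# background. The comparison itself is an illustrative assumption.

def try_finer_partitions(cost_mintra, cost_minter_large):
    """Return True if 8x8 and smaller inter-view partitions should still
    be searched for this MB."""
    return cost_minter_large < cost_mintra
```

MBs in static areas tend to have a cheap intra (or large-partition) representation, so the finer DE partitions are cancelled for them.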
- 2.5 Disparity search range adjustment based on mode distribution correlation
In the JMVM reference software, prediction can be done for either a 16×16 MB or its sub-block partitions (16×8, 8×16, 8×8, 8×4, 4×8, and 4×4). Large modes are suitable for predicting regions of homogeneous motion, whereas small modes are appropriate for areas of complex motion. Because the mode size can reflect motion activity to a certain extent, the search range for the current MB can be dynamically adjusted according to the mode size distributions of MBs in the previously coded neighbor view.
The current MB and the corresponding MB in the neighboring view are shown in Fig. 6. The modes of the corresponding MB and its eight neighboring MBs are used to estimate the motion activity of the current MB, and thus to determine its disparity search range. When all nine MBs have been predicted using either 16×16 or SKIP mode, the current MB is classified as simple. When at least one of the nine MBs has been predicted using a medium-sized mode (16×8 or 8×16), the current MB is classified as medium. When at least one of the nine MBs has been predicted using a small mode (8×8, 8×4, 4×8, or 4×4), it is classified as complex.
Fig. 6. Corresponding MB and neighboring MBs in the previously coded view.
After the mode type of the current MB has been determined, the proposed disparity search strategy shown in Fig. 7 is applied. Two conditions are checked: (i) does the frame belong to TL4 or TL3? (ii) is the MB classified as simple, medium, or complex? Different search ranges are selected according to the answers.
Fig. 7. Proposed disparity vector search range adjustment algorithm. SR denotes the maximum search range, which is 64. As defined in [22], TL3 and TL4 are Temporal Levels 3 and 4, respectively.
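The strategy of Figs. 6 and 7 can be sketched as below. SR = 64 and the TL3/TL4 check follow the figure caption; the concrete reduced ranges (SR/8, SR/4, SR/2) are illustrative assumptions, since Fig. 7 itself is not reproduced.

```python
# Sketch of the disparity search range adjustment of Section 2.5.
# The mode sizes of the co-located MB and its eight neighbors in the
# previously coded view classify the current MB as simple/medium/complex,
# and the search range shrinks for simpler MBs and higher temporal levels.
SR = 64  # maximum disparity search range

SIMPLE_MODES = {"SKIP", "16x16"}
MEDIUM_MODES = {"16x8", "8x16"}

def classify_mb(neighbor_modes):
    """Classify motion activity from the nine co-located/neighbor modes."""
    if all(m in SIMPLE_MODES for m in neighbor_modes):
        return "simple"
    if all(m in SIMPLE_MODES | MEDIUM_MODES for m in neighbor_modes):
        return "medium"  # at least one 16x8/8x16, nothing smaller
    return "complex"     # at least one 8x8, 8x4, 4x8, or 4x4

def search_range(neighbor_modes, temporal_level):
    """Pick a search range from the class and temporal level (ranges assumed)."""
    cls = classify_mb(neighbor_modes)
    if temporal_level in (3, 4):  # TL3/TL4: inter-view gain is weakest
        return {"simple": SR // 8, "medium": SR // 4, "complex": SR // 2}[cls]
    return {"simple": SR // 4, "medium": SR // 2, "complex": SR}[cls]
```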
3. Experimental Results and Analysis
In this section, we evaluate the overall algorithm, which incorporates the five proposed techniques. The comparison between the overall algorithm and JMVM [27] is given in Table 5, where BDPSNR and BDBR [28] represent the average PSNR and bit rate differences over QP values of 10, 20, 24, 28, 32, 36, and 40. Six test sequences [29] are evaluated in the experiment, which is run on a machine with an Intel Core i7 860 CPU @ 2.8 GHz and 4 GB of RAM.
Table 5. Experimental results compared with the reference software JMVM
As can be seen from Table 5, the overall algorithm reduces the encoding time by 83.65% on average, and the coding efficiency loss is negligible, with a 0.05 dB PSNR loss and a 0.54% bit rate increase. The results show that the proposed approach achieves a consistent gain in coding speed, with the lowest gain of 70.32% for Race1 and the highest gain of 88.76% for AkkoKayo. The maximum PSNR loss is 0.13 dB and the maximum bitrate increase is 1.05%, which are negligible compared with the saving in encoding time. The proposed algorithm exhibits a considerable and consistent speedup for all sequences, whether with large or small disparities, slow or fast motion. For example, for sequences containing still backgrounds and large disparities, such as Rena and Exit, the proposed algorithm reduces the encoding time by more than 86%. For sequences containing complicated motion and less disparity, such as Race1 and Ballroom, the time saving is lower but still significant, at about 70.32-82.87%.
To the best of our knowledge, the state-of-the-art results are those reported in [22], which combines the methods of [19], [20], and [21] with its PDV-DE technique. Compared with the methods in [19], [20], and [21], PDV-DE increases the time saving by only 1.5%. To evaluate the performance of each algorithm alone, we further compare the MVC methods in [19] and [20] with our overall MVC encoding technique proposed in Section 2. Three views from the standard test sequences Ballroom and Exit are used. For each sequence, results are obtained for QP values of 20, 24, 28, 32, and 36.
In addition to JMVM, Table 6 compares our algorithm with the MVC methods in [19] and [20]. On average, the overall algorithm obtains better results than both on the evaluated measures. The proposed algorithm reduces the coding time by 86.17% for these two sequences, while [19] and [20] reduce it by only 69.6% and 81.03%, respectively. Meanwhile, the proposed algorithm achieves better RD performance, with a 0.7% bit rate increase. Note that our method is particularly useful for sequences with large disparities. For example, the time saving for "Exit" was over 4% greater than that for "Ballroom", which has less disparity.
Table 6. Experimental results of the proposed overall algorithm and the methods in [19] and [20]
4. Conclusion
A fast encoder design for MVC has been proposed. The primary contributions are our frame- and MB-based inter-view prediction skipping techniques, which are based on the statistical properties of previously intra-coded MBs. Based on the mode size distribution of MBs in the previously coded neighboring view, we also developed a dynamic adjustment method for the search range of the current MB. Experimental results showed that the proposed algorithm significantly reduces the computational complexity of MVC while maintaining almost the same coding efficiency, demonstrating its applicability to various types of videos.
BIO
Fan Zhao received a Ph.D. in Information and Communication Engineering from Xi’an Jiaotong University, Xi’an, China, in 2009. She is an associate professor in the Department of Information Science at Xi'an University of Technology, Xi'an, China, and is now working as a postdoctoral fellow in the Department of Computer Science and Engineering, Xi’an Jiaotong University. Her research interests include image processing, video compression, and pattern recognition. (E-mail: vcu@xaut.edu.cn).
Kaiyang Liao received a Ph.D. in Information and Communication Engineering from Xi’an Jiaotong University, Xi’an, China, in 2013. He is currently a lecturer with the School of Printing and Packaging Engineering, Xi'an University of Technology, Xi'an, China. His research interests include data mining, pattern recognition, video analysis and retrieval.
(E-mail: liaokaiyang@xaut.edu.cn).
Erhu Zhang received a Ph.D. in Biomedical Engineering from Xi’an Jiaotong University in 2008. He is currently a professor in the Department of Information Science, Xi’an University of Technology, Xi’an, China. His research interests include digital image processing, pattern recognition, and intelligence information processing.
(E-mail: eh-zhang@xaut.edu.cn)
Fangying Qu is currently studying for a Bachelor's degree in Applied Mathematics at Northwest University, Xi'an, China.
(E-mail: fangyingqu@gmail.com)
References
[1] ISO/IEC 14496-10, "Information Technology - Coding of Audio-Visual Objects - Part 10: Advanced Video Coding," Final Draft International Standard, 2003.
[2] ISO/IEC JTC1/SC29/WG11, "Report on 3DTV Exploration," Doc. N5878, 2003.
[3] M. Tanimoto, "Overview of Free Viewpoint Television," Signal Processing: Image Communication, vol. 21, no. 6, pp. 454-461, 2006. DOI: 10.1016/j.image.2006.03.009
[4] Y.-S. Ho and K.-J. Oh, "Overview of multi-view video coding," in Proc. of IEEE Int. Workshop on Signal, System, and Image Processing, pp. 5-12, June 2007.
[5] Joint Video Team of ISO/IEC MPEG & ITU-T VCEG, "Joint Draft 4.0 on Multi-view Video Coding," Doc. JVT-X209, 2007.
[6] H. Pan and F. Pan, "Development of Multi-view Video Coding Using Hierarchical B Pictures," in Proc. of Congress on Image and Signal Processing, vol. 1, pp. 497-503, May 2008.
[7] P. Merkle, A. Smolic, K. Müller, and T. Wiegand, "Efficient Prediction Structures for Multiview Video Coding," IEEE Transactions on Circuits and Systems for Video Technology, vol. 17, no. 11, pp. 1461-1473, 2007. DOI: 10.1109/TCSVT.2007.903665
[8] ISO/IEC JTC1/SC29/WG11, "Multi-view Coding using AVC," JVT Doc. m12945, 2006.
[9] A. Leontaris and P. C. Cosman, "Compression Efficiency and Delay Tradeoffs for Hierarchical B-Pictures and Pulsed-Quality Frames," IEEE Transactions on Image Processing, vol. 16, no. 7, pp. 1726-1740, 2007. DOI: 10.1109/TIP.2007.896681
[10] H. Schwarz, D. Marpe, and T. Wiegand, "Hierarchical B Pictures," ITU-T and ISO/IEC Joint Video Team, Doc. P014, 2005.
[11] ISO/IEC JTC1/SC29/WG11, "Description of Core Experiments in MVC," Doc. MPEG2006/W7798, 2006.
[12] L.-F. Ding, S.-Y. Chien, Y.-W. Huang, Y.-L. Chang, and L.-G. Chen, "Stereo video coding system with hybrid coding based on joint prediction scheme," in Proc. of IEEE International Symposium on Circuits and Systems, vol. 6, pp. 6082-6085, May 2005.
[13] L.-F. Ding, S.-Y. Chien, and L.-G. Chen, "Joint prediction algorithm and architecture for stereo video hybrid coding systems," IEEE Transactions on Circuits and Systems for Video Technology, vol. 16, no. 11, pp. 1324-1337, 2006. DOI: 10.1109/TCSVT.2006.883510
[14] J.-P. Lin and A. C.-W. Tang, "A fast direction predictor of inter frame prediction for multi-view video coding," in Proc. of IEEE International Symposium on Circuits and Systems, pp. 2589-2592, 2009.
[15] ISO/IEC JTC1/SC29/WG11, "Core Experiments on view-temporal prediction structures," Doc. MPEG2006/M13196, 2006.
[16] J. Huo and Y. Chang, "Study on improving the coding efficiency of multiview video coding," Ph.D. thesis, Xi'an University of Electronic Science and Technology, 2008.
[17] F. Zhao, G. Liu, F. Ren, and N. Zhang, "Flexible predictions selection for multi-view video coding," in Proc. of 2009 Data Compression Conference (DCC), p. 471, March 2009.
[18] L. Shen, Z. Liu, S. Liu, Z. Zhang, and P. An, "Selective Disparity Estimation and Variable Size Motion Estimation Based on Motion Homogeneity for Multi-View Coding," IEEE Transactions on Broadcasting, vol. 55, no. 4, pp. 761-766, 2009. DOI: 10.1109/TBC.2009.2030453
[19] Y. Zhang, S. Kwong, G. Jiang, and H. Wang, "Efficient multi-reference frame selection algorithm for hierarchical B pictures in multi-view video coding," IEEE Transactions on Broadcasting, vol. 57, no. 1, pp. 15-23, 2011. DOI: 10.1109/TBC.2010.2082670
[20] L. Shen, Z. Liu, T. Yan, Z. Zhang, and P. An, "View-adaptive motion estimation and disparity estimation for low-complexity multiview video coding," IEEE Transactions on Circuits and Systems for Video Technology, vol. 20, no. 6, pp. 925-930, 2010. DOI: 10.1109/TCSVT.2010.2045910
[21] L. Shen, Z. Liu, T. Yan, Z. Zhang, and P. An, "Early SKIP mode decision for MVC using inter-view correlation," Signal Processing: Image Communication, vol. 25, pp. 88-93, 2010. DOI: 10.1016/j.image.2009.11.003
[22] S. Khattak, R. Hamzaoui, S. Ahmad, and P. Frossard, "Low-complexity multi-view video coding," in Proc. of Picture Coding Symposium (PCS 2012), pp. 97-100, May 2012.
[23] L. Shen, Z. Liu, P. An, R. Ma, and Z. Zhang, "Low-Complexity Mode Decision for MVC," IEEE Transactions on Circuits and Systems for Video Technology, vol. 21, no. 6, pp. 837-843, 2011. DOI: 10.1109/TCSVT.2011.2130310
[24] S. Khattak, R. Hamzaoui, T. Maugey, S. Ahmad, and P. Frossard, "Bayesian Early Mode Decision Technique for View Synthesis Prediction-Enhanced Multiview Video Coding," IEEE Signal Processing Letters, vol. 20, no. 11, pp. 1126-1129, 2013. DOI: 10.1109/LSP.2013.2281607
[25] T. Zhao, S. Kwong, H. Wang, Z. Wang, Z. Pan, and C.-C. J. Kuo, "Multiview Coding Mode Decision With Hybrid Optimal Stopping Model," IEEE Transactions on Image Processing, vol. 22, no. 4, pp. 1598-1609, 2013. DOI: 10.1109/TIP.2012.2235451
[26] C.-H. Yeh, M.-F. Li, M.-J. Chen, M.-C. Chi, X.-X. Huang, and H.-W. Chi, "Fast Mode Decision Algorithm Through Inter-View Rate-Distortion Prediction for Multiview Video Coding System," IEEE Transactions on Industrial Informatics, vol. 10, no. 1, pp. 594-603, 2014. DOI: 10.1109/TII.2013.2273308
[27] Joint Video Team of ISO/IEC MPEG & ITU-T VCEG, "Joint Draft 8.0 on Multiview Video Coding," Doc. JVT-AB204, 2008.
[28] G. Bjontegaard, "Calculation of average PSNR difference between RD curves," ITU-T Q.6/SG16 VCEG 13th Meeting, Doc. VCEG-M33, April 2001.
[29] ISO/IEC JTC1/SC29/WG11 and ITU-T SG16 Q.6, "Common test conditions for multiview video coding," Doc. JVT-T207, 2006.
[30] H. Huang, Y. Zhao, C. Lin, and H. Bai, "Fast Intraframe Coding for High Efficiency Video Coding," KSII Transactions on Internet and Information Systems, vol. 8, no. 3, pp. 1093-1104, 2014. DOI: 10.3837/tiis.2014.03.022