A Study on Projection Conversion for Efficient 3DoF+ 360-Degree Video Streaming
Journal of Broadcast Engineering. 2019. Dec, 24(7): 1209-1220
Copyright © 2016, Korean Institute of Broadcast and Media Engineers. All rights reserved.
This is an Open-Access article distributed under the terms of the Creative Commons BY-NC-ND (http://creativecommons.org/licenses/by-nc-nd/3.0) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited and not altered.
  • Received : October 01, 2019
  • Accepted : October 24, 2019
  • Published : December 01, 2019
About the Authors
Jong-Beom Jeong
Department of Computer Education, Sungkyunkwan University
Soonbin Lee
Department of Computer Engineering, Gachon University
Dongmin Jang
Department of Computer Education, Sungkyunkwan University
Sungbin Kim
Department of Computer Engineering, Gachon University
Sangsoon Lee
Department of Computer Engineering, Gachon University
Eun-Seok Ryu
Department of Computer Education, Sungkyunkwan University
sryu@skku.edu

Abstract
The demand for virtual reality (VR) is rapidly increasing. Providing an immersive experience requires substantial computation and a large amount of data to transmit. For example, a 360-degree video (360 video) with at least 4K resolution is needed to offer an immersive experience to users. Moreover, the MPEG-I group defined three degrees of freedom plus (3DoF+), which requires the simultaneous transmission of multiview 360 videos. This can be a burden for a VR streaming system. Accordingly, in this work, a bitrate-saving method using projection conversion is introduced, along with experimental results for streaming 3DoF+ 360 video. The results show that projection conversion of 360 video with 360lib achieves a Bjontegaard delta bitrate gain of as much as 11.4%.
Ⅰ. Introduction
Recently, the demand for efficient virtual reality (VR) technology has been increasing as the VR market has expanded. An immersive experience with a head-mounted display (HMD) in VR requires a video resolution of at least 4K. As a result, the amount of data to transmit and the computational load become a huge burden on the system. Especially on mobile platforms, where computing power is limited, asymmetric multicore processing [9] and 360-degree video (360 video) streaming over millimeter-wave communication [5] have been proposed. Similarly, tile-based 360 video streaming [11][13] has been proposed to process only the tiles needed by users. To overcome these difficulties, the Moving Picture Experts Group (MPEG) established MPEG-I to standardize immersive media by 2021. For example, bitrate-efficient technologies such as the motion-constrained tile set (MCTS) [10] were proposed for MPEG-I, and an implementation was reported [12].
In MPEG-I, the degrees of freedom (DoF) for immersive media are divided into 3DoF, 3DoF+, and 6DoF. Among them, 3DoF+ enables a user sitting in a chair to watch a video with limited movement. This requires the transmission of multiple 360 videos, which means the quality of each video declines because a number of videos must be sent over limited bandwidth. Accordingly, a method of saving bitrate by down-sampling the 3DoF+ 360 video was proposed [2]. However, down-sampling a 360 video inevitably loses information from the original video. To minimize this problem, a bitrate-saving method is proposed for 3DoF+ 360 video transmission that applies an appropriate projection conversion to the omnidirectional video. Because a 360 video can be represented by various projections, projection conversion can reduce its size. In this study, the 360lib library was used to convert the projection of the 360 videos, and the high-efficiency video coding (HEVC) test model (HM) was adopted for encoding and decoding.
As shown in Fig. 1, multiview 360 video compression with projection conversion is proposed. First, multiple 360 videos are acquired from 360 cameras. In pre-processing, the projection type of the videos is converted to reduce their size. Subsequently, the videos are compressed and transmitted through a network. In post-processing, the compressed videos are decompressed and converted back to the original projection.
Fig. 1. Conceptual diagram of multiview 360 video transmission with projection conversion
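To make the pipeline concrete, the following Python sketch drives the two external tools once per view. It is illustrative only: the executable names (TApp360ConvertStatic, TAppEncoderStatic), configuration files, and view file names are assumptions about a typical 360Lib/HM build, not commands confirmed by this paper.

import subprocess

# Hypothetical driver for the Fig. 1 pipeline; binary names, configuration
# files, and view file names are placeholders, not verified commands.
VIEWS = ["v0.yuv", "v1.yuv", "v2.yuv"]  # n ERP source views (placeholder names)
for view in VIEWS:
    acp = view.replace(".yuv", "_acp.yuv")
    # Pre-processing: convert the ERP view to the target projection (360Lib).
    subprocess.run(["TApp360ConvertStatic", "-c", "erp_to_acp.cfg",
                    "-i", view, "-o", acp], check=True)
    # Compression: encode the converted view with the HM encoder.
    subprocess.run(["TAppEncoderStatic", "-c", "encoder_randomaccess_main.cfg",
                    "-i", acp, "-b", view.replace(".yuv", ".bin"),
                    "-q", "27"], check=True)
# The client side mirrors these steps: HM decoding followed by conversion
# back to ERP (post-processing).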
This work consists of five sections. Section II introduces related work: 360 video standardization from MPEG, 360 video projection, and view synthesis. Section III describes the overall experiment, including the bitrate-reducing method. Section IV summarizes the experimental results for the proposed method. Finally, Section V presents the conclusions and discusses future work.
Ⅱ. Related Work
- 1. 360 Video Standardization from MPEG
The MPEG-I ad-hoc group was established to take charge of 360 video standardization. As mentioned in Section I, in MPEG-I, the DoF for immersive media is divided into 3DoF, 3DoF+, and 6DoF [16], as shown in Fig. 2. In 3DoF, a user sitting in a chair can watch a 360 video acquired from one viewpoint, which restricts the user's immersive experience. In 3DoF+, the user is also assumed to sit in a chair; however, a viewport is provided in accordance with the user's head movement. Finally, 6DoF makes it possible for the user to walk around while watching 360 video, with full support of the user's movement.
Fig. 2. Viewing angle and degrees of freedom: (a) 3DoF, (b) 3DoF+, and (c) 6DoF
In 3DoF+ and 6DoF, it is essential to synthesize intermediate views, which did not previously exist, from source views. Source views are the videos acquired from the cameras. Because the bandwidth available to transmit 360 videos is limited, a decline in the quality of each 360 video is inevitable. Alternatively, sending only a subset of the source views can be considered in 3DoF+ and 6DoF. For this reason, MPEG-I defined the anchor view, which is the source view to transmit, as shown in Fig. 3.
Fig. 3. Definition of the anchor in 3DoF+
- 2. 360 Video Projection
A 360 video is mapped onto a sphere. However, to transmit the 360 video, it is necessary to map it to a 2D plane, and this mapping is called "projection". The omnidirectional media format (OMAF) [8] defines the standards for omnidirectional 360 video. For instance, equirectangular projection (ERP), cubemap projection (CMP), adjusted cubemap projection (ACP), octahedron projection (OHP), segmented sphere projection (SSP), and rotated sphere projection (RSP) are projection types for 360 video defined in OMAF.
Among them, ERP is the most general and widely used format for representing 360 video. It projects the video on the sphere onto a rectangular 2D plane, as shown in Fig. 4. However, the top and bottom areas are distorted, because the pixels of those areas on the sphere are oversampled when they are projected onto the rectangular 2D plane.
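The ERP mapping itself is a simple linear relation between pixel coordinates and sphere angles. The following minimal Python sketch shows the standard ERP pixel-to-angle conversion and its inverse; the half-pixel offsets follow the usual sampling convention.

import math

def erp_to_sphere(u, v, width, height):
    # Longitude spans [-pi, pi] over the width; latitude spans
    # [-pi/2, pi/2] over the height. Every image row is one latitude
    # ring, so polar rows carry as many pixels as the equator row,
    # which is the oversampling described above.
    lon = ((u + 0.5) / width - 0.5) * 2.0 * math.pi
    lat = (0.5 - (v + 0.5) / height) * math.pi
    return lon, lat

def sphere_to_erp(lon, lat, width, height):
    # Inverse mapping: sphere angles back to ERP pixel coordinates.
    u = (lon / (2.0 * math.pi) + 0.5) * width - 0.5
    v = (0.5 - lat / math.pi) * height - 0.5
    return u, v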
CMP, which is also widely used, consists of six square faces, as shown in Fig. 4. By inscribing the sphere carrying the video in a regular hexahedron, each region of the sphere is projected onto one of the squares. In some cases, this can reduce the size of the video compared with ERP. However, there are distortions at each edge of the squares. A video with CMP represents the sphere inefficiently because it samples the pixels nonuniformly: as shown in Fig. 5, the areas closer to the cube side edges are sampled more densely. Consequently, ACP was developed to overcome these problems. It approximates uniform sphere sampling while preserving the packing scheme. As a result, ACP enlarges the center of each cubemap face relative to CMP, as shown in Fig. 6.
Fig. 4. Examples of 360 video projection formats: (a) equirectangular projection and (b) cubemap projection
Fig. 5. Cubemap sampling
Fig. 6. Front face representations of CMP and ACP: (a) 3×2 cubemap projection (CMP) and (b) 3×2 adjusted cubemap projection (ACP)
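The difference between CMP and ACP reduces to a per-face coordinate adjustment. The sketch below assumes the quadratic adjustment with constants 0.36 and 1.36 proposed for ACP in JVET-F0025; treat the exact constants as an assumption rather than a statement of this paper.

import math

def cmp_to_acp(t):
    # Cube-face coordinate in [-1, 1] -> ACP coded coordinate (assumed
    # JVET-F0025 quadratic). The slope at the center (1.36) exceeds 1,
    # so the face center is enlarged in the coded picture, as in Fig. 6(b).
    s = 1.0 if t >= 0 else -1.0
    return s * (-0.36 * t * t + 1.36 * abs(t))

def acp_to_cmp(t_adj):
    # ACP coded coordinate -> cube-face coordinate (inverse quadratic).
    s = 1.0 if t_adj >= 0 else -1.0
    return s * (1.36 - math.sqrt(1.36 ** 2 - 4.0 * 0.36 * abs(t_adj))) / (2.0 * 0.36)

For example, cmp_to_acp(0.5) gives 0.59, and acp_to_cmp(0.59) recovers 0.5, confirming the two mappings are inverses.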
- 3. View Synthesis
3DoF+, as described in Section II, Subsection 1, supports a user's head movements while the user is sitting in a chair. Therefore, in 3DoF+, it is necessary to synthesize the video of the user's viewpoint, which is called "view synthesis". Previously, free viewpoint television (FTV) [7] from MPEG adopted the view synthesis reference software (VSRS) [6] as a tool for view synthesis in multi-view video systems. In 2018, VSRSx [15], which supports 360 video view synthesis, was proposed; it accepts two or four input views to synthesize a virtual view. However, the 3DoF+ test sequences introduced in the common test conditions (CTC) for 3DoF+ [3] contain more than four views.
Consequently, the reference view synthesizer (RVS) [1] was adopted as the view synthesis tool for 3DoF+, because it supports more input views than VSRS. Fig. 7 shows the conceptual diagram and an example of view synthesis in RVS. RVS takes texture and depth as input views: "texture" is the video that represents the color of the objects, and "depth" represents the distance between the camera and the objects. Given a number of input views and the location of the target view, RVS generates the target virtual-view texture and depth from each input view using warping, and then blends the synthesized virtual views.
Fig. 7. (a) Conceptual diagram of RVS and (b) simple example of view synthesis in RVS
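This warp-then-blend structure can be sketched as follows. This is a simplified illustration of the idea, not the RVS implementation: forward_warp is a hypothetical placeholder for the depth-based reprojection step, and a real blender also weighs geometric quality cues, not only depth.

import numpy as np

def synthesize_view(textures, depths, poses, target_pose):
    # Warp each (texture, depth) input view to the target viewpoint, then
    # blend the warped candidates. forward_warp is a placeholder for the
    # per-view 3D reprojection (unproject with depth, move to target_pose,
    # reproject); it is not an RVS API.
    warped, weights = [], []
    for tex, dep, pose in zip(textures, depths, poses):
        w_tex, w_dep = forward_warp(tex, dep, pose, target_pose)
        warped.append(w_tex)
        # Weight candidates by inverse depth so nearer surfaces dominate.
        weights.append(1.0 / np.maximum(w_dep, 1e-6))
    w = np.stack(weights)[..., None]
    return (np.stack(warped) * w).sum(axis=0) / w.sum(axis=0)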
Ⅲ. 3DoF+ 360 Video Transmission Bitrate Saving with Projection Conversion
In this section, a method for saving the transmission bitrate of 3DoF+ 360 video through projection conversion, shown in Fig. 8, is explained. In pre-processing, n 360 ERP videos are converted to a different projection such as CMP. The purpose of this phase is to reduce the size of the 360 videos with minimal loss of information. The converted videos are then encoded by an HEVC encoder for transmission. Next, the encoded bitstreams are decoded and delivered to the next phase. In post-processing, the projection of the reconstructed videos is converted back to ERP, the original projection format of the source videos. Finally, the weighted-to-spherically-uniform peak signal-to-noise ratio (WS-PSNR) [14][17] between the source views and the converted views is measured. Because the source view projection is ERP, WS-PSNR, a weighted metric for measuring the quality of reconstructed 360 video, is selected as the tool for objective quality evaluation.
Fig. 8. 3DoF+ 360 video transmission bitrate saving with projection conversion
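WS-PSNR weights each pixel's squared error by the spherical area it covers; for ERP, the weight of a row is the cosine of its latitude [14][17]. A compact sketch for single-channel ERP frames:

import numpy as np

def ws_psnr_erp(ref, rec, max_val=255.0):
    # Per-row ERP weight: cos((j + 0.5 - H/2) * pi / H), so oversampled
    # polar rows contribute less than equatorial rows.
    h, w = ref.shape
    rows = np.cos((np.arange(h) + 0.5 - h / 2.0) * np.pi / h)
    weights = np.tile(rows[:, None], (1, w))
    err = (ref.astype(np.float64) - rec.astype(np.float64)) ** 2
    wmse = np.sum(weights * err) / np.sum(weights)
    return 10.0 * np.log10(max_val ** 2 / wmse)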
- 1. Projection Conversion with 3DoF+ Test Sequences
Here, projection conversion with a 3DoF+ test sequence is introduced. MPEG has approved ClassroomVideo [4], TechnicolorMuseum, and TechnicolorHijack as 3DoF+ test materials, as shown in Fig. 9. Table 1 lists the general characteristics of the test sequences. ClassroomVideo was selected as the test material in this study, because it is the omnidirectional 360 video among the test sequences, which makes it appropriate for projection conversion. An experiment was conducted to find a projection format whose WS-PSNR differs little from that of ERP, the original projection format. Another purpose of this experiment was to find a projection format suitable for real-time processing in 3DoF+, which requires the ability to process a number of 360 videos simultaneously.
Fig. 9. (a) ClassroomVideo texture, (b) TechnicolorMuseum texture, (c) TechnicolorHijack texture, (d) ClassroomVideo depth, (e) TechnicolorMuseum depth, and (f) TechnicolorHijack depth
Table 1. General characteristics of 3DoF+ test sequences
For projection format selection, CMP3×2, ACP, compact OHP option 1 (COHP1), and compact OHP option 2 (COHP2) were chosen for the experiment. Fig. 10 shows the pictures of position v0 converted with CMP3×2, ACP, COHP1, and COHP2, and Table 2 shows the projection conversion results, including the converted view resolution and processing time.
Table 2. Projection conversion results for texture
Fig. 10. Converted projections: (a) CMP3×2, (b) ACP, (c) COHP1, and (d) COHP2
Ⅳ. Experimental Results
In Section III, projection conversion with CMP3×2, ACP, COHP1, and COHP2 was described. 360lib 5.0, the WS-PSNR software, and HM 16.16 were used, as discussed in Section III. In this experiment, a Linux server with two Intel Xeon E5-2687W v4 CPUs and 128 GB of memory was used. Table 3 shows the anchor views for each class of ClassroomVideo. Class A1 contains all the views, whereas A2 contains a subset of the source views. In this study, the experiment was conducted on A1 to check the results for all the source views.
Table 3. Anchor views per class
Tables 4 and 5 summarize the experimental results. Table 4 shows the Bjontegaard delta bitrate (BD-rate), encoding time saving, and pixel rate saving for each projection compared with the anchor, which is ERP. ACP showed the best results among the introduced projection types: for the Y value, it achieved an 11.4% BD-rate saving compared with the anchor. To assess complexity, the encoding time and pixel rate were measured. For example, CMP3×2 showed a 0.03% encoding time increase, which is still acceptable as long as the BD-rate is better than the anchor's. The pixel rate saving is the reduction in the number of pixels; CMP3×2 and ACP showed a 0.05% pixel saving, which decreases the burden on the decoders at the client side. Table 5 details the experimental results for both texture and depth projection conversion. Fig. 11 shows the rate-distortion (RD) curves between the bitrate and WS-PSNR_Y, the luma value of WS-PSNR, for each projection individually, and Fig. 11(e) integrates these results. Among the projection types, ACP showed the best performance considering the bitrate and WS-PSNR_Y. Because ACP compensates for the pixel distortion of CMP3×2, it produced a better result than the traditional cubemap. The result of CMP3×2 was better than that of the anchor, which was encoded in the ERP format, when the QP value was high. However, COHP1 and COHP2 showed lower WS-PSNR_Y values than the anchor for all QPs. The face boundaries of COHP1 and COHP2 are not represented as vertical or horizontal lines, which lowers the encoding efficiency. Therefore, the videos converted to these projections showed less promising results than the anchor.
Table 4. Results for texture between ERP and the proposed projection types
Table 5. Bitrate and WS-PSNR_Y of A1 texture and depth
Fig. 11. RD curves for texture between bitrate and WS-PSNR_Y for each projection type: (a) CMP3×2, (b) ACP, (c) COHP1, (d) COHP2, and (e) summary
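For reference, the BD-rate values in Table 4 follow the standard Bjontegaard procedure: fit a cubic polynomial of log-bitrate as a function of quality for the anchor and the test curve, then average the gap between the two fits over the overlapping quality range. A minimal sketch of that standard computation (not the exact script used in this experiment):

import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    # Cubic fit of log10(bitrate) as a function of quality for each curve;
    # each input is the list of (rate, WS-PSNR_Y) points over the QPs.
    p_a = np.polyfit(psnr_anchor, np.log10(rate_anchor), 3)
    p_t = np.polyfit(psnr_test, np.log10(rate_test), 3)
    # Integrate both fits over the overlapping quality interval.
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    int_a = np.polyval(np.polyint(p_a), hi) - np.polyval(np.polyint(p_a), lo)
    int_t = np.polyval(np.polyint(p_t), hi) - np.polyval(np.polyint(p_t), lo)
    avg_log_diff = (int_t - int_a) / (hi - lo)
    # Negative result means the test projection saves bitrate vs. the anchor.
    return (10.0 ** avg_log_diff - 1.0) * 100.0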
Ⅴ. Conclusion
A bitrate-saving method based on projection conversion was proposed. In detail, the original ERP video is converted by the 360lib converter to reduce its resolution, and an HEVC encoder performs encoding for transmission. On the client side, the bitstreams are decoded with an HEVC decoder, and the decoded videos are restored to the original projection by the 360lib converter. Among the introduced projections, ACP showed an 11.4% BD-rate gain for the luma value and a 0.05% pixel rate saving. However, the introduced method causes pixel loss, which leads to a loss in WS-PSNR. Consequently, research on an efficient projection type that represents the original video with less pixel loss is needed, and intensive experiments must be performed to derive an equation that defines the relation between the original and converted videos.
This work was supported by an Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2017-0-00307, Development of Tiled Streaming Technology for High Quality VR Contents Real-Time Service). This work was also supported by an IITP grant funded by the Korea government (MSIT) (No. 2018-0-00765, Development of Compression and Transmission Technologies for Ultra High-Quality Immersive Videos Supporting 6DoF).
BIO
Jong-Beom Jeong
- 2018. 8. : Received B.S. degree in Department of Computer Engineering from Gachon University
- 2018. 9. ~ 2019. 8. : Pursued M.S. degree in Department of Computer Engineering from Gachon University
- 2019. 9. ~ Current : Pursuing M.S. degree in Department of Computer Education from Sungkyunkwan University (SKKU)
- Research Interests: Multimedia communication and systems, video compression standards
Soonbin Lee
- 2014. 3. ~ Current : Pursuing B.S. degree in Department of Computer Engineering from Gachon University
- Research Interests: Multimedia communication and systems, video compression standards
Dongmin Jang
- 2019. 2. : Received B.S. degree in Department of Computer Engineering from Gachon University
- 2019. 3. ~ 2019. 8. : Pursued M.S. degree in Department of Computer Engineering from Gachon University
- 2019. 9. ~ Current : Pursuing M.S. degree in Department of Computer Education from Sungkyunkwan University (SKKU)
- Research Interests: Multimedia communication and systems, video compression standards
Sungbin Kim
- 2013. 3. ~ Current : Pursuing B.S. degree in Department of Computer Engineering from Gachon University
- Research Interests: Multimedia communication and systems, video compression standards
Sangsoon Lee
- 1982. 2. : Received B.S. degree in Department of Electronic Engineering from Inha University
- 1986. 2. : Received M.S. degree in Department of Computer Engineering from Inha University
- 2005. 2. : Received Ph.D. degree in Department of Computer Engineering from Incheon University
- 1994. 2. ~ Current : Associate professor in Department of Computer Engineering from Gachon University
- Research Interests: Computer networks, system software, IoT
Eun-Seok Ryu
- 1999. 8.:Received B.S. degree in Department of Computer Science from Korea University
- 2001. 8.:Received M.S. degree in Department of Computer Science from Korea University
- 2008. 2.:Received Ph.D. degree in Department of Computer Science from Korea University
- 2008. 3. ~ 2008. 8.:Research professor from Korea University
- 2008. 9. ~ 2010. 12.:Postdoctoral Research Fellow in the School of Electrical and Computer Engineering from Georgia Centers for Advanced Telecommunications Technology (GCATT)
- 2011. 1. ~ 2014. 2.:Staff engineer from InterDigital Labs
- 2014. 3. ~ 2015. 2.:Principal Engineer from Samsung Electronics
- 2015. 3. ~ 2019. 8.:Associate professor in Department of Computer Engineering from Gachon University
- 2019. 9. ~ Current : Associate professor in Department of Computer Education from Sungkyunkwan University (SKKU)
- Research Interests: Multimedia communication and systems, video compression and international standards, applications of HMD/VR
References
[1] B. Kroon and G. Lafruit, "Reference view synthesizer (RVS) 2.0 manual," 123rd MPEG meeting of ISO/IEC JTC1/SC29/WG11, MPEG123/n17759, 2018.
[2] J.-B. Jeong, D. Jang, J. Son, and E.-S. Ryu, "3DoF+ 360 video location-based asymmetric down-sampling for view synthesis to immersive VR video streaming," Sensors, vol. 18, no. 9, p. 3148, 2018. DOI: 10.3390/s18093148
[3] J. Jung, B. Kroon, R. Doré, G. Lafruit, and J. Boyce, "CTC on 3DoF+ and windowed 6DoF (v2)," 123rd MPEG meeting of ISO/IEC JTC1/SC29/WG11, MPEG123/n17726, 2018.
[4] B. Kroon, "3DoF+ test sequence ClassroomVideo," 122nd MPEG meeting of ISO/IEC JTC1/SC29/WG11, MPEG2018/m42415, 2018.
[5] T. T. Le, D. V. Nguyen, and E.-S. Ryu, "Real-time 360-degree video streaming over millimeter wave communication," IEEE 2018 International Conference on Information Networking (ICOIN), pp. 857-862, 2018.
[6] M. Tanimoto, T. Fujii, and K. Suzuki, "View synthesis algorithm in view synthesis reference software 2.0 (VSRS2.0)," 87th MPEG meeting of ISO/IEC JTC1/SC29/WG11, MPEG2009/m16090, 2009.
[7] M. Tanimoto and T. Fujii, "FTV - Free Viewpoint Television," 61st MPEG meeting of ISO/IEC JTC1/SC29/WG11, MPEG2010/m8595, 2010.
[8] S. Oh, "Projections under considerations for ISO/IEC 23090-2 Omnidirectional MediA Format," 118th MPEG meeting of ISO/IEC JTC1/SC29/WG11, MPEG2017/n16828, 2017.
[9] H.-J. Roh, S. Han, and E.-S. Ryu, "Prediction complexity-based HEVC parallel processing for asymmetric multicores," Multimedia Tools and Applications, vol. 76, no. 23, pp. 25271-25284, 2017. DOI: 10.1007/s11042-017-4413-7
[10] R. Skupin, Y. Sanchez, K. Sühring, T. Schierl, E.-S. Ryu, and J. Son, "Temporal MCTS coding constraints implementation," 122nd MPEG meeting of ISO/IEC JTC1/SC29/WG11, MPEG122/m42423, 2018.
[11] J.-W. Son, D. Jang, and E.-S. Ryu, "Implementing 360 video tiled streaming system," Proceedings of the 9th ACM Multimedia Systems Conference, pp. 521-524, 2018.
[12] J.-W. Son, D. Jang, and E.-S. Ryu, "Implementing motion-constrained tile and viewport extraction for VR streaming," ACM Network and Operating System Support for Digital Audio and Video 2018 (NOSSDAV 2018), pp. 61-66, 2018.
[13] J.-W. Son and E.-S. Ryu, "Tile-based 360-degree video streaming for mobile virtual reality in cyber physical system," Computers and Electrical Engineering, Elsevier, 2018.
[14] Y. Sun, A. Lu, and L. Yu, "Weighted-to-spherically-uniform quality evaluation for omnidirectional video," IEEE Signal Processing Letters, vol. 24, no. 9, pp. 1408-1412, 2017.
[15] T. Senoh, N. Tetsutani, and H. Yasuda, "MPEG-I-Visual: View Synthesis Reference Software (VSRSx)," 123rd MPEG meeting of ISO/IEC JTC1/SC29/WG11, MPEG123/m42911, 2018.
[16] X. Wang, L. Chen, S. Zhao, and S. Lei, "From OMAF for 3DoF VR to MPEG-I media format for 3DoF+, windowed 6DoF and 6DoF VR," 119th MPEG meeting of ISO/IEC JTC1/SC29/WG11, MPEG119/m41197, 2017.
[17] Y. Sun, B. Wang, and L. Yu, "ERP WS-PSNR software manual," 123rd MPEG meeting of ISO/IEC JTC1/SC29/WG11, MPEG123/n17760, 2018.