
Graduate Student: 唐盛銘 (Tang, Sheng-Ming)
Thesis Title: 支援多使用者六自由度線上瀏覽的盲串流系統
(A Blind Streaming System for Multi-client Online 6-Degree-of-Freedom View Touring)
Advisor: 徐正炘 (Hsu, Cheng-Hsin)
Committee Members: 胡敏君 (Hu, Min-Chun); 黃俊穎 (Huang, Chun-Ying)
Degree: Master's
Department: College of Electrical Engineering and Computer Science - Department of Computer Science
Publication Year: 2023
Academic Year of Graduation: 111 (2022-2023)
Language: English
Number of Pages: 71
Chinese Keywords: 盲串流 (blind streaming), 六自由度 (six degrees of freedom), 瀏覽系統 (touring system), 多使用者 (multi-user)
English Keywords: blind streaming, touring systems, multi-user
Access Statistics: Views: 35; Downloads: 0


    Online 6-degree-of-freedom (6-DoF) view touring has become increasingly popular due to hardware advances and the recent pandemic. One way for content creators to support many 6-DoF clients is to transmit the 3D content to them, which unfortunately leads to content leakage. Another way is for content creators to render and stream novel views to 6-DoF clients themselves, which unfortunately incurs staggering computational and networking workloads. In this thesis, we develop a blind streaming system as a better solution that leverages cloud service providers between content creators and 6-DoF clients. The proposed blind streaming system has two core design objectives: (i) to generate high-quality novel views for 6-DoF clients without retrieving the 3D content from content creators, and (ii) to support many 6-DoF clients without overloading the content creators. We achieve these two goals in the following steps. First, we design a source-view request/response interface between cloud service providers and content creators for efficient communication. Second, we present novel view optimization algorithms that let cloud service providers intelligently select a minimal set of source views while considering the workload of content creators. Third, we employ scalable client-side view synthesis to serve 6-DoF clients with heterogeneous device capabilities and personalized poses and preferences. Our evaluation results demonstrate the merits of our blind streaming system: compared to the state-of-the-art solution, it (i) improves the quality of synthesized novel views by 2.27 dB in PSNR and 12 points in VMAF on average, and (ii) reduces bandwidth consumption by 94% on average. In fact, our blind streaming system approaches the performance of an unrealistic optimal solution with unlimited source views, with performance gaps as small as 0.75 dB in PSNR and 3.8 points in VMAF. We also empirically demonstrate that our blind streaming system is not vulnerable to 3D content reconstruction algorithms such as Structure-from-Motion (SfM).
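    At the heart of the system is the selection of a minimal set of source views whose combined coverage suffices to synthesize the requested novel views. To make the idea concrete, below is a minimal, hypothetical Python sketch of a budgeted greedy set-cover heuristic in the spirit of the greedy solvers listed in Chapter 9; the names (View, select_views) and the per-pixel coverage-set representation are illustrative assumptions, not the thesis's actual implementation.

        # Hypothetical sketch: budgeted greedy set cover for source-view selection.
        # Not the thesis's actual code; data layout and names are assumptions.
        from dataclasses import dataclass
        from typing import FrozenSet, List, Set

        @dataclass(frozen=True)
        class View:
            view_id: int
            covered: FrozenSet[int]  # ids of novel-view pixels this source view covers

        def select_views(candidates: List[View], targets: Set[int], budget: int) -> List[View]:
            """Greedily pick up to `budget` views, maximizing marginal coverage each step."""
            uncovered = set(targets)
            chosen: List[View] = []
            remaining = list(candidates)
            while uncovered and remaining and len(chosen) < budget:
                # Take the candidate covering the most still-uncovered target pixels.
                best = max(remaining, key=lambda v: len(uncovered & v.covered))
                if not (uncovered & best.covered):
                    break  # no candidate adds coverage; stop early
                chosen.append(best)
                uncovered -= best.covered
                remaining.remove(best)
            return chosen

        views = [
            View(0, frozenset({1, 2, 3})),
            View(1, frozenset({3, 4})),
            View(2, frozenset({4, 5, 6})),
        ]
        print([v.view_id for v in select_views(views, {1, 2, 3, 4, 5, 6}, budget=2)])
        # -> [0, 2]: two views cover all six target pixels

    The greedy heuristic is the classic approximation for set cover; the thesis additionally considers integer programming based solvers (Section 9.1.1), which trade runtime for optimality.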
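    For context on the reported numbers, PSNR follows its standard textbook definition (background, not taken from the thesis): for two images $x$ and $y$ with $N$ pixels each and peak value $\mathrm{MAX}_I$,

        \mathrm{PSNR} = 10 \log_{10}\frac{\mathrm{MAX}_I^2}{\mathrm{MSE}}, \qquad \mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}(x_i - y_i)^2,

    so the reported average gain of 2.27 dB corresponds to roughly a 40% reduction in mean squared error relative to the state-of-the-art baseline.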

    Contents

    Abstract i
    Acknowledgments ii
    1 Introduction 1
      1.1 Contributions 2
      1.2 Limitations 3
      1.3 Organization 3
    2 Background 5
      2.1 Head mounted display (HMD) and 6 degree of freedom 5
      2.2 Virtual reality (VR) and view touring 6
      2.3 Common streaming media formats 7
    3 Related Work 9
      3.1 Novel view synthesis 9
      3.2 Coverage optimization and view selection 9
    4 High-Level Design 11
    5 Novel View Optimization: Problem and Solution 13
      5.1 Problem formulation 13
      5.2 System specification for S-CC and P-CC 14
    6 Pose Predictor 15
    7 Candidate Generator 16
      7.1 Candidate generator for S-CC (S-Cdd) 17
      7.2 Candidate generator for P-CC (P-Cdd) 18
        7.2.1 Optimal number of partitions (M) 18
        7.2.2 Source view transformation for all poses of each partition 20
    8 Coverage Estimator 22
      8.1 Scalar coverage estimator for S-CC 22
        8.1.1 First order cvg1(·) 23
        8.1.2 Second order cvg2(·) 23
      8.2 Pixel level coverage estimator cvgP(·) for P-CC 25
        8.2.1 Mesh creation 25
        8.2.2 Disocclusion removal 26
    9 Solver 30
      9.1 Solver for S-CC 30
        9.1.1 Integer programming based solvers 30
        9.1.2 Greedy based solvers 31
      9.2 Solver for P-CC 32
        9.2.1 Uniform (Uni) 33
        9.2.2 Branch&Bound (BB) 33
        9.2.3 Uniform&Modify (UM) 34
    10 Performance Evaluations 37
      10.1 Evaluations of S-CC 37
        10.1.1 Testbed implementation 37
        10.1.2 Setup for S-CC 38
        10.1.3 Results for S-CC 39
      10.2 Evaluations of P-CC 43
        10.2.1 System implementation 43
        10.2.2 Experiment setup 44
    11 Conclusion 63
      11.1 Remarks 63
      11.2 Attack using structure-from-motion (SfM) 63
      11.3 Future work 64
    Bibliography

    [1] alvr-org. ALVR - Air Light VR, 2023. https://github.com/alvr-org/ALVR.
    [2] B. Attal, S. Ling, A. Gokaslan, C. Richardt, and J. Tompkin. MatryODShka: Real-time 6-DoF video view synthesis using Multi-Sphere images. In Proc. of European Conference on Computer Vision (ECCV'20), pages 441–459, Glasgow, United Kingdom, August 2020.
    [3] K. Bestuzheva, M. Besançon, W.-K. Chen, A. Chmiela, T. Donkiewicz, J. van Doornmalen, L. Eifler, O. Gaul, G. Gamrath, A. Gleixner, et al. The SCIP optimization suite 8.0. Technical report, Optimization Online, 2021.
    [4] D. Bonatto, S. Fachada, S. Rogge, A. Munteanu, and G. Lafruit. Real-time depth video-based rendering for 6-DoF HMD navigation and light field displays. IEEE Access, 9:146868–146887, 2021.
    [5] J. Boyce, R. Doré, A. Dziembowski, J. Fleureau, J. Jung, B. Kroon, B. Salahieh, V. K. M. Vadakital, and L. Yu. MPEG immersive video coding standard. Proc. of the IEEE, 109(9):1521–1536, September 2021.
    [6] P. B. Bullen. How to choose a sample size (for the statistically challenged), 2022. https://tools4dev.org/resources/how-to-choose-a-sample-size/.
    [7] S.-C. Chen. Multimedia research toward the metaverse. IEEE MultiMedia, 29(1):125–127, 2022.
    [8] I. Choi, O. Gallo, A. Troccoli, M. Kim, and J. Kautz. Extreme view synthesis. In Proc. of IEEE/CVF International Conference on Computer Vision (ICCV’19), Seoul, Korea, October 2019.
    [9] I. Cohen, Y. Huang, J. Chen, and J. Benesty. Pearson correlation coefficient. Noise Reduction in Speech Processing, pages 1–4, 2009.
    [10] Epic Games. Unreal engine, 2019. https://www.unrealengine.com.
    [11] S. Gül, S. Bosse, D. Podborski, T. Schierl, and C. Hellge. Kalman filter-based head motion prediction for cloud-based mixed reality. In Proc. of ACM International Conference on Multimedia (MM'20), pages 3632–3641, Seattle, United States, October 2020.
    [12] J. Haas. A history of the Unity game engine. Diss., Worcester Polytechnic Institute, March 2014.
    [13] J. Hladky, M. Stengel, N. Vining, B. Kerbl, H.-P. Seidel, and M. Steinberger. QuadStream: A quad-based scene streaming architecture for novel viewpoint reconstruction. ACM Transactions on Graphics, 41(6):1–13, November 2022.
    [14] A. Horé and D. Ziou. Image quality metrics: PSNR vs. SSIM. In Proc. of IEEE International Conference on Pattern Recognition (ICPR'10), pages 2366–2369, Istanbul, Turkey, August 2010.
    [15] X. Hou and S. Dey. Motion prediction and pre-rendering at the edge to enable ultra-low latency mobile 6DoF experiences. IEEE Open Journal of the Communications Society, 1:1674–1690, 2020.
    [16] M. L. Hänel and C.-B. Schönlieb. Efficient global optimization of non-differentiable, symmetric objectives for multi camera placement. IEEE Sensors Journal, 22(6):5278–5287, March 2022.
    [17] IBM ILOG CPLEX. V12.1: User's manual for CPLEX. International Business Machines Corporation, 46(53):157, 2009.
    [18] K. Holt. Meta Quest Pro will soon support WiFi 6E, 2023. https://reurl.cc/3xxmvL.
    [19] B. Kroon and G. Lafruit. Reference View Synthesizer (RVS) 2.0 manual, 2018.
    [20] MarketWatch. Metaverse market global analysis 2023-2030, 2023. https://reurl.cc/944QrY.
    [21] L. Markley, Y. Cheng, J. Crassidis, and Y. Oshman. Averaging quaternions. Journal of Guidance, Control, and Dynamics, 30(4):1193–1197, May 2007.
    [22] MIV. MPEG Immersive Video (MIV), 2022. https://mpeg-miv.org/.
    [23] MPEG. The GitLab of the MPEG test model for immersive video, 2019. https://gitlab.com/mpeg-i-visual/tmiv/-/tree/v10.0.1.
    [24] V. Munishwar and N. Abu-Ghazaleh. Scalable target coverage in smart camera networks. In Proc. of ACM/IEEE International Conference on Distributed Smart Cameras (ICDSC’10), pages 206–213, Atlanta, United States, August 2010.
    [25] L. Myers and M. J. Sirois. Spearman correlation coefficients. Encyclopedia of Statistical Sciences, 12, 2004.
    [26] Netflix. VMAF - Video Multi-Method Assessment Fusion, 2021.
    [27] NVIDIA, P. Vingelmann, and F. H. Fitzek. CUDA, release: 10.2.89, 2020. https://developer.nvidia.com/cuda-toolkit.
    [28] E. Park, J. Yang, E. Yumer, D. Ceylan, and A. Berg. Transformation-grounded image generation network for novel 3D view synthesis. In Proc. of IEEE Conference on Computer Vision and Pattern Recognition (CVPR'17), Honolulu, United States, July 2017.
    [29] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala. PyTorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems 32, pages 8024–8035. Curran Associates, Inc., 2019.
    [30] C. Peng and V. Isler. Adaptive view planning for aerial 3D reconstruction. In Proc. of IEEE International Conference on Robotics and Automation (ICRA’19), pages 2981–2987, Montreal, Canada, May 2019.
    [31] I. E. Richardson. The H.264 Advanced Video Compression Standard. Wiley Publishing, 2nd edition, 2010.
    [32] G. F. Riley and T. R. Henderson. The NS-3 network simulator. In Modeling and Tools for Network Simulation, pages 15–34. Springer, 2010.
    [33] J. L. Schönberger and J.-M. Frahm. Structure-from-motion revisited. In Proc. of IEEE Conference on Computer Vision and Pattern Recognition (CVPR'16), Las Vegas, United States, June 2016.
    [34] S. Shah, D. Dey, C. Lovett, and A. Kapoor. AirSim: High-fidelity visual and physical simulation for autonomous vehicles. In Field and Service Robotics, pages 621–635. Springer, November 2018.
    [35] Y.-C. Sun, S.-M. Tang, C.-T. Wang, and C.-H. Hsu. On objective and subjective quality of 6DoF synthesized live immersive videos. In Proc. of ACM International Workshop on Quality of Experience in Visual Multimedia Applications (QoEVMA'22), pages 49–56, Lisboa, Portugal, October 2022.
    [36] S. Suresh, A. Narayanan, and V. Menon. Maximizing camera coverage in multicamera surveillance networks. IEEE Sensors Journal, 20(17):10170–10178, September 2020.
    [37] S.-M. Tang, C.-H. Hsu, Z. Tian, and X. Su. An aerodynamic, computer vision, and network simulator for networked drone applications. In Proc. of ACM Annual International Conference on Mobile Computing and Networking (MobiCom ’21), pages 831–833, New Orleans, United States, 2021.
    [38] S.-M. Tang, Y.-C. Sun, J.-W. Fang, K.-Y. Lee, C.-T. Wang, and C.-H. Hsu. Optimal camera placement for 6 Degree-of-Freedom immersive video streaming without accessing 3D scenes. In Proc. of ACM International Workshop on Interactive eXtended Reality (IXR'22), pages 31–39, Lisboa, Portugal, October 2022.
    [39] The Freeport Player Authors. Freeport Player demo (Intel), 2022.
    [40] S. Tomar. Converting video formats with ffmpeg. Linux Journal, 2006(146):10, 2006.
    [41] Unreal Engine. Blueprints visual scripting in Unreal Engine, 2022. https://docs.unrealengine.com/5.0/en-US/blueprints-visual-scripting-in-unreal-engine/.
    [42] V. V. Vazirani. Approximation Algorithms. Springer, 2003.
    [43] X. Wang, A. Chowdhery, and M. Chiang. Networked drone cameras for sports streaming. In Proc. of IEEE International Conference on Distributed Computing Systems (ICDCS’17), pages 308–318, Atlanta, United States, June 2017.
    [44] M. Woo, J. Neider, T. Davis, and D. Shreiner. OpenGL programming guide: the official guide to learning OpenGL, version 1.2. Addison-Wesley Longman Publishing Co., Inc., 1999.
    [45] J. Zhang, Y. Yao, and B. Deng. Fast and robust iterative closest point. IEEE Trans- actions on Pattern Analysis and Machine Intelligence, 44(7):3450–3466, 2021.
    [46] Q. Zhang, S. He, and J. Chen. Toward optimal orientation scheduling for full-view coverage in camera sensor networks. In Proc. of IEEE International Conference on Global Communications Conference (GLOBECOM’16), pages 1–6, Washington, United States, December 2016.
    [47] Q.-Y. Zhou, J. Park, and V. Koltun. Open3D: A modern library for 3D data process- ing, 2018. http://www.open3d.org/.
