Graduate Student: 洪梓寬 (Hung, Tzu-Kuan)
Thesis Title: On Error Concealment of Dynamic 3D Point Cloud Streaming (動態點雲串流的錯誤隱藏研究)
Advisor: 徐正炘 (Hsu, Cheng-Hsin)
Committee Members: 陳健, 黃俊穎
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science
Publication Year: 2022
Academic Year: 110
Language: English
Pages: 100
Keywords (Chinese): 點雲, 串流, 錯誤隱藏
Keywords (English): point clouds, streaming, error concealment
The Video-based Point Cloud Compression (V-PCC) codec, recently standardized by the Moving Picture Experts Group (MPEG), achieves remarkable compression ratios for point clouds by projecting 3D geometry onto 2D planes and leveraging mature, well-established 2D video compression techniques. Its error concealment, however, performs poorly: under unstable Internet conditions, a corrupted bitstream leads to severely distorted 3D point clouds after decoding at the receiver. To address this point cloud transmission-error problem, we propose a general error concealment framework, together with a suite of error concealment tools and algorithms that we design, implement, and compare. Among our five proposed error concealment algorithms, four can handle corrupted geometry information; these algorithms cover point-to-point, triangular, and cube-matching methods. Beyond comparing these methods across various quality metrics, we also report their running times, so that developers can weigh the tradeoff between concealment quality and execution time. To analyze the strengths and weaknesses of our proposed algorithms, our error concealment experiments compare seven point cloud human-figure sequences with diverse characteristics. Our experiments lead to the following conclusions: (i) our algorithms outperform V-PCC by at least 3.58 dB in GPSNR and by at least 10.68 in VMAF; (ii) our algorithms outperform the 3D frame copy method by at least 5.8 dB in GPSNR and by at least 12 in VMAF. This work can be extended in the following directions: (i) accelerating the error concealment programs with general-purpose GPUs and parallel computing; (ii) further studying and improving motion vector prediction, and storing motion vector residuals in the codec metadata in advance; (iii) implementing a point cloud video streaming system with adaptive bitrate control; (iv) better exploiting the geometry information remaining within a frame to achieve spatial error concealment when only a small portion of the data is lost.
The recently standardized MPEG Video-based Point Cloud Compression (V-PCC) codec
has shown promise in achieving a good rate-distortion tradeoff for dynamic 3D
point cloud compression by building on top of state-of-the-art techniques for
2D video compression. The current error concealment methods of V-PCC, however, lead
to significantly distorted 3D point cloud frames under imperfect network
conditions. To address this problem, we propose a general framework for
concealing 3D point cloud frames distorted or lost due to packet loss. We
also design, implement, and evaluate a suite of tools for each stage of our
framework, which can be combined into multiple variants of error concealment
algorithms. We propose five error concealment algorithms, four of which can
conceal geometry losses. These algorithms span point-to-point,
triangular, and cube-based matching methods, offering a wide range of
tradeoffs between computational complexity and visual quality. We conduct
extensive experiments using seven dynamic 3D point cloud sequences with diverse
characteristics to understand the pros and cons of our proposed error concealment
algorithms.
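To make the point-to-point matching idea concrete, here is a minimal sketch of per-point motion estimation between two frames via nearest-neighbor matching, followed by translating the previous frame along the estimated motion to approximate a lost frame. The function names, the brute-force distance computation, and the `alpha` interpolation parameter are our own illustrative assumptions, not the thesis's actual implementation.

```python
import numpy as np

def nearest_neighbor_motion(prev_pts, next_pts):
    """For each point in prev_pts (P, 3), find its nearest neighbor in
    next_pts (N, 3) and return per-point motion vectors (next - prev)."""
    # Brute-force pairwise squared distances, shape (P, N).
    d2 = ((prev_pts[:, None, :] - next_pts[None, :, :]) ** 2).sum(axis=-1)
    nn = d2.argmin(axis=1)  # index of nearest neighbor in next_pts
    return next_pts[nn] - prev_pts

def conceal_lost_frame(prev_pts, motion, alpha=0.5):
    """Approximate a lost intermediate frame by translating the previous
    frame along a fraction alpha of the estimated motion vectors."""
    return prev_pts + alpha * motion

prev_pts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
next_pts = np.array([[0.2, 0.0, 0.0], [1.2, 0.0, 0.0]])
motion = nearest_neighbor_motion(prev_pts, next_pts)
concealed = conceal_lost_frame(prev_pts, motion)  # halfway between frames
```

A production variant would replace the quadratic-cost distance matrix with a k-d tree lookup, which is where the complexity/quality tradeoffs discussed above arise.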
Our experiment results show that our error concealment algorithms
outperform: (i) the method employed by V-PCC by at least 3.58 dB in Geometry
Peak Signal-to-Noise Ratio (GPSNR) and 10.68 in Video Multi-Method Assessment
Fusion (VMAF) and (ii) the point cloud frame copy method by at least 5.8 dB in (3D)
GPSNR and 12.0 in (2D) VMAF.
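For readers unfamiliar with the GPSNR metric used above, the following is a simplified point-to-point sketch: the symmetric mean squared nearest-neighbor distance between two clouds, converted to decibels against a peak value. The `peak` choice and the exact error definition are assumptions for illustration; the thesis and the MPEG common test conditions may define them differently.

```python
import numpy as np

def gpsnr(reference, distorted, peak):
    """Point-to-point geometry PSNR (dB) between two point clouds,
    using the symmetric mean squared nearest-neighbor distance."""
    def mse(a, b):
        # For each point in a, squared distance to its nearest neighbor in b.
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
        return d2.min(axis=1).mean()
    sym_mse = max(mse(reference, distorted), mse(distorted, reference))
    return 10.0 * np.log10(peak ** 2 / sym_mse)

ref = np.array([[0.0, 0.0, 0.0]])
dist = np.array([[0.0, 0.0, 1.0]])
score = gpsnr(ref, dist, peak=1023)  # e.g. peak for a 10-bit voxel grid
```

Taking the max of the two directional errors penalizes both missing and spurious points, which matters when concealment adds points that were never in the reference.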
This work can be both broadened and deepened by:
(i) accelerating the running time by exploiting the parallelization capability of
graphics processing units (GPUs);
(ii) looking deeper into better matching of motion cubes and storing residual
values in the metadata of the codec;
(iii) applying the algorithms in a real streaming system with an adaptive bitrate mechanism; and
(iv) making better use of the spatial information remaining in the distorted 3D point
cloud to conduct spatial concealment.