Graduate Student: 劉信宏 (Liu, Hsin-Hung)
Thesis Title: 緩解VVC之幀內編碼失真之三階段多重注意力CNN演算法 (Three-Stage Multi-Attention CNN Model for Artifacts Removal in Intra-mode VVC)
Advisor: 王家祥 (Wang, Jia-Shung)
Committee Members: 張寶基 (Chang, Pao-Chi), 蕭旭峰 (Hsiao, Hsu-Feng), 杭學鳴 (Hang, Hsueh-Ming)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science
Year of Publication: 2021
Graduation Academic Year: 109 (AY 2020-2021)
Language: English
Number of Pages: 45
Keywords: VVC, in-loop filters, CNN, attention-oriented techniques
Versatile Video Coding (H.266/VVC), the successor to High Efficiency Video Coding (H.265/HEVC), aims to substantially improve coding performance over HEVC at similar video quality. However, the block-based video coding framework introduces various artifacts into video frames, such as blocking, ringing, and blurring artifacts. H.266/VVC therefore adopts in-loop filters to handle these in-frame artifacts, namely the deblocking filter, sample adaptive offset (SAO), and the adaptive loop filter (ALF). These hand-crafted rules, however, may not adequately remove the diverse and complex artifacts. In recent years, as convolutional neural networks (CNNs) have achieved strong results in computer vision, many studies have begun applying CNNs as in-loop filters. In this thesis, we propose a three-stage multi-attention CNN model (TMCNN) to replace the in-loop filters in H.266/VVC. TMCNN is divided into three stages: the projection stage first projects the input into a feature space; the deblocking stage and the ringing-processing stage then remove blocking and ringing artifacts, respectively. In addition, our model employs three attention-oriented techniques: channel attention, spatial attention, and self-attention modules. With these attention mechanisms, TMCNN can focus on the more important features and remove artifacts effectively. Experimental results show that, relative to the VVC VTM reference software, our TMCNN reduces BD-rate by 4.5% under intra coding.
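Of the three attention modules named above, channel attention follows the squeeze-and-excitation idea: each feature channel is re-weighted by a learned gate computed from its global statistics. A minimal NumPy sketch, where the weight shapes, the reduction ratio of 4, and the random toy data are illustrative assumptions rather than the thesis's trained model:

```python
import numpy as np

def channel_attention(features, w1, w2):
    """Squeeze-and-excitation style channel attention (illustrative).

    features: (C, H, W) feature map; w1: (C//r, C); w2: (C, C//r).
    """
    # Squeeze: global average pooling collapses each channel to one value
    z = features.mean(axis=(1, 2))                 # (C,)
    # Excitation: bottleneck MLP, ReLU then sigmoid gate in (0, 1)
    s = np.maximum(w1 @ z, 0.0)                    # (C//r,)
    gate = 1.0 / (1.0 + np.exp(-(w2 @ s)))         # (C,)
    # Re-weight channels so the network can emphasize informative ones
    return features * gate[:, None, None]

# Toy usage with random weights (reduction ratio r = 4 is an assumption)
rng = np.random.default_rng(0)
C = 8
feats = rng.standard_normal((C, 16, 16))
w1 = rng.standard_normal((C // 4, C))
w2 = rng.standard_normal((C, C // 4))
out = channel_attention(feats, w1, w2)
```

Because the gate lies in (0, 1), the module can only attenuate channels, never amplify them; the surrounding convolutional layers learn to compensate, which is the usual design in SE-style blocks.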
Versatile Video Coding (H.266/VVC), as the successor to High Efficiency Video Coding (H.265/HEVC), aims to roughly double the coding efficiency of HEVC at similar video quality. However, the block-based video coding framework causes various artifacts in video frames, such as blocking, ringing, and blurring artifacts. Therefore, enhanced in-loop filters are adopted in VVC to remove these artifacts from distorted frames, including the deblocking filter, sample adaptive offset (SAO), and the adaptive loop filter (ALF). These hand-crafted rules, however, may not adequately eliminate the various complex artifacts in distorted frames. In recent years, as CNNs have achieved strong results in particular computer vision tasks, many studies have begun using CNN models to replace these hand-crafted in-loop filters. In this thesis, a three-stage multi-attention CNN model (TMCNN) is proposed for H.266/VVC. The TMCNN model consists of three stages: first, the projection stage projects inputs into a feature space; then, the deblocking stage and the ringing-processing stage eliminate blocking artifacts and ringing artifacts, respectively. In addition, the model builds in three kinds of attention: channel attention, spatial attention, and self-attention blocks. With the help of these attention mechanisms, TMCNN can focus on the more significant features. Moreover, a hybrid loss function that weighs both a deblocking loss and a ringing loss was designed to train TMCNN. Experimental results show that the proposed TMCNN significantly improves coding performance, achieving up to a 4.5% BD-rate reduction under the All-Intra configuration of VVC and outperforming other machine-learning approaches.
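The 4.5% figure is a Bjøntegaard delta-rate (BD-rate), which averages the bitrate gap between two rate-distortion curves at equal quality. A minimal sketch of the standard computation, with made-up toy rate/PSNR points for illustration:

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjontegaard delta-rate: average bitrate change (%) at equal quality.

    Fits cubic polynomials of log-rate as a function of PSNR and
    integrates their gap over the overlapping PSNR interval.
    """
    p_a = np.polyfit(psnr_anchor, np.log(rate_anchor), 3)
    p_t = np.polyfit(psnr_test, np.log(rate_test), 3)
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    # Definite integrals of both fitted curves over [lo, hi]
    int_a = np.polyval(np.polyint(p_a), hi) - np.polyval(np.polyint(p_a), lo)
    int_t = np.polyval(np.polyint(p_t), hi) - np.polyval(np.polyint(p_t), lo)
    avg_log_diff = (int_t - int_a) / (hi - lo)
    return (np.exp(avg_log_diff) - 1.0) * 100.0    # negative => bitrate saving

# Toy curves: the test codec spends 5% less bitrate at every quality point
anchor_rate = [100.0, 200.0, 400.0, 800.0]
anchor_psnr = [30.0, 33.0, 36.0, 39.0]
test_rate = [r * 0.95 for r in anchor_rate]
print(bd_rate(anchor_rate, anchor_psnr, test_rate, anchor_psnr))  # about -5.0
```

A negative BD-rate means the test codec needs less bitrate for the same PSNR, so the thesis's 4.5% reduction corresponds to a BD-rate of roughly -4.5% against the VTM anchor.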