
Author: Lai, Peng-Ren (賴鵬仁)
Thesis title: An Attention-Based Multi-Stage CNN Model for Artifact Removal in HEVC (以注意力為基礎多階段CNN架構緩解HEVC編碼失真研究)
Advisor: Wang, Jia-Shung (王家祥)
Committee members: Chang, Pao-Chi (張寶基); Peng, Wen-Hsiao (彭文孝); Hsiao, Hsu-Feng (蕭旭峰)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science
Year of publication: 2020
Academic year of graduation: 108
Language: Chinese
Pages: 49
Chinese keywords: 高效率視訊編碼 (High Efficiency Video Coding), 環路濾波器 (in-loop filter), 卷積神經網路 (convolutional neural network), 深度學習 (deep learning)
English keywords: HEVC, In-loop filter, CNN, Deep learning
    Compared with H.264/AVC, High Efficiency Video Coding (HEVC) reduces the bitrate by up to 50% while maintaining comparable video quality. However, lossy compression introduces various artifacts, such as blocking, ringing, and blurring. To attenuate these artifacts, HEVC employs two in-loop filters on both the encoder and decoder sides: the deblocking filter (DF) and sample adaptive offset (SAO). Recently, researchers have proposed several CNN architectures to explore the potential of convolutional neural networks (CNNs) for artifact removal.
    In this thesis, we design an attention-based multi-stage CNN model (MACNN) to remove these artifacts; it can replace the in-loop filters or serve as a post-processing step. First, the coding tree units (CTUs), CU partition maps, and TU partition maps are fed into the first stage, which extracts and mixes their features. Next, the mixed feature maps are sent to the second stage for deblocking. A self-attention block then captures long-range dependencies within the feature maps, and the third stage exploits these dependencies to further remove ringing and blurring. This design has several advantages. First, the three-stage design clarifies the function of each stage, effectively lightening the burden on the network. Second, the self-attention block allows the architecture to acquire global information with fewer layers, so MACNN achieves performance competitive with deep models through a more efficient and practical architecture. Experimental results show that, compared with HEVC, our MACNN achieves an average BD-rate reduction of 6.5% in the all-intra coding configuration.
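The 6.5% figure quoted above is a BD-rate [8]: the average bitrate difference between two rate-distortion curves at equal quality. A minimal sketch of the standard computation (a cubic fit of log-bitrate against PSNR, integrated over the overlapping PSNR range); the function name and use of NumPy are our own choices, not code from the thesis:

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjontegaard-delta bitrate: average % bitrate change at equal PSNR.

    Fits cubic polynomials of log-rate as a function of PSNR and integrates
    their difference over the overlapping PSNR range.
    Negative result = bitrate saving of the test codec over the anchor.
    """
    lr_a, lr_t = np.log(rate_anchor), np.log(rate_test)
    poly_a = np.polyfit(psnr_anchor, lr_a, 3)   # log-rate as cubic in PSNR
    poly_t = np.polyfit(psnr_test, lr_t, 3)
    lo = max(min(psnr_anchor), min(psnr_test))  # overlapping PSNR interval
    hi = min(max(psnr_anchor), max(psnr_test))
    int_a = np.polyval(np.polyint(poly_a), hi) - np.polyval(np.polyint(poly_a), lo)
    int_t = np.polyval(np.polyint(poly_t), hi) - np.polyval(np.polyint(poly_t), lo)
    avg_log_diff = (int_t - int_a) / (hi - lo)  # mean log-rate difference
    return (np.exp(avg_log_diff) - 1.0) * 100.0
```

Each codec contributes one (bitrate, PSNR) point per QP; with the four QPs of the common test conditions [28], the cubic fit passes through all points exactly.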


    High Efficiency Video Coding (HEVC) provides up to 50% bitrate reduction compared to H.264/AVC while maintaining similar video quality. Nevertheless, lossy compression introduces various visual artifacts, such as blocking, ringing, and blurring, into coded frames. To attenuate these artifacts, HEVC adopts two in-loop filters, the deblocking filter (DF) and sample adaptive offset (SAO), on both the encoder and the decoder side. Recently, researchers have presented several CNN models to explore the potential of convolutional neural networks (CNNs) for removing artifacts.
    In this thesis, we propose an attention-based multi-stage CNN model (MACNN) that aims to remove these various artifacts. To begin with, CTUs, CU partition maps, and TU partition maps are fed into the first stage of MACNN and projected into a common space for mixing. Next, the mixed feature maps are sent to the second stage for deblocking. The deblocked feature maps are then processed by a self-attention block to capture long-range dependencies, which the third stage exploits for further deringing and deblurring. Such a design has several advantages. The three-stage design clarifies the function of each stage, effectively alleviating the burden on the network, and the inception module together with the self-attention block enables our model to acquire global information with fewer layers. Hence, MACNN achieves performance competitive with deep models using a more efficient and practical architecture. Experimental results show that our MACNN achieves an average 6.5% BD-rate reduction compared to HEVC in the all-intra configuration, beating state-of-the-art methods.
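The self-attention block referred to above follows the formulation of [13]: every spatial position attends to every other position, which is how long-range dependencies are captured with few layers. A minimal NumPy sketch, where random weight matrices stand in for the learned 1x1 convolutions and all shapes and names are illustrative rather than the thesis's actual code:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_block(x, gamma=0.1, seed=0):
    """SAGAN-style self-attention over an (H, W, C) feature map.

    Random weights stand in for learned 1x1-convolution projections.
    gamma scales the attention output before the residual add; it is a
    learned scalar in the original formulation.
    """
    h, w, c = x.shape
    ck = max(c // 8, 1)                       # reduced channel dim for q/k
    rng = np.random.default_rng(seed)
    Wq = rng.standard_normal((c, ck)) / np.sqrt(c)  # query projection
    Wk = rng.standard_normal((c, ck)) / np.sqrt(c)  # key projection
    Wv = rng.standard_normal((c, c)) / np.sqrt(c)   # value projection

    flat = x.reshape(h * w, c)                # N x C, N = H*W positions
    q, k, v = flat @ Wq, flat @ Wk, flat @ Wv
    attn = softmax(q @ k.T, axis=-1)          # N x N: each position vs. all others
    out = attn @ v                            # aggregate long-range context
    return (gamma * out + flat).reshape(h, w, c)  # residual connection
```

Because the N x N attention map relates every pixel to every other, one such block provides a global receptive field that a stack of small convolutions would need many layers to match.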

    Acknowledgements I
    Chinese Abstract II
    ABSTRACT IV
    CONTENTS VI
    LIST OF FIGURES IX
    LIST OF TABLES X
    Chapter 1. Introduction 1
    Chapter 2. Related Works 7
      2.1 HEVC In-loop Filtering 7
      2.2 Deep Learning Techniques 8
        2.2.1 Residual Block 8
        2.2.2 Inception Module 9
        2.2.3 Self-Attention Block 10
      2.3 Deep Learning for HEVC In-loop Filtering and Post-processing 12
        2.3.1 VRCNN 12
        2.3.2 MMS-net 13
        2.3.3 MPRGAN 14
    Chapter 3. Methods 16
      3.1 The Architecture of MACNN 17
        3.1.1 Projection Stage 17
        3.1.2 Deblocking Stage 19
        3.1.3 Refinement Stage 22
        3.1.4 Loss Function 24
      3.2 The Advantages of MACNN 27
    Chapter 4. Experimental Results 33
      4.1 Comparison with Models for Replacement of HEVC In-loop Filters 35
        4.1.1 Comparison with VRCNN [5] and DCAD [27] 35
        4.1.2 Comparison with MMS-net [6] 39
      4.2 Comparison with Models for Post-processing 40
        4.2.1 Comparison with MPRGAN [7], VRCNN [5], and DSCNN [29] 40
      4.3 Visual Quality 41
    Chapter 5. Conclusion 44
    REFERENCES 46

    [1] Gary J. Sullivan, Jens-Rainer Ohm, Woo-Jin Han, and Thomas Wiegand, “Overview of the High Efficiency Video Coding (HEVC) Standard,” IEEE Transactions On Circuits And Systems For Video Technology, Vol. 22, No. 12, pp. 1649-1668, 2012.
    [2] ITU Telecom et al., Advanced Video Coding for Generic Audiovisual Services, ITU-T Recommendation H.264, 2003.
    [3] Andrey Norkin, Gisle Bjøntegaard, Arild Fuldseth, Matthias Narroschke, Masaru Ikeda, Kenneth Andersson, Minhua Zhou, and Geert Van der Auwera, “HEVC Deblocking Filter,” IEEE Transactions On Circuits And Systems For Video Technology, Vol. 22, No. 12, pp. 1746-1754, 2012.
    [4] Chih-Ming Fu, Elena Alshina, Alexander Alshin, Yu-Wen Huang, Ching-Yeh Chen, Chia-Yang Tsai, Chih-Wei Hsu, Shaw-Min Lei, Jeong-Hoon Park, and Woo-Jin Han. “Sample Adaptive Offset in the HEVC Standard,” IEEE Transactions On Circuits And Systems For Video Technology, Vol. 22, No. 12, pp. 1755-1764, 2012.
    [5] Yuanying Dai, Dong Liu, and Feng Wu. “A Convolutional Neural Network Approach for Post-Processing in HEVC Intra Coding,” In International Conference on Multimedia Modeling, pp. 28-39, 2017.
    [6] Jihong Kang, Sungjei Kim, and Kyoung Mu Lee. “Multi-Modal/Multi-Scale Convolutional Neural Network Based In-Loop Filter Design For Next Generation Video Codec,” In IEEE International Conference on Image Processing (ICIP), pp. 26-30, 2017.
    [7] Zhipeng Jin, Ping An, Chao Yang, and Liquan Shen. “Quality Enhancement for Intra Frame Coding via CNNs: An Adversarial Approach,” In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1368-1372, 2018.
    [8] Gisle Bjøntegaard, “Calculation of average PSNR differences between RD curves,” document VCEG-M33, Apr. 2001.
    [9] Hang Zhao, Orazio Gallo, Iuri Frosio, and Jan Kautz. “Loss Functions for Image Restoration With Neural Networks,” IEEE Transactions On Computational Imaging, Vol. 3, No. 1, pp. 47-57, 2017.
    [10] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. “Deep Residual Learning for Image Recognition,” In Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770-778.
    [11] Sergey Ioffe and Christian Szegedy. “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift,” arXiv preprint arXiv:1502.03167, 2015.
    [12] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. “Going Deeper with Convolutions,” In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1-9, 2015.
    [13] Han Zhang, Ian Goodfellow, Dimitris Metaxas, and Augustus Odena. “Self-Attention Generative Adversarial Networks,” arXiv preprint arXiv:1805.08318, 2018.
    [14] Pavel Svoboda, Michal Hradis, David Barina, and Pavel Zemcik. “Compression Artifacts Removal Using Convolutional Neural Networks,” arXiv preprint arXiv: 1605.00366, 2016.
    [15] Chao Dong, Chen Change Loy, Kaiming He, and Xiaoou Tang. “Learning a Deep Convolutional Network for Image Super-Resolution,” In European Conference on Computer Vision (ECCV), pp. 184-199, 2014.
    [16] Chao Dong, Yubin Deng, Chen Change Loy, and Xiaoou Tang. “Compression Artifacts Reduction by a Deep Convolutional Network,” In IEEE International Conference on Computer Vision, pp. 576-584, 2015.
    [17] Woon-Sung Park and Munchurl Kim. “CNN-Based In-Loop Filtering for Coding Efficiency Improvement,” In IEEE 12th Image, Video, and Multidimensional Signal Processing Workshop (IVMSP), pp. 1-5, 2016.
    [18] Jiwon Kim, Jung Kwon Lee and Kyoung Mu Lee. “Accurate Image Super-Resolution Using Very Deep Convolutional Networks,” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1646-1654, 2016.
    [19] Jianbo Jiao, Wei-Chih Tu, Shengfeng He and Rynson W. H. Lau. “FormResNet: Formatted Residual Learning for Image Restoration,” In IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 1034-1042, 2017.
    [20] Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke and Alex Alemi. “Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning,” In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017, pp. 4278-4284.
    [21] Bee Lim, Sanghyun Son, Heewon Kim, Seungjun Nah and Kyoung Mu Lee. “Enhanced Deep Residual Networks for Single Image Super-Resolution,” In IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 1132-1140, 2017.
    [22] Kaiming He, Xiangyu Zhang, Shaoqing Ren and Jian Sun, “Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification,” In Proceedings of the IEEE International Conference on Computer Vision, pp. 1026-1034, 2015.
    [23] Seungjun Nah, Tae Hyun Kim, and Kyoung Mu Lee. “Deep Multi-scale Convolutional Neural Network for Dynamic Scene Deblurring,” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 257-265, 2017.
    [24] Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, et al. “TensorFlow: A System for Large-Scale Machine Learning,” In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI ’16), pp. 265-283, 2016.
    [25] Tianyi Li, Mai Xu, and Xin Deng, “A Deep Convolutional Neural Network Approach for Complexity Reduction on Intra-Mode HEVC,” In IEEE International Conference on Multimedia and Expo (ICME), pp. 1255-1260, 2017.
    [26] Diederik P. Kingma and Jimmy Lei Ba. “Adam: A Method For Stochastic Optimization,” arXiv preprint arXiv:1412.6980, 2014.
    [27] Tingting Wang, Mingjin Chen and Hongyang Chao. “A Novel Deep Learning-Based Method of Improving Coding Efficiency from the Decoder-End for HEVC,” In Data Compression Conference (DCC), pp. 410-419, 2017.
    [28] F. Bossen. “Common Test Conditions and Software Reference Configurations,” document JCTVC-L1100, 2013.
    [29] Ren Yang, Mai Xu, and Zulin Wang. “Decoder-Side HEVC Quality Enhancement with Scalable Convolutional Neural Network,” In Proceedings of the IEEE International Conference on Multimedia and Expo (ICME), pp. 817-822, 2017.
