Two-phase Multi-model for Trimming QTMT CU Partition using Convolutional Neural Networks


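Graduate student:	Fu, Pin-Chieh (傅品捷)
Thesis title:	Two-phase Multi-model for Trimming QTMT CU Partition using Convolutional Neural Networks
Advisor:	Wang, Jia-Shung (王家祥)
Committee members:	Chang, Pao-Chi (張寶基); Peng, Wen-Hsiao (彭文孝); Hsiao, Hsu-Feng (蕭旭峰)
Degree:	Master
Department:	College of Electrical Engineering and Computer Science, Department of Computer Science
Year of publication:	2020
Academic year of graduation:	108 (ROC calendar)
Language:	English
Number of pages:	50
Keywords (translated from Chinese):	Versatile Video Coding, deep learning, convolutional neural networks, fast coding unit size decision
Keywords (English):	Versatile Video Coding (VVC), deep learning, convolutional neural networks, QTMT CU partition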

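Versatile Video Coding (VVC), initiated in October 2017, can deliver the same subjective quality as HEVC at roughly 50% of the bitrate. VVC introduces a complex quadtree plus multi-type tree coding unit partitioning structure (QT + MTT, or QTMT) in each 128 × 128 block, and it combines several novel coding tools to reach unprecedented coding efficiency. The latest VVC test model, VTM 6.1, currently achieves an average BD-rate reduction of 37.7% relative to HEVC, but encoder time grows by a factor of 8.9 and decoder time by a factor of 1.6. Recent studies have used deep convolutional neural networks (CNNs) to speed up some of the key modules of HEVC. For VVC, to the best of our knowledge, only one CNN-based solution for reducing the computation of QTMT CU partitioning has appeared in the literature.
To address this problem effectively, this thesis proposes a two-phase multi-model convolutional neural network method for quickly trimming QTMT coding combinations. First, a backbone CNN model predicts the (specific) depth of the QTMT partitioning for each 32 × 32 block. Second, three parallel CNN branches predict the quadtree depth, and one more CNN branch predicts whether the ternary tree (TT) is used. In the second phase, this prediction information is used to reduce the large number of possible QTMT CU partition combinations to a tolerable size. Because several models are involved, however, the number of CNN parameters may grow significantly, and with it the total computation for training and prediction. We therefore use several efficient deep learning modules from MobileNetV2 to bring the computation down to an adequate level. Experimental results show that, in intra coding mode, the proposed fast QTMT CU partitioning method saves 42.341% of encoding time on average for the VVC test sequences compared with VTM 6.1, with a BD-BR increase of only 0.71.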
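The computational saving attributed to the MobileNetV2 modules can be illustrated with a small cost count. The sketch below compares the multiply-accumulate operations of a standard convolution with those of a depthwise separable convolution (the building block listed in Section 2.1 of the contents). The layer dimensions are hypothetical and chosen only for illustration; they are not taken from the thesis.

```python
def standard_conv_macs(h, w, c_in, c_out, k):
    """Multiply-accumulates of a k x k standard convolution
    (stride 1, output spatial size h x w)."""
    return h * w * c_in * c_out * k * k


def depthwise_separable_macs(h, w, c_in, c_out, k):
    """Multiply-accumulates of a k x k depthwise convolution
    followed by a 1 x 1 pointwise convolution."""
    depthwise = h * w * c_in * k * k      # one k x k filter per input channel
    pointwise = h * w * c_in * c_out      # 1 x 1 convolution mixes the channels
    return depthwise + pointwise


def cost_ratio(c_out, k):
    """Closed-form ratio separable / standard = 1/c_out + 1/k^2."""
    return 1.0 / c_out + 1.0 / (k * k)


# Hypothetical layer (an assumption, not a figure from the thesis):
# 32 x 32 feature map, 16 input channels, 32 output channels, 3 x 3 kernel.
std = standard_conv_macs(32, 32, 16, 32, 3)
sep = depthwise_separable_macs(32, 32, 16, 32, 3)
print(std, sep, round(sep / std, 4))
```

For this assumed layer the separable form needs roughly one seventh of the multiply-accumulates, which is the kind of reduction that makes running several small models in parallel affordable.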

ABSTRACT
Versatile Video Coding (VVC), initiated in October 2017, targets the same subjective quality at roughly 50% of the bitrate of its predecessor, HEVC. VVC introduces a complex quadtree plus multi-type tree block partitioning structure (QT + MTT, or QTMT) in each 128 × 128 block, and it incorporates several novel coding tools to achieve unprecedented coding efficiency. The latest VVC test model, VTM 6.1, currently achieves an average BD-rate reduction of 37.7% relative to HEVC; however, encoder time increases by a factor of 8.9 and decoder time by a factor of 1.6. Recent studies have used deep convolutional neural networks (CNNs) to speed up some critical modules of HEVC. For VVC, to the best of our knowledge, only one CNN-based solution for reducing the computation of QTMT CU partitioning has appeared in the literature.
To address this problem effectively, this thesis presents a two-phase multi-model approach that uses CNN models to trim the QTMT CU partition search. First, a backbone CNN model predicts the QTMT depth of the partitioning for each 32 × 32 block; the possible depths are classified into three groups. Second, three parallel CNN branches predict the QT depth within each group, and one further CNN branch predicts whether the ternary tree (TT) is used. In the second phase, this prediction information is used to trim the huge number of possible QTMT CU partition combinations to a tolerable size. Using multiple models, however, may significantly increase the number of CNN parameters, and with it the total computation for both training and inference. We therefore apply several efficient deep learning modules from MobileNetV2 to reduce the computation to an adequate level. Experimental results show that, in the All-intra configuration, the proposed fast QTMT CU partitioning method saves 42.341% of encoding time on average over all VVC test sequences, with a BD-BR increase of 0.71 compared with VTM 6.1.
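The trimming step can be pictured with a minimal sketch. It is an illustration under stated assumptions and not the thesis's actual algorithm (listed as Section 3.3 in the contents): the `Prediction` fields, the `allowed_splits` function, and the rule that quadtree splits are forced until the predicted QT depth is reached are all hypothetical. Depths are counted relative to the 32 × 32 block.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    """Hypothetical container for the CNN outputs of one 32 x 32 block."""
    max_qtmt_depth: int  # assumed: upper bound implied by the backbone's depth group
    qt_depth: int        # assumed: quadtree depth chosen by that group's branch
    use_tt: bool         # assumed: output of the ternary tree branch


def allowed_splits(qt_depth_now, mt_depth_now, pred):
    """Return the split modes an encoder would still need to test for a CU
    at the given quadtree and multi-type tree depths (hypothetical rule set)."""
    # Stop splitting once the predicted total depth is reached.
    if qt_depth_now + mt_depth_now >= pred.max_qtmt_depth:
        return ["NO_SPLIT"]
    # Before any multi-type split, follow the predicted quadtree depth.
    if mt_depth_now == 0 and qt_depth_now < pred.qt_depth:
        return ["QT"]
    # Otherwise test binary splits, plus ternary splits only if predicted.
    modes = ["NO_SPLIT", "BT_H", "BT_V"]
    if pred.use_tt:
        modes += ["TT_H", "TT_V"]
    return modes


pred = Prediction(max_qtmt_depth=3, qt_depth=1, use_tt=False)
print(allowed_splits(0, 0, pred))
print(allowed_splits(1, 0, pred))
print(allowed_splits(1, 2, pred))
```

Under these assumptions each CU tests at most a handful of modes instead of the full set of six, and the ternary splits are skipped entirely when the TT branch predicts they are not used.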

CONTENTS
Acknowledgements
Abstract (Chinese)
ABSTRACT
CONTENTS
LIST OF FIGURES
LIST OF TABLES
Chapter 1.  Introduction
Chapter 2.  Related Works
2.1     Depthwise Separable Convolution
2.2     Bottleneck Residual Block
Chapter 3.  Methods
3.1     Training Data and Preprocessing
3.2     Two-phase Multi-model
3.2.1     Backbone
3.2.2     Quadtree Branch
3.2.3     Ternary Tree Branch
3.2.4     Network Training
3.3     Fast QTMT Algorithm
Chapter 4.  Experimental Results
4.1    Comparing with learning based method
4.2    Comparing with non-learning based method
Chapter 5.  Conclusion
REFERENCES
                                

