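Graduate student: | Hung, Juo-Heng (洪若恒) |
---|---|
Thesis title: | Complexity Reduction for CU Partition in HEVC Using Pyramid-based Deep Convolutional Neural Network (以基於金字塔的深度卷積神經網路加速編碼單元分割) |
Advisor: | Wang, Jia-Shung (王家祥) |
Oral defense committee: | Chang, Pao-Chi (張寶基); Hang, Hsueh-Ming (杭學鳴); Peng, Wen-Hsiao (彭文孝) |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science - Department of Computer Science |
Year of publication: | 2019 |
Academic year of graduation: | 107 (ROC calendar) |
Language: | English |
Number of pages: | 46 |
Keywords (Chinese): | High Efficiency Video Coding, coding unit partition, convolutional neural network, deep learning, fast coding unit size decision |
Keywords (English): | HEVC, CU partition, CNN, Deep learning, Fast CU size decision |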
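Compared with the H.264/AVC standard, High Efficiency Video Coding (HEVC) saves about 50% of the bits during encoding while maintaining video quality, which greatly improves coding efficiency. Despite this remarkable bit-rate reduction, the overall encoding time rises sharply, mainly because HEVC uses quadtree-structured Coding Tree Units (CTUs). During encoding, to find the CTU partition that causes the least distortion, the HEVC encoder takes a brute-force approach: it recursively checks every possible quadtree, compares the rate-distortion cost of each structure, and finally selects the best one. This process is called rate-distortion optimization (RDO).
In this thesis, we design a deep convolutional neural network (CNN) based on a feature pyramid and use it to predict the ideal quadtree partition of each CTU, thereby reducing HEVC encoding time. First, the CTU is fed into our proposed DeepCTUNet (DCN), which predicts all split probabilities for three levels of coding unit size (CU size). Next, the probabilities in the predicted quadtree are pruned to remove results that violate HEVC rules. A bi-threshold then divides the more confident probabilities into split and no-split decisions, while ambiguous probabilities are handed to the original RDO. The result is the final quadtree structure, which is passed to the subsequent encoding steps. Experimental results show that, in intra coding mode, the proposed fast CU prediction method saves 60.767% of the encoding time compared with the original HEVC encoder, while BD-BR rises by only 1.55%.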
Compared with the H.264/AVC standard, High Efficiency Video Coding (HEVC) achieves better coding efficiency: it reduces the bit rate by about 50% while maintaining video quality. This saving, however, comes at the cost of a sharp increase in encoding time. To find the optimal quadtree structure of each Coding Tree Unit (CTU), HEVC uses a brute-force search procedure called Coding Unit Partition (CU Partition). To optimize rate-distortion performance, the rate-distortion costs (RD costs) of every parent CU and its four sub-CUs are calculated and compared, which yields a recursive process over all CU levels.
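The recursive comparison of a parent CU against its four sub-CUs can be sketched as follows. This is a minimal illustration, not the HM encoder's implementation: `best_partition` and the `rd_cost` callback are hypothetical names, the CTU is assumed to be 64x64 with a minimum CU size of 8x8, and the cost of signalling split flags is ignored.

```python
from typing import Callable, Dict, Optional, Tuple

Block = Tuple[int, int, int]  # (x, y, size) of a CU inside the CTU


def best_partition(
    x: int,
    y: int,
    size: int,
    rd_cost: Callable[[int, int, int], float],  # hypothetical RD cost callback
    min_size: int = 8,                          # assumed minimum CU size
    decisions: Optional[Dict[Block, bool]] = None,
) -> Tuple[float, Dict[Block, bool]]:
    """Exhaustively search the quadtree rooted at the CU (x, y, size).

    Returns the lowest RD cost and a dict mapping every visited splittable
    CU to True (split) or False (keep whole). Entries below a CU that is
    kept whole are still recorded but are irrelevant to the final tree.
    """
    if decisions is None:
        decisions = {}

    cost_whole = rd_cost(x, y, size)
    if size <= min_size:
        return cost_whole, decisions  # smallest CU: nothing left to split

    half = size // 2
    cost_split = 0.0
    for dy in (0, half):
        for dx in (0, half):
            sub_cost, _ = best_partition(
                x + dx, y + dy, half, rd_cost, min_size, decisions
            )
            cost_split += sub_cost

    split = cost_split < cost_whole  # keep the cheaper alternative
    decisions[(x, y, size)] = split
    return (cost_split if split else cost_whole), decisions
```

Under these assumptions, a single 64x64 CTU requires 1 + 4 + 16 + 64 = 85 cost evaluations, which is the exhaustive work that a fast CU size decision method tries to avoid.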
In this thesis, a pyramid-based deep convolutional neural network (CNN) is proposed to predict the optimal quadtree structure for each CTU, avoiding the time-consuming CU Partition search in HEVC. First, the incoming CTU is sent to a deep CNN model called DeepCTUNet (DCN), which predicts the splitting probabilities over three CU size levels. Next, the probabilities output by DCN go through an elimination process that cuts off invalid branches according to the rules of CU Partition. A bi-threshold mechanism is then applied to binarize these probabilities: the splitting decisions of the more confident CUs are made directly, and ambiguous CUs are sent to the original rate-distortion optimization process. Finally, the quadtree structure of the input CTU is determined and passed to the subsequent encoding processes. The experimental results show that our fast CU size decision method achieves an average encoding time reduction of 60.767% on the HEVC test sequences, with a 1.55% increase in BD-BR, compared with the HEVC test model HM16.5 in all-intra mode.
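The elimination and bi-threshold steps described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the thesis's implementation: the function names, the threshold values 0.3 and 0.7, and the dictionary layout of the probabilities are all hypothetical, and the sketch assumes that only the descendants of a CU decided as "no split" count as invalid branches.

```python
from typing import Dict, Tuple

Block = Tuple[int, int, int]  # (x, y, size) of a CU inside the CTU

SPLIT, NO_SPLIT, RDO = "split", "no_split", "rdo"


def decide(p: float, low: float, high: float) -> str:
    """Bi-threshold rule: confident values are binarized, the rest go to RDO."""
    if p >= high:
        return SPLIT
    if p <= low:
        return NO_SPLIT
    return RDO  # ambiguous: leave the decision to the original RDO


def ctu_decisions(
    probs: Dict[Block, float],  # split probability per CU of size 64, 32, 16
    low: float = 0.3,           # hypothetical lower threshold
    high: float = 0.7,          # hypothetical upper threshold
    ctu_size: int = 64,
    min_split_size: int = 16,   # smallest CU that can still be split
) -> Dict[Block, str]:
    """Walk the predicted quadtree top-down and label each reachable CU."""
    out: Dict[Block, str] = {}

    def visit(x: int, y: int, size: int) -> None:
        label = decide(probs[(x, y, size)], low, high)
        out[(x, y, size)] = label
        # Elimination: below an unsplit CU no sub-CU can exist,
        # so its predicted probabilities are dropped.
        if label == NO_SPLIT or size <= min_split_size:
            return
        half = size // 2
        for dy in (0, half):
            for dx in (0, half):
                visit(x + dx, y + dy, half)

    visit(0, 0, ctu_size)
    return out
```

In this sketch, widening the gap between `low` and `high` sends more CUs to RDO, trading encoding time for coding efficiency; narrowing it does the opposite.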