
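Graduate student: Ting, Hai-Che (丁海哲)
Thesis title: Complexity Reduction on HEVC Intra Mode Decision with modified LeNet-5 (運用修正版LeNet-5模型來有效降低HEVC幀內預測計算)
Advisor: Wang, Jia-Shung (王家祥)
Oral defense committee: Hang, Hsueh-Ming (杭學鳴); Peng, Wen-Hsiao (彭文孝); Hsiao, Hsu-Feng (蕭旭峰)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Institute of Information Systems and Applications
Year of publication: 2018
Academic year of graduation: 106 (ROC calendar)
Language: English
Number of pages: 29
Keywords (Chinese): intra prediction, convolutional neural network, edge strength extraction, video coding
Keywords (English): edge strength extractor, early terminated CU partition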
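  • Compared with its predecessor, H.264/AVC, HEVC saves about 50% of the bitrate, but its prediction algorithms are considerably more complex. HEVC has 35 intra prediction modes and CU sizes of 64 x 64, 32 x 32, 16 x 16, 8 x 8, and 4 x 4. In all-intra coding, PU mode decision accounts for 60% to 70% of the total encoding time. The goal of this thesis is therefore to reduce the overall encoding time by lowering the computational complexity of PU mode decision.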
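    This thesis first partitions CUs using the method of [18]. Then, following [6], it computes the edge strength of the current PU and uses it to decide whether the PU is flat. For a flat PU, only intra prediction modes 0 and 1 are taken as candidates and passed to rate distortion optimization (RDO), which selects the best intra prediction mode for the PU. For a non-flat PU of size 4 x 4 or 8 x 8, the modified LeNet-5 CNN model selects two candidate modes; these, together with the first two modes in the Most Probable Mode (MPM) list, are passed to RDO to select the best intra prediction mode for the PU.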
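    The candidate selection for one PU can be sketched as follows. This is a minimal illustration, not the thesis implementation: the function name `select_candidates`, the `flat_threshold` parameter, the removal of duplicate modes, and the fallback to all 35 modes for larger non-flat PUs are assumptions made for the example. The edge strength and the two modes chosen by the CNN are taken as inputs rather than computed.

```python
from typing import List, Sequence

PLANAR, DC = 0, 1          # HEVC intra modes 0 and 1
NUM_INTRA_MODES = 35       # total number of HEVC intra modes


def select_candidates(
    pu_size: int,
    edge_strength: float,
    flat_threshold: float,       # hypothetical threshold, not from the thesis
    cnn_top2: Sequence[int],     # two modes proposed by the CNN
    mpm_list: Sequence[int],     # Most Probable Mode list of the PU
) -> List[int]:
    """Return the intra modes that are forwarded to rate distortion optimization."""
    # Flat PU: only modes 0 and 1 are evaluated.
    if edge_strength < flat_threshold:
        return [PLANAR, DC]

    # Assumption: the abstract does not describe larger non-flat PUs,
    # so this sketch falls back to evaluating every mode.
    if pu_size not in (4, 8):
        return list(range(NUM_INTRA_MODES))

    # Non-flat 4 x 4 or 8 x 8 PU: CNN candidates plus the first two MPMs.
    candidates: List[int] = []
    for mode in list(cnn_top2) + list(mpm_list[:2]):
        if mode not in candidates:   # assumption: duplicates are dropped
            candidates.append(mode)
    return candidates
```

    In this sketch a non-flat 8 x 8 PU passes at most four modes to RDO instead of 35, which is where the reduction in computation would come from.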
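    The experimental results show that combining the modified LeNet-5 model with edge strength extraction saves a large amount of encoding time and computation without increasing the BDBR much.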
    Keywords: intra prediction, convolutional neural network, edge strength extraction, video coding


    HEVC is intended to provide significantly better coding efficiency than H.264/AVC and its predecessors. One key contributor to this gain is the updated intra prediction, which supports a much larger number of prediction directions on prediction units (PUs) of various sizes, at the cost of very high computational complexity. More specifically, it has up to 35 intra modes and PU blocks of size 64 x 64, 32 x 32, 16 x 16, 8 x 8, and 4 x 4. Consequently, PU mode decision and intra prediction account for around 60% to 70% of the encoding time in all-intra HEVC encoding. The goal of this study is therefore to lower the computational complexity of HEVC intra prediction and thereby reduce the total HEVC encoding time.
    The key step of the proposed method is to select a small, indispensable set of directions with the help of a modified LeNet-5 CNN model, thereby reducing the computational complexity of the subsequent rate distortion optimization. To modify LeNet-5, we first replace the tanh and sigmoid functions with rectified linear units (ReLUs), then use zero padding to preserve the information of the input PU, and finally replace the ReLUs of the second convolutional layer with maxout units. In addition, the edge strength extractor of [6] is adopted to determine whether the current PU is flat, which allows most of the direction modes to be skipped, and the early terminated CU partition technique of [18] is used to decrease the number of CUs. Finally, the candidate modes of neighboring PUs are also considered. The experimental results demonstrate that the proposed method reduces the HEVC intra prediction processing time by up to 66.59%, with a small increase in bit-rate (1.1% on average) and an average PSNR reduction of at most 0.109%.
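    The three building blocks named in the model description (ReLU, zero padding, and maxout) can be illustrated with a small pure-Python sketch. It is not the thesis network: the helper names, the 5 x 5 kernel (the size used in the original LeNet-5), and the grouping factor `k` are assumptions chosen for the example.

```python
from typing import List

Map2D = List[List[float]]


def relu(x: float) -> float:
    """Rectified linear unit: passes positive values, clamps the rest to 0."""
    return max(0.0, x)


def zero_pad(block: Map2D, pad: int) -> Map2D:
    """Surround a 2-D block with `pad` rows and columns of zeros."""
    width = len(block[0]) + 2 * pad
    border = [[0.0] * width for _ in range(pad)]
    rows = [[0.0] * pad + list(row) + [0.0] * pad for row in block]
    return border + rows + [[0.0] * width for _ in range(pad)]


def conv_output_size(n: int, kernel: int, pad: int) -> int:
    """Side length after a stride-1 convolution on an n x n input."""
    return n + 2 * pad - kernel + 1


def maxout(maps: List[Map2D], k: int) -> List[Map2D]:
    """Element-wise maximum over consecutive groups of k feature maps."""
    if len(maps) % k != 0:
        raise ValueError("number of feature maps must be a multiple of k")
    out: List[Map2D] = []
    for start in range(0, len(maps), k):
        group = maps[start:start + k]
        rows, cols = len(group[0]), len(group[0][0])
        out.append([[max(m[r][c] for m in group) for c in range(cols)]
                    for r in range(rows)])
    return out
```

    With these assumptions, an 8 x 8 PU convolved with a 5 x 5 kernel shrinks to 4 x 4 without padding (`conv_output_size(8, 5, 0)`), while two pixels of zero padding keep it at 8 x 8 (`conv_output_size(8, 5, 2)`), which is one way to read "maintain the information of the input PU".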

    Key words: Intra prediction, CNN, edge strength extractor, early terminated CU partition, HEVC

    Acknowledgements
    Chinese Abstract
    ABSTRACT
    CONTENTS
    LIST OF FIGURES
    LIST OF TABLES
    Chapter 1. Introduction
    Chapter 2. Related Works
        2.1 LeNet-5
        2.2 Maxout Network
    Chapter 3. Proposed Methods
        3.1 The Modified LeNet-5
        3.2 Maxout Network
    Chapter 4. Experimental Results
        4.1 Database and Parameters Setting
        4.2 The effect of the padding and the maxout
        4.3 The coding efficiency of the proposed method
    Chapter 5. Conclusion and Future Works
    REFERENCES

    [1] G. J. Sullivan, J. Ohm, Woo Jin Han, and T. Wiegand, “Overview of the high efficiency video coding (HEVC) standard,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 12, pp. 1649–1668, 2012.
    [2] G. J. Sullivan and T. Wiegand, “Video compression - from concepts to the h.264/avc standard,” Proceedings of the IEEE, vol. 93, no. 1, pp. 18–31, 2005.
    [3] Gary J. Sullivan and Jens Rainer Ohm, “Recent developments in standardization of high efficiency video coding (hevc),” Proc. SPIE 7798, Applications of Digital Image Processing XXXIII, 77980V, September 2010, pp. 731–739.
    [4] Guang Chen, Zhenyu Liu, T Ikenaga, and Dongsheng Wang, “Fast hevc intra mode decision using matching edge detector and kernel density estimation alike histogram generation,” in IEEE International Symposium on Circuits and Systems, Beijing, China, 19-23 May 2013, pp. 53–56.
    [5] Thaísa L. Da Silva, Luciano V. Agostini, and Luis A. Da Silva Cruz, “Fast HEVC intra prediction mode decision based on edge direction information,” in 20th European Signal Processing Conference, Bucharest, Romania, 27-31 Aug. 2012, pp. 1214–1218.
    [6] Nan Song, Zhenyu Liu, Xiangyang Ji, and Dongsheng Wang, “CNN oriented fast PU mode decision for HEVC hardwired intra encoder,” in IEEE Global Conference on Signal and Information Processing (GlobalSIP), 14-16 Nov. 2017, pp. 239-243.
    [7] Daehyeok Gwon, Haechul Choi, and Jonghee M. Youn, “Hevc fast intra mode decision based on edge and satd cost,” in Multimedia and Broadcasting, 2015, pp. 1–5.
    [8] Yann LeCun, Leon Bottou, Yoshua Bengio, and Patrick Haffner, “Gradient-Based Learning Applied to Document Recognition” in Proceedings of the IEEE, 1998, pp. 2278-2324.
    [9] Ian J. Goodfellow, David Warde-Farley, Mehdi Mirza, Aaron Courville, Yoshua Bengio, “Maxout Networks,” in International Conference on Machine Learning, 16-21 June. 2013.
    [10] Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, Ruslan Salakhutdinov, “Dropout: a simple way to prevent neural networks from overfitting,” The Journal of Machine Learning Research, Volume 15 Issue 1, January 2014, pp. 1929-1958.
    [11] Jawad Nagi, Frederick Ducatelle, Gianni A. Di Caro, Dan Cireşan, Ueli Meier, Alessandro Giusti, Farrukh Nagi, Jürgen Schmidhuber, and Luca Maria Gambardella, “Max-Pooling Convolutional Neural Networks for Vision-based Hand Gesture Recognition,” IEEE International Conference on Signal and Image Processing Applications (ICSIPA), 2011.
    [12] Jiuxiang Gu, Zhenhua Wang, Jason Kuen, Lianyang Ma, Amir Shahroudy, Bing Shuai, Ting Liu, Xingxing Wang, Li Wang, Gang Wang, Jianfei Cai, and Tsuhan Chen, “Recent Advances in Convolutional Neural Networks,” arXiv:1512.07108, 2017.
    [13] Thorsten Laude and Jörn Ostermann, “Deep learning-based intra prediction mode decision for HEVC,” in 2016 Picture Coding Symposium (PCS), 2016.
    [14] Geoffrey E. Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov, “Improving neural networks by preventing co-adaptation of feature detectors,” Technical report, arXiv:1207.0580, 2012.
    [15] David Attwell and Simon B. Laughlin. “An energy budget for signaling in the grey matter of the brain,” Journal of Cerebral Blood Flow & Metabolism (JCBFM), 2001.
    [16] Xavier Glorot, Antoine Bordes and Yoshua Bengio. “Deep sparse rectifier neural networks.” Fourteenth International Conference on Artificial Intelligence and Statistics (AISTATS), 2011.
    [17] Junaid Tariq, Sam Kwong. “Efficient Intra and Most Probable Mode (MPM) Selection Based on Statistical Texture Features,” IEEE International Conference on Systems, Man, and Cybernetics (SMC), 2015
    [18] Tao Zhang, Ming-Ting Sun, Debin Zhao, and Wen Gao, “Fast Intra-Mode and CU Size Decision for HEVC,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 27, no. 8, August 2017.
    [19] Xingang Liu, Yinbo Liu, Peicheng Wang, Chin-Feng Lai, and Han-Chieh Chao, “An Adaptive Mode Decision Algorithm Based on Video Texture Characteristics for HEVC Intra Prediction,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 27, no. 8, August 2017.
