Graduate student: | 方宏倫 Fang, Hung-Luen |
---|---|
Thesis title: | 一個以CNN為基礎的JEM快速幀內預測模式演算法 CNN-based Fast Intra Prediction Mode Decision Algorithm for JEM |
Advisor: | 王家祥 Wang, Jia-Shung |
Oral examination committee: | 彭文孝 Peng, Wen-Hsiao; 蕭旭峰 Hsiao, Hsu-Feng |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science - Department of Computer Science |
Year of publication: | 2019 |
Academic year of graduation: | 108 |
Language: | English |
Number of pages: | 53 |
Keywords (Chinese): | 幀內預測、機器學習、卷積神經網路、聯合勘探模型、快速幀內預測模式決策 |
Keywords (English): | intra prediction, machine learning, convolutional neural network, joint exploration model (JEM), fast intra prediction mode decision |
The Joint Exploration Model (JEM) is a candidate next-generation video coding technology proposed by the Joint Video Exploration Team (JVET). Compared with High Efficiency Video Coding (HEVC), the current version of JEM reduces bit rate by 30%. This coding efficiency comes at the cost of high computational complexity and a more complex coding structure, which substantially increase encoding time. According to the literature, the high computational complexity of intra prediction comes from rate-distortion optimization (RDO). Reducing the number of candidate modes that RDO must evaluate is therefore an effective way to reduce encoding time.
This thesis proposes a fast CNN-based algorithm to reduce encoding time. The CNN analyzes the texture of each coding unit (CU) and classifies the CU as smooth or complex, which correspond to non-angular and angular intra prediction modes, respectively. Because running the CNN model costs more than RDO processing, the RDO candidates are first filtered using the most probable mode (MPM) and the rough mode decision (RMD). Only when this filtering condition is not met is the CNN invoked to reduce the candidates further. If the CNN recognizes the texture of the CU as smooth, prediction modes 0 and 1 and the one angular mode with the lowest sum of absolute transformed differences (SATD) cost are used for encoding; otherwise, at most six angular modes are used. Experimental results show that, compared with JEM 7.0, the approach saves up to 31.66% of the encoding time in the best case and 26.40% on average, with an average Bjøntegaard delta bit-rate (BD-BR) increase of 1.39%.
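The candidate-selection flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the thesis implementation: the function name `select_rdo_candidates`, the `prefilter` callback (standing in for the MPM/RMD filtering condition, which the abstract does not specify), the `cnn_says_smooth` callback (standing in for the CNN classifier), and the toy SATD costs are all assumptions made for the example.

```python
from typing import Callable, List, Optional, Sequence, Tuple

PLANAR, DC = 0, 1      # intra modes 0 and 1 (the non-angular modes)
MAX_ANGULAR = 6        # "at most six angular modes" for complex CUs

ModeCost = Tuple[int, float]  # (intra mode index, SATD cost)


def select_rdo_candidates(
    mode_costs: Sequence[ModeCost],
    prefilter: Callable[[Sequence[ModeCost]], Optional[List[int]]],
    cnn_says_smooth: Callable[[], bool],
) -> List[int]:
    """Return the intra modes to pass on to full RDO for one CU."""
    # Step 1: cheap MPM/RMD-based filtering (the criterion is hypothetical here).
    filtered = prefilter(mode_costs)
    if filtered is not None:
        return filtered  # filter succeeded, so the CNN is never run

    # Step 2: rank the angular modes by ascending SATD cost.
    angular = sorted(
        (mc for mc in mode_costs if mc[0] not in (PLANAR, DC)),
        key=lambda mc: mc[1],
    )

    # Step 3: the CNN texture class decides the final candidate set.
    if cnn_says_smooth():
        best_angular = [angular[0][0]] if angular else []
        return [PLANAR, DC] + best_angular
    return [mode for mode, _ in angular[:MAX_ANGULAR]]


# Toy usage with made-up SATD costs and a prefilter that always fails.
costs = [(0, 120.0), (1, 130.0), (18, 90.0), (50, 95.0), (34, 200.0),
         (2, 150.0), (66, 170.0), (10, 110.0), (26, 105.0)]
never_filters = lambda mc: None
smooth_pick = select_rdo_candidates(costs, never_filters, lambda: True)
complex_pick = select_rdo_candidates(costs, never_filters, lambda: False)
```

Passing the classifier as a callback keeps the expensive CNN call lazy, which mirrors the abstract's point that the CNN should run only when the cheaper MPM/RMD filtering fails.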