
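Author: Ma, Yun-Yang (馬運揚)
Thesis title: Load Balanced Slice-level Parallelism of the H.264/AVC Encoder
Chinese title: H.264/AVC 編碼器平行化之負載平衡研究 (A Study of Load Balancing in the Parallelization of the H.264/AVC Encoder)
Advisor: Wang, Jia-Shung (王家祥)
Committee members:
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science
Year of publication: 2009
Academic year of graduation: 97 (ROC calendar)
Language: English
Number of pages: 96
Keywords (Chinese): video compression, parallel processing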
  • H.264/AVC is a modern video compression standard. To improve compression efficiency and quality, the H.264/AVC encoder introduces several new features, such as variable block sizes, multiple reference frames, and quarter-pixel motion vector accuracy. The side effect is high computational complexity, which increases encoding time. In this thesis, we parallelize the H.264/AVC encoder on a multi-core system and use the parallel computing capability of multiple processors to speed up encoding.
    In H.264/AVC encoding, a frame can be divided into several slices that can be encoded independently. We therefore propose a new slice-level parallelism scheme that uses a single slice as the data unit for parallel processing. Because slices differ in their computational requirements, previous slice-level approaches leave the CPUs with uneven workloads, which degrades system performance dramatically. The method in this work mitigates this problem and achieves higher system performance.
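    To illustrate why uneven slice workloads hurt, here is a minimal Python sketch. It assumes one slice per CPU and that a frame is finished only when its slowest slice finishes. The function name and the cost values are hypothetical and are not taken from the thesis.

```python
def parallel_speedup(slice_costs):
    """Idealized speedup when each slice is encoded on its own CPU.

    Sequential time is the sum of all slice costs. Parallel time is the
    cost of the slowest slice, because the frame is complete only when
    every slice has been encoded. Synchronization overhead is ignored.
    """
    sequential = sum(slice_costs)
    parallel = max(slice_costs)
    return sequential / parallel


balanced = [25, 25, 25, 25]   # hypothetical, perfectly balanced slices
uneven = [10, 20, 30, 40]     # same total work, uneven distribution

print(parallel_speedup(balanced))  # 4.0
print(parallel_speedup(uneven))    # 2.5
```

    With the same total work, the uneven split reaches only 2.5x on four CPUs in this toy model, which is the kind of gap that load balancing is meant to close.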
    The proposed algorithm first predicts the complexity of each macroblock and then arranges the slices so that their estimated complexities are as equal as possible. Next, an early termination scheme reduces the differences in encoding completion time among the slices. Finally, the slice partition of the next frame is adjusted dynamically so that the computational loads of the slices become more balanced.
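    The abstract does not give the estimator or the allocation rule, so the following is only a sketch of the balancing idea under stated assumptions: per-macroblock cost estimates are already available (hypothetical input `mb_costs`), slices are contiguous runs of macroblocks in raster order, and boundaries are placed with a simple prefix-sum rule. This is a stand-in technique, not the thesis's method.

```python
from bisect import bisect_left
from itertools import accumulate


def partition_slices(mb_costs, num_slices):
    """Split macroblocks into contiguous slices of similar estimated cost.

    Returns a list of (start, end) index pairs, end exclusive.
    Assumes len(mb_costs) >= num_slices and non-negative costs.
    """
    prefix = list(accumulate(mb_costs))  # prefix[i] = cost of MBs 0..i
    total = prefix[-1]
    bounds = [0]
    for k in range(1, num_slices):
        target = total * k / num_slices
        # Cut right after the first macroblock whose cumulative cost
        # reaches the k-th share of the total.
        cut = bisect_left(prefix, target) + 1
        # Keep this slice non-empty and leave at least one macroblock
        # for each of the remaining slices.
        cut = max(cut, bounds[-1] + 1)
        cut = min(cut, len(mb_costs) - (num_slices - k))
        bounds.append(cut)
    bounds.append(len(mb_costs))
    return list(zip(bounds[:-1], bounds[1:]))


costs = [1, 1, 1, 1, 4, 4, 2, 2]          # hypothetical estimates
slices = partition_slices(costs, 2)
loads = [sum(costs[a:b]) for a, b in slices]
print(slices)  # [(0, 5), (5, 8)]
print(loads)   # [8, 8]
```

    A dynamic scheme in the spirit of the last step could feed the measured per-macroblock costs of one frame back in as `mb_costs` for the next frame. That feedback loop is likewise an assumption here.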
    Experimental results show that the proposed slice-level parallelism provides a 3.7x to 3.9x speedup when four CPUs are used, with only a small loss in coding efficiency and quality.
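    As a quick check on what these figures mean, parallel efficiency is the speedup divided by the number of CPUs. The short Python sketch below applies that standard definition to the reported range. The helper name is illustrative.

```python
def parallel_efficiency(speedup, num_cpus):
    """Fraction of ideal linear speedup that was achieved."""
    return speedup / num_cpus


for s in (3.7, 3.9):
    print(f"speedup {s}x on 4 CPUs -> {parallel_efficiency(s, 4):.1%}")
# speedup 3.7x on 4 CPUs -> 92.5%
# speedup 3.9x on 4 CPUs -> 97.5%
```

    The reported speedups therefore correspond to roughly 92.5% to 97.5% of the ideal 4x.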


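    Chinese abstract (translated): H.264/AVC is a recent video compression standard. To improve compression efficiency and quality, H.264/AVC adopts many new mechanisms, such as variable block sizes, multiple reference frames, and quarter-pixel motion estimation. These come with a large amount of computational complexity, which makes encoding time too long. This thesis uses parallel computation on a multi-core system to parallelize the H.264/AVC encoder and thereby increase encoding speed.
    In H.264/AVC encoding, a frame can be split into multiple independently encoded slices, so this thesis proposes a parallel algorithm that uses the slice as the unit of data-parallel processing (slice-level parallelism). Because slices consume different amounts of computation during encoding, earlier slice-level parallelism distributes work unevenly across the processors, which lowers system performance. The method in this thesis addresses this problem and achieves a higher degree of parallelization.
    The proposed algorithm first estimates the computational complexity of each slice about to be encoded and, based on the prediction, reallocates the computation among the slices to try to balance it. It then uses an early termination mechanism to shorten the differences in slice completion times. Finally, based on the past distribution of computation across the slices, it dynamically adjusts how the next frame is partitioned into slices so that the per-slice computation becomes more balanced. Experimental results show that the proposed slice-level parallelism provides a 3.7x to 3.9x speedup when four CPUs run in parallel, with only a very small loss in compression ratio and quality.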

    Acknowledgements
    Chinese Abstract
    Abstract
    List of Figures
    Chapter 1. Introduction
    Chapter 2. Related Work
      2-1. Introduction to x264 Encoder
        2-1-1. Mode decision
        2-1-2. Rate distortion optimization
        2-1-3. Performance evaluation of x264
      2-2. Analysis of Thread Granularity
        2-2-1. GOP-level parallelism
        2-2-2. Frame-level parallelism
        2-2-3. Slice-level parallelism
        2-2-4. Macroblock-level parallelism
      2-3. Multi-level Threading of H.264 Encoder
        2-3-1. Hyper-threading technology
        2-3-2. Implementation of multi-level H.264 encoder
      2-4. Hierarchical Parallelization Approach
        2-4-1. Performance analysis
        2-4-2. Hierarchical H.264/AVC parallel encoder
      2-5. Wavefront Parallelization Approach
        2-5-1. Data dependencies of H.264
        2-5-2. Data partition of wavefront parallelization
        2-5-3. Task scheduling and priorities
    Chapter 3. Load Balanced Slice-level Parallelism
      3-1. Workload Allocation of Motion Estimation
        3-1-1. Complexity estimation
        3-1-2. Complexity allocation
      3-2. Early Termination Approach
        3-2-1. Mechanism of early termination
        3-2-2. Modified motion estimation structures
      3-3. Fast Rate Estimation of CABAC
        3-3-1. Model of bit-rate estimation for CABAC
        3-3-2. Bit-rate estimation method
      3-4. Dynamic Slice Partitioned Scheme
    Chapter 4. Experimental Results
      4-1. Evaluation of Encoding Speed
      4-2. Quality Preservation
      4-3. Comparisons of Coding Efficiency
    Chapter 5. Conclusions and Future Works
    References

