
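Graduate student: 吳亮甫 (Wu, Liang-Fu)
Thesis title: A Pipelined H.264 Decoder on the Cell Broadband Engine
Thesis title (Chinese): Cell寬頻引擎的管線化H.264解碼器製作
Advisor: 李政崑 (Lee, Jenq-Kuen)
Oral examination committee:
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science
Year of publication: 2010
Academic year of graduation: 98 (ROC calendar)
Language: English
Number of pages: 40
Keywords (Chinese): H.264, Cell寬頻引擎, 平行化, 多核心, SIMD, 直接內存訪問
Keywords (English): H.264, Cell Broadband Engine, Parallel, Multicore, SIMD, DMA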
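  • A growing number of applications on the market use video compression, so compression standards such as H.264 are receiving increasing attention. H.264 currently offers the best compression efficiency for high-resolution video, but its high compression ratio also brings a very heavy computational load, so processors must be strengthened to meet the performance requirements of H.264. Because processor development is currently moving toward multicore designs, applications must also take the characteristics of multicore platforms into account to reach the best performance. In addition, the core algorithms of H.264 contain many complex dependencies, which must be analyzed carefully to achieve good parallelization. The Cell Broadband Engine is a high-performance chip multicore processor designed for multimedia computing, consisting of one main processor, the PowerPC Processor Element (PPE), and eight Synergistic Processor Elements (SPEs). It also provides many libraries and interfaces to support application development. With the high performance of the Cell Broadband Engine, the heavy burden that video compression brings can be reduced for H.264. In this thesis, we explore the feasibility of data parallelism and task parallelism, and then propose a parallelization method that combines the two.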


    With the growing number of applications that involve video compression and decompression, video codecs such as H.264 play an important role in the modern market. H.264 currently achieves the highest compression efficiency for High Definition (HD) video content, but at the cost of high computational complexity. As a result, processors must become more powerful to deliver good performance for H.264. However, because of the well-known energy consumption and heat dissipation issues, multicore platforms have become the main trend in computer architecture.
    To approach peak performance, the characteristics of the multicore platform must be taken into consideration. In addition, when parallelizing the H.264 algorithm, the codec must be analyzed and evaluated to resolve its complex dependencies. One popular multicore platform is the IBM Cell Broadband Engine (Cell BE), a heterogeneous chip multicore processor composed of one Power Processor Element (PPE) and eight Synergistic Processor Elements (SPEs). The Cell BE is specially designed to meet the high performance requirements of multimedia applications, with built-in Single Instruction Multiple Data (SIMD) and Direct Memory Access (DMA) units. It also provides a rich set of libraries and APIs for application development. With the strengths of the Cell BE, we should be able to reduce the computational burden introduced by H.264.
    In this thesis, data parallelism and task parallelism are exploited to build a combined parallel decoder based on the open-source JM H.264 decoder. The PPE distributes two slices at a time to two pipelined decoding flows, each composed of 4 SPEs. Double buffering is employed so that the slices can be processed independently. The theoretical speedup is 9 times compared with sequential execution on the PPE. In the experiment, the deblocking module is offloaded to an SPE with double buffering, and the resulting speedup is 1.17 times.
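
    The double buffering mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's SPE code: the function names (`fetch_slice`, `decode_slice`, `process_double_buffered`) are hypothetical, a Python worker thread stands in for the DMA engine, and summing bytes stands in for real decoding work. It only shows the general idea of transferring the next slice into one buffer while the current slice is processed from the other.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_slice(slices, index):
    """Hypothetical stand-in for a DMA transfer: copy one slice into a local buffer."""
    return bytes(slices[index])


def decode_slice(buf):
    """Hypothetical placeholder for decoding work: here it just sums the bytes."""
    return sum(buf)


def process_double_buffered(slices):
    """Overlap the fetch of slice i+1 with the processing of slice i."""
    results = []
    if not slices:
        return results
    buffers = [None, None]  # two local buffers used alternately
    # A single worker thread plays the role of the DMA engine (assumption).
    with ThreadPoolExecutor(max_workers=1) as dma:
        pending = dma.submit(fetch_slice, slices, 0)  # prefetch the first slice
        for i in range(len(slices)):
            cur = i % 2
            buffers[cur] = pending.result()  # wait until the current buffer is filled
            if i + 1 < len(slices):
                # Start the next transfer before processing the current buffer.
                pending = dma.submit(fetch_slice, slices, i + 1)
            results.append(decode_slice(buffers[cur]))
    return results
```

    For example, `process_double_buffered([[1, 2, 3], [4, 5], [6]])` processes three small slices while each following slice is fetched in the background. On real hardware the benefit depends on how well transfer time and compute time overlap.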

    Abstract
    Contents
    List of Figures
    List of Tables
    1 Introduction
    2 Cell Broadband Engine Architecture
      2.1 Cell Broadband Engine
        2.1.1 Overview
        2.1.2 PowerPC Processor Element
        2.1.3 Synergistic Processor Element
        2.1.4 Element Interconnect Bus
      2.2 The SPE Runtime Management Library
    3 H.264/MPEG4 Part 10
      3.1 H.264 Overview
      3.2 Prediction Scheme
        3.2.1 Inter Prediction
        3.2.2 Intra Prediction
      3.3 Deblocking Filter
      3.4 Transform
      3.5 Quantisation
      3.6 Reordering
      3.7 Entropy Encode
    4 Parallel H.264 Decoder Scheme
      4.1 Parallel Schemes
      4.2 Dependencies of H.264
        4.2.1 Macroblock
        4.2.2 Frame
        4.2.3 Slice
        4.2.4 Size of the unit
      4.3 Task Splitting
      4.4 Combine Data Parallelism and Task Parallelism
      4.5 Disadvantages
    5 Experiment
      5.1 Experimental Environment
      5.2 Experimental Results
    6 Conclusion and Future Work
      6.1 Summary
      6.2 Future work

