
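Student: 李育瑄 (Yu Shiung Li)
Thesis title: Analysis of H.264 decoder for various programming model on dual-core SoC platform
Advisor: 石維寬 (Wei-Kuan Shih)
Oral examination committee:
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering (Computer Science)
Year of publication: 2007
Academic year of graduation: 95 (ROC calendar, i.e. 2006–2007)
Language: Chinese
Number of pages: 56
Keywords: H.264, dual-core, programming model, dual-core communication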
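    H.264/AVC [1] is an international standard that has become very popular in recent years. It is a high-compression video coding standard developed jointly by ITU-T VCEG (Video Coding Experts Group) and ISO/IEC MPEG (Moving Picture Experts Group). Compared with earlier standards, H.264/AVC has higher computational complexity, so we implemented an H.264/AVC decoder on an asymmetric dual-core platform, aiming to build a fast H.264/AVC decoder by exploiting the performance of both cores.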
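    Implementing an efficient H.264/AVC decoder on a dual-core platform requires considering the following points:
    1. Software partition: which code should run on the MPU and which on the DSP? Splitting the work evenly requires more synchronization control.
    2. Data movement: the processors frequently exchange data; for example, data decoded on the DSP must be moved to the ARM for display. We use DMA or the MPU to move the data.
    3. Synchronization: the ARM and the DSP each have specific functions that must run at particular points in time, so the two sides must be synchronized to notify each other to perform those tasks. We use two methods: interrupts and polling.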
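Point 3 can be made concrete with a polling handshake. The sketch below assumes a hypothetical one-slot mailbox (`mailbox_t`, `mailbox_post`, `mailbox_poll`; none of these names come from the thesis) in memory visible to both cores: the producer writes the payload and then raises a `ready` flag, and the consumer repeatedly checks that flag. C11 release/acquire atomics stand in for the ordering guarantees a real ARM/DSP pair would need; because the two cores are not assumed to be cache-coherent, on hardware the mailbox would also have to live in uncached memory or be flushed explicitly.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical one-slot mailbox in memory shared by the two cores. */
#define MAILBOX_WORDS 64

typedef struct {
    atomic_int ready;                 /* 0 = empty, 1 = payload valid */
    uint32_t   len;                   /* number of valid words in payload */
    uint32_t   payload[MAILBOX_WORDS];
} mailbox_t;

void mailbox_init(mailbox_t *mb) {
    atomic_init(&mb->ready, 0);
    mb->len = 0;
}

/* Producer side (e.g. the MPU after entropy decoding). Returns false if the
 * consumer has not drained the previous message yet. */
bool mailbox_post(mailbox_t *mb, const uint32_t *data, uint32_t len) {
    assert(len > 0 && len <= MAILBOX_WORDS);
    if (atomic_load_explicit(&mb->ready, memory_order_acquire))
        return false;                             /* slot still full */
    memcpy(mb->payload, data, len * sizeof *data);
    mb->len = len;
    /* release: payload writes become visible before the flag is raised */
    atomic_store_explicit(&mb->ready, 1, memory_order_release);
    return true;
}

/* Consumer side (e.g. the DSP): one poll attempt. Returns the number of
 * words copied out, or 0 if nothing was ready. */
uint32_t mailbox_poll(mailbox_t *mb, uint32_t *out) {
    if (!atomic_load_explicit(&mb->ready, memory_order_acquire))
        return 0;
    uint32_t len = mb->len;
    memcpy(out, mb->payload, len * sizeof *out);
    atomic_store_explicit(&mb->ready, 0, memory_order_release); /* free slot */
    return len;
}

/* Busy-wait, bounded so that a missing producer cannot hang the caller. */
uint32_t mailbox_poll_spin(mailbox_t *mb, uint32_t *out, long max_tries) {
    for (long i = 0; i < max_tries; i++) {
        uint32_t n = mailbox_poll(mb, out);
        if (n)
            return n;
    }
    return 0;
}
```

An interrupt-driven variant would replace the `mailbox_poll_spin` loop with a wait that an interrupt handler ends, trading wasted polling cycles for per-interrupt overhead.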
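    With these points in mind, we implemented three H.264 decoder programming models on a dual-core SoC platform, optimized the program flow with software pipelining, and present a performance analysis and a comparison of their advantages and disadvantages.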
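The benefit of software pipelining can be estimated with a simple timing model. The sketch below assumes two stages per decoding unit: stage A on the MPU (for example, entropy decoding) and stage B on the DSP (the remaining decoding), with `nbuf` shared buffers between them. Stage A of unit i must wait until the DSP has released the buffer it reuses, and stage B of unit i must wait until its input is ready. The function names and the per-unit stage times are illustrative assumptions, not measurements from the thesis.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_UNITS 1024

/* Two-stage software pipeline with `nbuf` buffers between the stages.
 * a[i], b[i]: time of stage A (MPU) and stage B (DSP) for unit i.
 * Returns the time at which stage B finishes the last unit. */
double pipeline_makespan(const double *a, const double *b, size_t n, size_t nbuf) {
    assert(n <= MAX_UNITS && nbuf >= 1);
    double a_end[MAX_UNITS], b_end[MAX_UNITS];
    for (size_t i = 0; i < n; i++) {
        double a_start = i ? a_end[i - 1] : 0.0;       /* MPU runs units in order */
        if (i >= nbuf && b_end[i - nbuf] > a_start)    /* wait for a free buffer */
            a_start = b_end[i - nbuf];
        a_end[i] = a_start + a[i];

        double b_start = a_end[i];                      /* input must be ready */
        if (i && b_end[i - 1] > b_start)                /* DSP runs units in order */
            b_start = b_end[i - 1];
        b_end[i] = b_start + b[i];
    }
    return n ? b_end[n - 1] : 0.0;
}

/* Without pipelining, the two cores simply take turns. */
double sequential_time(const double *a, const double *b, size_t n) {
    double t = 0.0;
    for (size_t i = 0; i < n; i++)
        t += a[i] + b[i];
    return t;
}
```

For instance, with stage times of 2 and 3 for four decoding units, `pipeline_makespan` returns 14 with two buffers but 20 with a single buffer, the same as `sequential_time`: without a second buffer the two cores can never work at the same time.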
    The three programming models we analyze are:
    1. MPU decode full Entropy – polling programming model
    2. MPU decode full Entropy – interrupt programming model
    3. MPU decode partial Entropy – interrupt programming model
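    In our experimental environment, the ARM runs an operating system. The polling programming model is not well suited to DMA, because DMA must be configured in kernel mode, whereas the polling program and its buffers all reside in user space; using DMA would cost time on data copies between user space and kernel space and on system calls. Therefore, in this thesis the polling model implicitly does not use DMA, while the interrupt models use DMA for data movement.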
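Without DMA, the MPU moves the data itself with ordinary loads and stores. A minimal sketch, assuming the DSP leaves each reconstructed 16x16 luma macroblock as a contiguous row-major block in a shared buffer and that the display frame buffer is a plain 8-bit luma plane with `stride` bytes per row; `copy_mb_to_frame` is a hypothetical name, and chroma planes are omitted.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MB_SIZE 16   /* an H.264 luma macroblock is 16x16 samples */

/* Copy one reconstructed macroblock (16 bytes per row, contiguous) into a
 * frame buffer with `stride` bytes per row and `frame_h` rows.
 * mb_x and mb_y are macroblock coordinates, not pixel coordinates. */
void copy_mb_to_frame(uint8_t *frame, size_t stride, size_t frame_h,
                      const uint8_t *mb, size_t mb_x, size_t mb_y) {
    size_t x0 = mb_x * MB_SIZE;
    size_t y0 = mb_y * MB_SIZE;
    assert(x0 + MB_SIZE <= stride && y0 + MB_SIZE <= frame_h);
    /* One memcpy per row: source rows are packed, destination rows are
     * `stride` bytes apart. */
    for (size_t row = 0; row < MB_SIZE; row++)
        memcpy(frame + (y0 + row) * stride + x0, mb + row * MB_SIZE, MB_SIZE);
}
```

Every byte passes through the MPU here, which is the cost the interrupt models avoid by handing the transfer to DMA.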
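    Experimental results show that the DSP spends only 1.8 s actually executing the decoding algorithms; it waits too long for data, which slows decoding. The MPU decode full Entropy – interrupt programming model therefore improves decoding performance effectively, but the large number of interrupts incurs more overhead, and data must be copied between user space and kernel space. In addition, implementing part of the entropy decoding on the DSP makes the code size too large and increases decoding time considerably, although improving the program flow can effectively reduce the number of cache misses.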


    H.264/AVC [1] is an international standard for digital video compression that has become very popular in recent years. It was developed jointly by ITU-T VCEG (Video Coding Experts Group) and ISO/IEC MPEG (Moving Picture Experts Group). H.264/AVC has higher computational complexity than previous standards, so we use an asymmetric dual-core SoC platform to implement an H.264/AVC decoder.

    On a dual-core platform, three points should be considered in order to implement an efficient H.264/AVC decoder:
    1. Software partition: which procedures should execute on the MPU, and which on the DSP? Splitting the work evenly requires more synchronization control.
    2. Data movement: the two cores need to exchange a lot of data; for example, the reconstructed data decoded by the DSP must be moved to external memory so that it can be displayed later. We use DMA or the MPU to move the data.
    3. Synchronization: the MPU and the DSP each have procedures that must execute at specific points in time, so synchronization control is needed. We use polling or interrupts.
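When DMA is used, a large transfer is typically described to the controller as a chain of descriptors, each covering a bounded number of bytes. The sketch below uses a hypothetical descriptor layout (`dma_desc_t`) and an assumed per-descriptor limit of 4095 bytes; it is not the register format of the platform's DMA controller. Because it is meant to run on a host, `dma_run_chain` walks the chain in software where real hardware would perform the copies and then raise a completion interrupt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DESC_MAX_BYTES 4095u   /* assumed per-descriptor limit */

/* Hypothetical scatter/gather descriptor. */
typedef struct dma_desc {
    const uint8_t   *src;
    uint8_t         *dst;
    uint32_t         len;     /* bytes moved by this descriptor */
    struct dma_desc *next;    /* NULL terminates the chain */
} dma_desc_t;

/* Split one transfer of `bytes` bytes into a descriptor chain stored in d[].
 * Returns the number of descriptors used, or 0 if bytes == 0 or `cap`
 * descriptors are not enough. */
size_t dma_build_chain(dma_desc_t *d, size_t cap,
                       uint8_t *dst, const uint8_t *src, size_t bytes) {
    assert(d && dst && src);
    size_t n = (bytes + DESC_MAX_BYTES - 1) / DESC_MAX_BYTES;
    if (n == 0 || n > cap)
        return 0;
    for (size_t i = 0; i < n; i++) {
        size_t off  = i * DESC_MAX_BYTES;
        size_t left = bytes - off;
        d[i].src  = src + off;
        d[i].dst  = dst + off;
        d[i].len  = (uint32_t)(left < DESC_MAX_BYTES ? left : DESC_MAX_BYTES);
        d[i].next = (i + 1 < n) ? &d[i + 1] : NULL;
    }
    return n;
}

/* Software stand-in for the controller: follows the chain the way hardware
 * would. Returns the total number of bytes copied. */
size_t dma_run_chain(const dma_desc_t *d) {
    size_t total = 0;
    for (; d; d = d->next) {
        memcpy(d->dst, d->src, d->len);
        total += d->len;
    }
    return total;
}
```

In a real driver the descriptors would hold physical addresses and be programmed in kernel mode, which is why DMA use is tied to kernel-space buffers on this platform.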
    We take these three points into account to implement three H.264 decoder programming models on a dual-core SoC platform, and use software pipelining to increase parallelism.

    The three programming models we propose are:
    1. MPU decode full Entropy - polling programming model
    2. MPU decode full Entropy - interrupt programming model
    3. MPU decode partial Entropy - interrupt programming model
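The three models differ in how much entropy decoding runs on the MPU. As an illustration of what that stage involves, the simplest H.264 entropy code is the unsigned Exp-Golomb code ue(v) (clause 9.1 of the standard): count the leading zero bits M, skip the following 1, read M more bits as INFO, and compute codeNum = 2^M - 1 + INFO. The bit reader below is a hypothetical minimal helper, not the thesis's implementation; CAVLC decoding of residual coefficients is considerably more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal MSB-first bit reader over a byte buffer. */
typedef struct {
    const uint8_t *buf;
    size_t         size;   /* buffer length in bytes */
    size_t         pos;    /* current bit position */
} bitreader_t;

/* Returns the next bit (0 or 1), or -1 past the end of the buffer. */
static int read_bit(bitreader_t *br) {
    if (br->pos >= br->size * 8)
        return -1;
    int bit = (br->buf[br->pos >> 3] >> (7 - (br->pos & 7))) & 1;
    br->pos++;
    return bit;
}

/* ue(v): returns codeNum = 2^M - 1 + INFO, or -1 on a truncated code or
 * one with more than 31 leading zeros. */
int64_t read_ue(bitreader_t *br) {
    assert(br && br->buf);
    int m = 0, bit;
    while ((bit = read_bit(br)) == 0) {      /* count leading zeros */
        if (++m > 31)
            return -1;
    }
    if (bit < 0)                             /* ran out before the '1' */
        return -1;
    uint64_t info = 0;
    for (int i = 0; i < m; i++) {            /* read M info bits */
        if ((bit = read_bit(br)) < 0)
            return -1;
        info = (info << 1) | (uint64_t)bit;
    }
    return (int64_t)((UINT64_C(1) << m) - 1 + info);
}
```

For example, the bit string 1 010 011 00100 00111, packed as the bytes 0xA6 0x43 0x80, decodes to codeNums 0, 1, 2, 3 and 6.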

    In our experimental environment, an embedded Linux runs on the MPU. The MPU decode full Entropy - polling programming model is relatively unsuitable for DMA, because DMA must be set up in kernel mode, whereas the program and buffers of this model all reside in user space; using DMA would therefore cost a lot of time in data copies and system calls. Consequently, the first programming model does not use DMA to move data.
    The experimental results show that the DSP spends only 1.8 s actually decoding; the rest of the time is spent waiting for data. Therefore, the MPU decode full Entropy - interrupt programming model improves decoding efficiency, but it incurs additional overhead for interrupt processing and for data copies between user space and kernel space. In the MPU decode partial Entropy - interrupt programming model, the DSP code size is too large, which causes many cache misses and increases decoding time. The number of cache misses can be reduced by redesigning the decoding flow.

    Table of contents
    Chinese Abstract; English Abstract; Acknowledgments; Table of Contents; List of Figures; List of Tables
    1. INTRODUCTION
        1.1. BACKGROUND
        1.2. MOTIVATION
        1.3. READING GUIDANCE
    2. VERSATILE/PB926EJ-S PLATFORM BASEBOARD AND PACDSP OVERVIEW
        2.1. VERSATILE/PB926EJ-S PLATFORM BASEBOARD
            2.1.1. introduction
            2.1.2. interrupt controller
            2.1.3. DMA controller
        2.2. PACDSP
            2.2.1. architecture features
            2.2.2. PACDSP core
            2.2.3. PACDSP kernel
                2.2.3.1. Program Sequence Control Unit
                2.2.3.2. Scalar Unit
                2.2.3.3. VLIW Data Path
            2.2.4. Pipeline Architecture
    3. H.264 DECODING ALGORITHM OVERVIEW AND IMPLEMENTATION
        3.1. BASIC DEFINITION
        3.2. PROFILE
        3.3. ENCODER/DECODER
        3.4. IMPLEMENTATION
            3.4.1. IT/IQ, PPC, DF
            3.4.2. Entropy
                3.4.2.1. Exp-Golomb Entropy Coding
                3.4.2.2. Context-Based Adaptive Variable Length Coding (CAVLC)
                    3.4.2.2.1. our optimization
                    3.4.2.2.2. fast table-lookup algorithm
    4. VARIOUS H.264 DUAL-CORE DECODER PROGRAMMING MODEL
        4.1. DATA STRUCTURE AND MEMORY ALLOCATION
        4.2. VARIOUS PROGRAMMING MODEL INTRODUCTION AND IMPLEMENTATION
            4.2.1. MPU decode full ED – polling programming model
                4.2.1.1. Software Partition
                4.2.1.2. data movement function design and implementation
                    4.2.1.2.1. ARM_ED
                    4.2.1.2.2. ARM_R
                    4.2.1.2.3. ARM_W
                4.2.1.3. program flow and software pipeline
            4.2.2. MPU decode full ED – interrupt programming model
                4.2.2.1. software partition
                4.2.2.2. data movement function design and implementation
                    4.2.2.2.1. DMA_ED
                    4.2.2.2.2. DMA_R
                    4.2.2.2.3. DMA_W
                4.2.2.3. program flow and software pipeline
            4.2.3. MPU decode partial ED – interrupt programming model
                4.2.3.1. software partition
                4.2.3.2. data movement function
                4.2.3.3. program flow and software partition
    5. SPECIFIC LINUX DRIVER DESIGN FOR ALL PROGRAMMING MODEL
        5.1. LINUX DEVICE DRIVER ARCHITECTURE
        5.2. DSP DRIVER IMPLEMENTATION
            5.2.1. driver memory map
            5.2.2. ioctl()
                5.2.2.1. PAC_INIT
                5.2.2.2. PAC_START
            5.2.3. mmap()
            5.2.4. write()
            5.2.5. read()
            5.2.6. interrupt service routine (ISR)
                5.2.6.1. data movement ISR
                5.2.6.2. DMA complete ISR
    6. PROGRAMMING MODEL ANALYSIS AND PERFORMANCE COMPARISON
        6.1. MPU DECODE FULL ED – POLLING PROGRAMMING MODEL
        6.2. MPU DECODE FULL ED – INTERRUPT PROGRAMMING MODEL
        6.3. MPU DECODE PARTIAL ED – INTERRUPT PROGRAMMING MODEL
    7. CONCLUSION AND FUTURE WORK
    REFERENCE

    [1] Joint Video Team (JVT), "Draft ITU-T Recommendation and Final Draft International Standard of Joint Video Specification," ITU-T Rec. H.264 and ISO/IEC 14496-10 AVC, May 2003.

    [2] To-Wei Chen, Yu-Wen Huang, Tung-Chien Chen, Yu-Han Chen, Chuan-Yung Tsai, and Liang-Gee Chen, "Architecture Design of H.264/AVC Decoder with Hybrid Task Pipelining for High Definition Videos," in Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS 2005), Kobe, Japan, 2005.

    [3] http://www.ti.com/corp/docs/landing/davinci/index.html

    [4] Xing Qin and Xiaolang Yan, "A Memory and Speed Efficient CAVLC Decoder," in Visual Communications and Image Processing 2005.

    [5] Yong Ho Moon, Gyu Yeong Kim, and Jae Ho Kim, "An Efficient Decoding of CAVLC in H.264/AVC Video Coding Standard," IEEE Transactions on Consumer Electronics, Vol. 51, No. 3, August 2005.

    [6] Jia-Ming Chen, Hsin-Wen We, Jian-Liang Luo, Pou-Hang Ian, and Wei-Kuan Shih, "H.264/AVC Decoder Realization on a SoC Platform."

    [7] Jian-Liang Luo, "Implementation and Optimization of H.264 baseline profile decoder on PACDSP dual core platform."

    [8] Cheng-Nan Chiu, Chien-Tang Tseng, and Chun-Jen Tsai, "Tightly-Coupled MPEG-4 Video Encoder Framework on Asymmetric Dual-Core Platforms."

    [9] Peng Li, Yu Lu, Shen Li, and Hongxing Wei, "Realization of Embedded Multimedia System Based On Dual-Core Processor OMAP5910."

    [10] K. V. Nedovodeev, "Multimedia data processing on dual-core SoC Multicore-24."

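    Full-text availability: not authorized for public release (on-campus network)
    Full-text availability: not authorized for public release (off-campus network)
    Full-text availability: not authorized for public release (National Central Library: Taiwan Theses and Dissertations System)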