Vector Register Architecture Design and Simulation on Embedded DSP Processor

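Graduate Student:	洪銘彥 Ming-Yen Hong
Thesis Title:	Vector Register Architecture Design and Simulation on Embedded DSP Processor
Advisor:	吳仁銘 Jen-Ming Wu
Oral Defense Committee:
Degree:	Master
Department:	College of Electrical Engineering and Computer Science, Department of Electrical Engineering
Year of Publication:	2008
Academic Year of Graduation:	96 (ROC calendar)
Language:	English
Number of Pages:	73
Keywords (Chinese):	暫存器架構 (register architecture)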

Single Instruction Multiple Data (SIMD) processing is powerful for multimedia applications. On a conventional 32-bit machine, for example, if each data unit is 8 bits wide, one SIMD instruction can operate on four units at a time and thus achieve a data parallelism of four. In SIMD processing these data units are often called subwords. However, SIMD performance is limited by poor subword arrangement (permutation) in the register file. We therefore propose a new register file architecture, the Vector Register (VR) architecture, in which subwords can be permuted within the register file without generating heavy traffic between memory and the register file. We designed three benchmarks based on H.264/AVC, namely matrix transposition, the deblocking filter, and the discrete cosine transform (DCT), and set up a detailed simulation flow on the Starfish DSP (digital signal processor) simulator. The simulation results show that, on average, the VR architecture improves the cycle count and instruction count of these benchmarks by 30.865% and 30.606%, respectively.
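The "four 8-bit subwords in one 32-bit word" arithmetic can be illustrated in portable C. The sketch below emulates a single packed 8-bit add with ordinary 32-bit integer operations, a "SIMD within a register" (SWAR) trick swapped in for illustration; it is not the Starfish DSP instruction set, and the helper names `pack4`, `lane`, and `add_u8x4` are hypothetical.

```c
#include <stdint.h>

/* Pack four 8-bit subwords into one 32-bit word; lane 0 is the least-significant byte. */
static uint32_t pack4(uint8_t b0, uint8_t b1, uint8_t b2, uint8_t b3) {
    return (uint32_t)b0 | ((uint32_t)b1 << 8) | ((uint32_t)b2 << 16) | ((uint32_t)b3 << 24);
}

/* Extract subword i (0..3) from a packed word. */
static uint8_t lane(uint32_t w, int i) {
    return (uint8_t)(w >> (8 * i));
}

/* Lane-wise modulo-256 addition of four 8-bit subwords.
   The top bit of every lane is cleared first, so the 7-bit partial sums
   (at most 0x7F + 0x7F = 0xFE) can never carry into the neighbouring lane;
   the top bits are then added back with XOR, which discards the lane's carry-out. */
static uint32_t add_u8x4(uint32_t a, uint32_t b) {
    const uint32_t H = 0x80808080u;          /* top bit of each 8-bit lane */
    uint32_t low = (a & ~H) + (b & ~H);      /* carry-free 7-bit sums per lane */
    return low ^ ((a ^ b) & H);              /* fold in the top bits, no carry-out */
}
```

For example, `add_u8x4(pack4(250, 1, 100, 128), pack4(10, 2, 100, 128))` yields the four lane results 4, 3, 200, and 0 in one word, the same four additions that a packed-add SIMD instruction performs in a single step.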

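In multimedia processing, the nature of the data makes Single Instruction Multiple Data (SIMD) techniques effective and widely used. Typically, on a 32-bit machine, if one data unit is 8 bits wide, a single SIMD instruction can operate on four data units simultaneously, raising the degree of parallelism to four. In SIMD processing these data units are often called subwords. However, SIMD performance is frequently limited by how these subwords are arranged across registers. To solve this subword arrangement problem, we propose a new register architecture, the Vector Register Architecture. With it, subwords can be arranged and regrouped more freely among registers without generating heavy data traffic between the registers and memory. To simulate and verify the performance of the vector register, we designed three benchmarks based on the new-generation video compression standard H.264/AVC: matrix transposition, the deblocking filter, and the discrete cosine transform. We also designed a clear simulation flow for the vector register architecture. Using this flow, our simulation results show that the vector register architecture effectively reduces both the number of cycles consumed by instructions (cycle count) and the number of instructions required (instruction count). On average, it improves cycle count by 30.865% and instruction count by 30.606%.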
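Matrix transposition, the first benchmark, shows why subword arrangement in a conventional register file is costly. The sketch below assumes a 4×4 matrix of 8-bit elements packed one row per 32-bit register, with no transposed-access support in the register file, and moves every subword individually with shifts, masks, and ORs so that the permutation overhead is explicit. It does not model the Vector Register ISA; `transpose_u8_4x4` is a hypothetical name.

```c
#include <stdint.h>

/* r[i] holds row i of a 4x4 matrix of 8-bit elements; byte j (LSB = byte 0) is column j.
   On return, r[i] holds row i of the transposed matrix. */
static void transpose_u8_4x4(uint32_t r[4]) {
    uint32_t t[4] = {0, 0, 0, 0};
    for (int i = 0; i < 4; i++) {              /* source row */
        for (int j = 0; j < 4; j++) {          /* source column */
            uint32_t v = (r[i] >> (8 * j)) & 0xFFu;  /* extract element (i, j) */
            t[j] |= v << (8 * i);                    /* insert it as element (j, i) */
        }
    }
    for (int k = 0; k < 4; k++) r[k] = t[k];
}
```

In this scalar form each of the 16 elements costs a shift, a mask, a second shift, and an OR, even though no arithmetic is performed on the data; this is the kind of pure data-movement overhead the abstract attributes to poor subword arrangement.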

Contents
Introduction
  1 Research Motivation
  2 Organization of This Thesis
Starfish DSP Architecture
  1 Introduction
  2 Architecture of Starfish DSP
    2.1 Register File Architecture
    2.2 Pipeline Architecture
    2.3 Instruction Set Architecture
  3 Software Toolkit of Starfish DSP
    3.1 Toolchain
    3.2 Simulator
    3.3 Debugger
H.264/AVC Video Coding
  1 Introduction
    1.1 Terminology in H.264/AVC
    1.2 H.264/AVC Encoder
    1.3 H.264/AVC Decoder
  2 Critical Functions in H.264/AVC
    2.1 Discrete Cosine Transform (DCT)
    2.2 Deblocking Filter
Vector Register Architecture
  1 Introduction
  2 Previous Studies and Works
  3 Principle of Vector Register
  4 Hardware Architecture of Vector Register
    4.1 Register File
    4.2 Status Flag
  5 ISA of Vector Register
  6 Issues Regarding Vector Register
    6.1 Register Pressure
    6.2 Pipeline Data Hazard Detection and Register Bypassing
Simulation
  1 Introduction
    1.1 Assumptions and Terminology
  2 An Overview of Simulation Flow
  3 Modification of Starfish DSP Simulator
    3.1 Principle of Starfish DSP Simulator
    3.2 Implementation of VR Instructions
  4 Composing Vector Register Benchmarks
    4.1 Matrix Transposition
    4.2 Deblocking Filter
    4.3 DCT
  5 Code Generation
  6 Simulation Results
    6.1 Matrix Transposition
    6.2 Deblocking Filter
    6.3 DCT
    6.4 Summary
Conclusion
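The DCT benchmark is based on H.264/AVC, whose 4×4 transform is an integer approximation of the DCT computed as Y = Cf X Cf^T with Cf = [[1, 1, 1, 1], [2, 1, -1, -2], [1, -1, -1, 1], [1, -2, 2, -1]]; the remaining scaling is folded into quantization. The sketch below is a plain scalar C version of that forward core transform in butterfly form, assuming residual blocks stored row-major in `int` arrays. The names `fwd4` and `fwd_core_4x4` are hypothetical, and this is not the thesis's VR benchmark code.

```c
/* 1-D 4-point forward core transform: out = Cf * in, written as butterflies. */
static void fwd4(const int in[4], int out[4]) {
    int s0 = in[0] + in[3], s1 = in[1] + in[2];   /* sums */
    int d0 = in[0] - in[3], d1 = in[1] - in[2];   /* differences */
    out[0] = s0 + s1;          /* [1  1  1  1] */
    out[1] = 2 * d0 + d1;      /* [2  1 -1 -2] */
    out[2] = s0 - s1;          /* [1 -1 -1  1] */
    out[3] = d0 - 2 * d1;      /* [1 -2  2 -1] */
}

/* 2-D transform Y = Cf * X * Cf^T of a 4x4 block (row-major, X is not modified). */
static void fwd_core_4x4(int x[4][4], int y[4][4]) {
    int tmp[4][4], col[4], res[4];
    for (int i = 0; i < 4; i++)
        fwd4(x[i], tmp[i]);                       /* row pass: X * Cf^T */
    for (int j = 0; j < 4; j++) {                 /* column pass: Cf * (X * Cf^T) */
        for (int i = 0; i < 4; i++) col[i] = tmp[i][j];
        fwd4(col, res);
        for (int i = 0; i < 4; i++) y[i][j] = res[i];
    }
}
```

A block of all ones transforms to a single DC coefficient of 16. The row pass followed by a column pass is also where a packed-subword implementation typically has to transpose the data between passes, which links this kernel to the matrix-transposition benchmark.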
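The deblocking-filter benchmark is also based on H.264/AVC. To show the per-edge arithmetic such a benchmark exercises, the sketch below implements the standard's normal-strength (bS < 4) luma filter for one line of six samples p2, p1, p0 | q0, q1, q2 across a block edge, for 8-bit video. It assumes the alpha, beta, and tc0 thresholds have already been looked up from the standard's tables and omits the strong (bS = 4) filter, chroma, and any SIMD or VR mapping; `filter_luma_normal` and its sample layout are hypothetical.

```c
#include <stdlib.h>   /* abs */

static int clip3(int lo, int hi, int v) { return v < lo ? lo : (v > hi ? hi : v); }

/* Floor right shift; written out because >> on negative ints is implementation-defined in C. */
static int asr(int v, int n) { return v >= 0 ? v >> n : -((-v + (1 << n) - 1) >> n); }

/* s[0..5] = p2, p1, p0, q0, q1, q2 (8-bit samples); bS must be in 0..3.
   Returns 1 if the edge was filtered, 0 if filterSamplesFlag was false. */
static int filter_luma_normal(int s[6], int bS, int alpha, int beta, int tc0) {
    int p2 = s[0], p1 = s[1], p0 = s[2], q0 = s[3], q1 = s[4], q2 = s[5];
    if (bS == 0 || abs(p0 - q0) >= alpha || abs(p1 - p0) >= beta || abs(q1 - q0) >= beta)
        return 0;                                  /* likely a real edge: leave it alone */
    int ap = abs(p2 - p0), aq = abs(q2 - q0);
    int tc = tc0 + (ap < beta) + (aq < beta);      /* luma clipping bound */
    int delta = clip3(-tc, tc, asr((q0 - p0) * 4 + (p1 - q1) + 4, 3));
    s[2] = clip3(0, 255, p0 + delta);              /* p0' = Clip1(p0 + delta) */
    s[3] = clip3(0, 255, q0 - delta);              /* q0' = Clip1(q0 - delta) */
    int avg = (p0 + q0 + 1) >> 1;                  /* uses the unfiltered p0, q0 */
    if (ap < beta) s[1] = p1 + clip3(-tc0, tc0, asr(p2 + avg - 2 * p1, 1));
    if (aq < beta) s[4] = q1 + clip3(-tc0, tc0, asr(q2 + avg - 2 * q1, 1));
    return 1;
}
```

With samples {60, 60, 60, 70, 70, 70}, bS = 2, alpha = 15, beta = 5, and tc0 = 1, the step of 10 is smoothed to {60, 61, 63, 67, 69, 70}. The data-dependent threshold tests are what make this kernel harder to vectorize than the transform.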

Full-text release: not authorized for public access (on-campus and off-campus networks)
