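| Graduate student: | 洪銘彥 Ming-Yen Hong |
|---|---|
| Thesis title: | 運用於嵌入式訊號處理器之向量暫存器架構設計與模擬 Vector Register Architecture Design and Simulation on Embedded DSP Processor |
| Advisor: | 吳仁銘 Jen-Ming Wu |
| Oral defense committee: | |
| Degree: | 碩士 Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Electrical Engineering |
| Year of publication: | 2008 |
| Academic year of graduation: | 96 (ROC calendar) |
| Language: | English |
| Number of pages: | 73 |
| Keywords (Chinese): | 暫存器架構 (register architecture) |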
Single Instruction Multiple Data (SIMD) processing is powerful in multimedia applications. On a conventional 32-bit machine, if one data unit is 8 bits wide, one SIMD instruction can operate on four units at a time and thus reach a data parallelism of four. These data units are often called subwords in SIMD processing. However, SIMD performance is restricted by poor subword permutation in the register file. We therefore propose a new register file architecture named the Vector Register (VR) architecture. With the Vector Register, subwords can be permuted within the register file without causing heavy traffic between memory and the register file. We have designed three benchmarks based on H.264/AVC (matrix transposition, deblocking filter, and discrete cosine transform (DCT)) and set up a well-defined simulation flow on the Starfish DSP (digital signal processor) simulator. The simulation results show that, on average, we can improve the cycle count and instruction count of these benchmarks by 30.865% and 30.606%, respectively.
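The subword parallelism described in the abstract can be illustrated with a minimal sketch. It packs four 8-bit subwords into one 32-bit word and adds all four lanes with a single word-wide addition. This is not the thesis's Vector Register architecture or the Starfish DSP instruction set; the helper names `pack4`, `unpack4`, and `add4x8` are hypothetical, and the wrap-around (modulo 256) lane behaviour is an assumption made for illustration.

```python
MASK32 = 0xFFFFFFFF   # model a 32-bit register
LOW7 = 0x7F7F7F7F     # low 7 bits of every 8-bit subword
HIGH1 = 0x80808080    # top bit of every 8-bit subword


def pack4(values):
    """Pack four 8-bit values into a 32-bit word, values[0] in the lowest byte."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & 0xFF) << (8 * i)
    return word


def unpack4(word):
    """Split a 32-bit word back into its four 8-bit subwords."""
    return [(word >> (8 * i)) & 0xFF for i in range(4)]


def add4x8(x, y):
    """Add four 8-bit subwords at once; each lane wraps modulo 256."""
    # Add the low 7 bits of every lane: the sum fits in 8 bits,
    # so no carry can leak into the neighbouring lane.
    low = (x & LOW7) + (y & LOW7)
    # Fold in the top bit of every lane with XOR (addition without carry-out).
    return (low ^ ((x ^ y) & HIGH1)) & MASK32


a = pack4([1, 200, 255, 16])
b = pack4([2, 100, 1, 240])
print(unpack4(add4x8(a, b)))  # [3, 44, 0, 0]
```

One word-wide operation replaces four scalar additions here, which is the fourfold data parallelism the abstract refers to.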
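In multimedia processing, Single Instruction Multiple Data (SIMD) is an effective and widely used technique because of the nature of the data. Typically, on a 32-bit machine, if one data unit is 8 bits wide, a single SIMD instruction can operate on four data units simultaneously, raising the degree of parallelism to four. In SIMD processing these data units are often called subwords. However, SIMD performance is frequently limited by how these subwords are arranged among the registers. To solve this subword arrangement problem, we propose a new register architecture called the Vector Register Architecture. With it, subwords can be arranged and regrouped among registers more freely, without generating heavy data traffic between the registers and memory. To simulate and verify the performance of the Vector Register, we designed three benchmark programs based on H.264/AVC, a new-generation video compression technology: matrix transposition, deblocking filter, and discrete cosine transform. We also designed a clear simulation flow for simulating the Vector Register Architecture. With this flow, our simulation results show that the Vector Register Architecture effectively reduces both the cycle count consumed by instructions and the number of instructions required. On average, the Vector Register Architecture improves cycle count by 30.865% and instruction count by 30.606%.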
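To make the subword arrangement problem concrete, the sketch below transposes a 4x4 matrix of 8-bit subwords whose rows sit in four 32-bit words, using only shifts, masks, and ORs. It is an illustration of why permutation is costly on a conventional register file, not the method proposed in the thesis; the function name `transpose4x4` and the byte layout (column 0 in the lowest byte) are assumptions.

```python
def transpose4x4(rows):
    """Transpose a 4x4 byte matrix stored as four 32-bit row words.

    rows[i] holds row i, with column 0 in the lowest byte (assumed layout).
    Returns four words where word j holds original column j.
    """
    cols = []
    for j in range(4):
        word = 0
        for i in range(4):
            sub = (rows[i] >> (8 * j)) & 0xFF  # extract subword: shift + mask
            word |= sub << (8 * i)             # insert subword: shift + OR
        cols.append(word)
    return cols


rows = [0x03020100, 0x13121110, 0x23222120, 0x33323130]
print([hex(w) for w in transpose4x4(rows)])
# ['0x30201000', '0x31211101', '0x32221202', '0x33231303']
```

Each of the sixteen subwords needs its own extract and insert steps in this scalar form, so the rearrangement costs far more operations than the SIMD arithmetic it enables. That overhead is the kind of cost a register file with built-in permutation support aims to avoid.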