Graduate student: | 陳鈺勳 Chen, Yu-Hsun |
---|---|
Thesis title: | 多核心平台上運算元件間相互通訊之設計與分析 Design and Analysis of Inter-PE Communication on Many-Core Platform |
Advisor: | 黃稚存 Huang, Chih-Tsun |
Oral defense committee: | 劉靖家 Liou, Jing-Jia; 黃稚存 Huang, Chih-Tsun; 陳添福 Chen, Tien-Fu |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science - Department of Computer Science |
Year of publication: | 2012 |
Academic year of graduation: | 101 |
Language: | English |
Number of pages: | 84 |
Chinese keywords: | 多核心、晶片系統、晶片網路、運算元件間相互溝通 |
English keywords: | Many-Core, SoC, NoC, inter-PE communication |
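As the number of processing elements (PEs) in modern many-core platforms keeps growing, the throughput and reliability of inter-PE communication have become important issues. In our previous implementation, we proposed a Network-on-Chip (NoC) based many-core platform consisting of 16 PEs, an on-chip communication library, and a data transfer protocol that guarantees the reliability of inter-PE communication at the application level. Each PE communicates with the others by transferring data over the NoC through its PE-to-PE core. The throughput bottleneck of this PE-to-PE core is the latency incurred when the CPU moves each piece of data from memory into the PE-to-PE core. We propose an improved architecture that simplifies the interface between software and hardware, moves data directly from memory using burst-mode transfers, and implements the data transfer protocol in hardware.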
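We analyze the results of running this improved architecture on a SystemC platform of similar structure and conclude that bus operation latency affects the data transfer speed. On our many-core platform, the experimental results show that the maximum throughput of inter-PE communication with the data transfer protocol is 2687.3 Mbps, about 22.6 times the 148.5 Mbps of the previous architecture. Synthesized with the TSMC 0.13 μm CMOS process at 100 MHz, the improved architecture occupies about 19.1K gates, roughly 69.2% of the previous design. Comparing speed and area, the proposed architecture offers faster inter-PE communication and better area efficiency.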
With the continuous increase in the number of Processing Elements (PEs) in modern
many-core platforms, the throughput and reliability of application-level inter-PE
communication have become important issues. In our previous work, we proposed a
Network-on-Chip (NoC) based many-core platform consisting of 16 PEs, an on-chip
communication library, and a flow control protocol that guarantees the reliability of
inter-PE communication at the application level. The PEs communicate with one another
through the PE-to-PE core, which is the interface connected to the NoC. The bottleneck
of communication efficiency is the latency of each data transmission, because the data
must be read from local memory by the CPU and pushed into the PE-to-PE core. We propose
an improved architecture that simplifies the interface between software and hardware and
accesses local memory directly with burst-mode data transmission. In addition, we move
the flow control protocol from software into hardware.
We analyze the behavior of this improved architecture on the corresponding SystemC
platform and attribute the latency of data transmission to the speed of local memory
access. The experimental results show that the maximum throughput of inter-PE
communication with the flow control protocol is 2687.3 Mbps, about 22.6 times the
119.1 Mbps of our previous work. Using TSMC 0.13 μm CMOS technology, the area of the
improved architecture operating at 100 MHz is 19.1K gates, which is 69.2% of that of
our previous work. Comparing area and speed, the improved architecture achieves faster
data transmission and better area efficiency for inter-PE communication.
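
The speed and area comparison in the English abstract can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted there. The previous design's gate count is not stated and is inferred here from the 69.2% ratio. The "throughput per K gates" figure of merit is an illustrative assumption, not a metric taken from the thesis.

```python
# Figures quoted in the English abstract.
new_throughput_mbps = 2687.3   # improved architecture, with flow control
old_throughput_mbps = 119.1    # previous work
new_area_kgates = 19.1         # improved architecture at 100 MHz
area_ratio = 0.692             # new area as a fraction of the previous area

# Speedup of the improved architecture over the previous work.
speedup = new_throughput_mbps / old_throughput_mbps

# Previous area implied by the 69.2% ratio (inferred, not stated in the abstract).
old_area_kgates = new_area_kgates / area_ratio

# Hypothetical figure of merit: throughput per thousand gates.
new_density = new_throughput_mbps / new_area_kgates
old_density = old_throughput_mbps / old_area_kgates
density_gain = new_density / old_density

print(f"speedup: {speedup:.1f}x")
print(f"implied previous area: {old_area_kgates:.1f}K gates")
print(f"throughput-per-gate gain: {density_gain:.1f}x")
```

Under these assumptions, the throughput ratio agrees with the 22.6 times reported. Because the area shrinks as well, the gain in throughput per gate is larger than the raw speedup.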