Graduate student: | 蔡欣恬 Tsai, Hsin Tien |
---|---|
Thesis title: | 應用高階合成之硬體加速器效能改善技術 Performance Optimization of Accelerators using C-based High-Level Synthesis Flow |
Advisor: | 黃稚存 Huang, Chih Tsun |
Oral defense committee: | 李毅郎 Li, Yih Lang; 劉靖家 Liou, Jing Jia; 金仲達 King, Chung Ta |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science - Department of Computer Science |
Year of publication: | 2016 |
Academic year of graduation: | 104 |
Language: | English |
Number of pages: | 46 |
Keywords (Chinese): | 高階合成、加速器 |
Keywords (English): | HLS, accelerator |
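In the future, some systems will use many accelerators to improve system performance. Which programs, or which functions within a program, should be implemented as accelerators to speed up the system effectively, and how the architecture of each accelerator should be chosen, are therefore important research questions.
The traditional design flow relies mainly on simulation at the register-transfer level (RTL), but design space exploration of accelerator architectures in an RTL flow is very time-consuming. We therefore use the higher-level high-level synthesis (HLS) design flow for accelerator design space exploration.
In the HLS design flow, we found that the structure of the program itself can make it impossible to optimize performance no matter how the HLS parameters are changed, and the most important cause is memory. We therefore propose a memory-remapping method that changes the memory mapping used by the memory partitioning feature of HLS to optimize performance.
In addition, during our research we found that the HLS tool still has some deficiencies that make the performance of the generated result fall short of expectations, so we propose three patches to address them.
Experimental results show that the proposed method optimizes program performance and yields designs with better execution time.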
High-level synthesis (HLS) has made significant progress in compiling high-level programs into register-transfer level (RTL) specifications. Memory partitioning in HLS can efficiently map data elements of the same logical array onto multiple physical banks. However, manual code rewriting is still necessary to obtain better quality of results in memory system optimization.
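As an illustration of what memory partitioning does, the sketch below maps the indices of one logical array onto banks under two common schemes, block and cyclic. The function names and the 8-element, 4-bank sizes are assumptions chosen for illustration; they are not taken from the thesis or from any tool's API.

```python
def cyclic_bank(index: int, num_banks: int) -> int:
    """Cyclic partitioning: consecutive elements go to consecutive banks."""
    return index % num_banks


def block_bank(index: int, array_size: int, num_banks: int) -> int:
    """Block partitioning: each bank holds one contiguous chunk of the array."""
    block_size = -(-array_size // num_banks)  # ceiling division
    return index // block_size


ARRAY_SIZE = 8  # illustrative sizes, not from the thesis
NUM_BANKS = 4

cyclic_map = [cyclic_bank(i, NUM_BANKS) for i in range(ARRAY_SIZE)]
block_map = [block_bank(i, ARRAY_SIZE, NUM_BANKS) for i in range(ARRAY_SIZE)]
print("cyclic:", cyclic_map)
print("block: ", block_map)
```

With these sizes, elements 0 and 1 share a bank under block partitioning but sit in different banks under cyclic partitioning. Assuming each bank serves one access per cycle, which scheme allows two accesses in the same cycle therefore depends on the access pattern.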
In this thesis we provide a memory-remapping methodology to optimize memory partitioning and performance. We use Aladdin in our flow to perform design space exploration quickly and to generate dynamic data dependence graphs (DDDGs). We build the graphs with memory accesses, memory partitions, and scheduled cycles to illustrate the status of memory accesses after scheduling, then move candidates on the graphs to reduce the total cycle count and produce a better data placement in the memory partitions. Finally, we optimize the code using this better data placement.
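To make the effect of data placement concrete, here is a toy model; it is not the thesis's algorithm. Each bank is assumed to serve one access per cycle, accesses requested together form a group, and a group costs as many cycles as the largest number of its accesses that land in the same bank. The access pattern, the bank count, and both placements are made up for illustration.

```python
from collections import Counter


def total_cycles(access_groups, placement):
    """Sum over groups of the worst per-bank load (one access per bank per cycle)."""
    cycles = 0
    for group in access_groups:
        load = Counter(placement[i] for i in group)
        cycles += max(load.values())
    return cycles


# Hypothetical pattern: each step reads elements i and i + 2 of an 8-element array.
access_groups = [(i, i + 2) for i in range(6)]

# Cyclic placement over 2 banks: i and i + 2 always fall in the same bank.
cyclic = {i: i % 2 for i in range(8)}

# Remapped placement: pairs of neighbours alternate banks, so i and i + 2 differ.
remapped = {i: (i // 2) % 2 for i in range(8)}

print("cyclic:  ", total_cycles(access_groups, cyclic))
print("remapped:", total_cycles(access_groups, remapped))
```

In this model the remapped placement removes every bank conflict for the chosen access pattern, which is the kind of improvement a better data placement is meant to deliver.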
The Vivado HLS tool can generate different architectures for one design by applying different user configurations. However, Vivado HLS implements these user-applied configurations in a general way, which may limit the behavior of the generated RTL design. To overcome these limitations, we propose three patches; once the patches are added, the limitations in Vivado HLS are resolved.
Experimental results on Vivado HLS show that our approach yields designs with a lower total cycle count, and results from Design Compiler confirm that, after the design is synthesized to gate level, the optimized design generated by our methodology performs better.