Run-Time Frequency Scaling of On-Chip Networks for GPGPU


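Student:	杜巧韻 Tu, Chiao-Yun
Thesis title:	Run-Time Frequency Scaling of On-Chip Networks for GPGPU
Advisor:	金仲達 King, Chung-Ta
Oral defense committee:	黃婷婷 Hwang, Ting-Ting; 劉靖家 Liou, Jing-Jia
Degree:	Master
Department:	College of Electrical Engineering and Computer Science, Department of Computer Science
Publication year:	2013
Academic year of graduation:	101 (ROC calendar, i.e., 2012–2013)
Language:	English
Pages:	43
Keywords (Chinese):	GPGPU, network-on-chip, dynamic frequency scaling
Keywords (English):	Dynamic Frequency Scaling, Many-to-Few-to-Many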


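Modern general-purpose graphics processing units (GPGPUs) can deliver ten to a hundred times the computing power of general-purpose processors (CPUs) on highly parallel applications. Because the traffic that GPGPUs generate on the network-on-chip behaves differently from CPU traffic, on-chip network architectures designed for CPUs are not well suited to GPGPUs. This thesis proposes a dynamic frequency scaling mechanism that adjusts the network frequency according to the load on the network-on-chip, so as to meet the bandwidth demands of different applications.
In this thesis, we first examine the network traffic patterns of GPGPUs and classify them into three types. Under each type, the request network and the reply network have different bandwidth demands because their loads differ. Second, we dynamically monitor selected shader cores to predict the network load, and then adjust the network frequency according to the type characteristics identified in the first stage. Experimental results show that this dynamic frequency scaling mechanism improves performance by up to 27% (7.4% on average).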

Modern general-purpose graphics processing units (GPGPUs) provide one to two orders of magnitude more computing power than general-purpose processors (CPUs) for highly parallel applications. Because GPGPU on-chip traffic differs considerably from CPU traffic, interconnection network designs intended for CPUs are not well suited to GPGPUs. This thesis proposes a run-time dynamic frequency scaling mechanism that meets the bandwidth demands of different applications by tuning the network frequency in response to the network load. We first investigate the characteristics of GPGPU traffic and classify GPGPU traffic patterns into three types. For each type, the request network and the reply network require different bandwidth to handle their respective loads. Second, we exploit this property to regulate the network frequency at run time: we monitor a subset of shader cores, predict the network load from them, and scale the frequency accordingly. Evaluation shows that this dynamic frequency tuning design achieves up to a 27% performance improvement over the baseline setting (7.4% on average).
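The record does not describe the control algorithm itself, so the following Python sketch only illustrates the general shape of a window-based policy like the one the abstract outlines: sample the traffic of a few monitored shader cores over a time window, extrapolate it to the whole chip, and choose a frequency for the request and reply networks separately. Everything here is a hypothetical assumption, not the thesis's method: the names `WindowSample`, `predict_load`, `pick_frequency`, and `tune`; the frequency levels; the 1.2 headroom factor; and the linear model in which a network running at f GHz moves `flits_per_cycle * f` flits per nanosecond.

```python
from dataclasses import dataclass

# Hypothetical discrete network frequency levels in GHz (not taken from the thesis).
FREQ_LEVELS_GHZ = (0.4, 0.6, 0.8, 1.0, 1.2)


@dataclass
class WindowSample:
    """Traffic observed at the monitored shader cores during one time window."""
    request_flits: int  # flits the monitored cores injected into the request network
    reply_flits: int    # flits the reply network delivered to the monitored cores


def predict_load(sample: WindowSample, monitored_cores: int,
                 total_cores: int, window_ns: float) -> tuple[float, float]:
    """Scale the monitored cores' traffic up to all cores; returns flits/ns per network."""
    scale = total_cores / monitored_cores
    request_load = sample.request_flits * scale / window_ns
    reply_load = sample.reply_flits * scale / window_ns
    return request_load, reply_load


def pick_frequency(load_flits_per_ns: float, flits_per_cycle: float,
                   headroom: float = 1.2,
                   levels: tuple[float, ...] = FREQ_LEVELS_GHZ) -> float:
    """Return the lowest level whose capacity covers load * headroom, else the top level."""
    needed = load_flits_per_ns * headroom
    for f_ghz in levels:
        # Assumed linear capacity model: f GHz = f cycles/ns.
        if flits_per_cycle * f_ghz >= needed:
            return f_ghz
    return levels[-1]  # predicted load exceeds every level: run at the maximum


def tune(sample: WindowSample, monitored_cores: int, total_cores: int,
         window_ns: float, flits_per_cycle: float) -> tuple[float, float]:
    """Choose (request, reply) network frequencies for the next time window."""
    request_load, reply_load = predict_load(sample, monitored_cores,
                                            total_cores, window_ns)
    return (pick_frequency(request_load, flits_per_cycle),
            pick_frequency(reply_load, flits_per_cycle))


if __name__ == "__main__":
    # Illustrative numbers only: 4 of 28 cores monitored over a 1000 ns window.
    window = WindowSample(request_flits=300, reply_flits=900)
    print(tune(window, monitored_cores=4, total_cores=28,
               window_ns=1000.0, flits_per_cycle=8))
```

With these illustrative numbers the predicted loads are 2.1 flits/ns for the request network and 6.3 flits/ns for the reply network, so the sketch selects 0.4 GHz and 1.0 GHz respectively. Tuning the two networks independently reflects the abstract's observation that they need different bandwidth; the number of monitored cores and the window length correspond to the trade-offs the thesis explores in its "Core Monitor Exploration" and "Time Window Size" sections.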

Introduction
Background
  1 Baseline GPGPU Architecture
  2 Characteristics of GPGPU Applications
    2.1 Many-to-Few-to-Many Traffic Pattern
    2.2 Characteristic of GPGPU Traffic Pattern
Methodology
  1 DFS Policy Overview
    1.1 Data collection
    1.2 Parameter computation
    1.3 Dynamic Frequency Control Algorithm
  2 Frequency Tuning Rationale
    2.1 Ideal Frequency Tuning Technique
    2.2 Practical Frequency Tuning Technique
  3 Scaling Overhead and Hardware Implementation
    3.1 Scaling Overhead
    3.2 Hardware Implementation
Experimental Evaluation
  1 Simulation Setup
  2 Evaluation Result
    2.1 Network Limit Exploration
    2.2 Core Monitor Exploration
    2.3 Time Window Size
    2.4 Performance Result
    2.5 Power Consumption
    2.6 Performance Gain by Ideal DFS Mechanism
Related Work
Conclusion and Future Work
  1 Conclusion
  2 Future Work

Full-text availability: not authorized for public release (on-campus or off-campus network).
