Hardware Thread-Level Speculation Performance Analysis | National Tsing Hua University Theses and Dissertations Repository

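Graduate student:	王瀅捷 Wang, Ying-Chieh
Thesis title:	硬體線程級投機效能分析 Hardware Thread-Level Speculation Performance Analysis
Advisor:	李哲榮 Lee, Che-Rung
Committee members:	周志遠 Jerry Chou; 許慶賢 Hsu, Ching-Hsien
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Department of Computer Science
Year of publication:	2015
Academic year of graduation:	103 (ROC calendar)
Language:	English
Number of pages:	47
Keywords (Chinese):	線程級投機、效能分析
Keywords (English):	Thread-Level Speculation, Performance Analysis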

Thread-Level Speculation (TLS) is one of several parallel frameworks. TLS sidesteps the analysis problem of compiler-directed code parallelization, which helps programmers produce parallel programs. Performance, however, is the most important issue for parallel programs. We therefore analyze the performance of hardware TLS on the IBM Blue Gene/Q computer.

This thesis presents a performance model for hardware TLS on the IBM Blue Gene/Q computer. The model predicts performance well, as verified by experiments. It helps in understanding the potential gains from using special-purpose TLS hardware to accelerate codes that, in a strict sense, require serial processing to avoid memory conflicts. Based on analysis and measurements of TLS behavior and its overhead, a strategy is proposed to help utilize this hardware feature. Furthermore, we compare the performance of hardware TLS and OpenMP. Based on the performance analysis, we give guidance for deciding between these two parallel frameworks. The results not only help users utilize TLS but also suggest potential improvements for future TLS architectural designs.

Introduction
Background
Timing Analysis on Thread-Level Speculation
1 Performance model
1.1 Execution without any conflict
1.2 Execution with single conflict
1.3 Execution with two conflicts
1.4 Execution with more than 2 conflicts
2 Rollback estimation
Comparison of Thread-Level Speculation and OpenMP
1 Timing analysis on OpenMP
2 Performance comparison
Experiments
1 Experiments of Thread-Level Speculation performance model
2 Experiments of OpenMP performance model
3 Experiments of comparison between OpenMP and Thread-Level Speculation
3.1 Experiments on sample hashing program
3.2 Experiments on LAMMPS
3.3 Experiments on GROMACS
Discussion and conclusion
                                

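Full-text release date: full text not authorized for public release (on-campus network)
Full-text release date: full text not authorized for public release (off-campus network)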
