Implementing the Jacobi-Davidson Algorithm on a Multi-core Graphics Processing Unit

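Graduate student:	萬子豪
Thesis title:	Implementing the Jacobi-Davidson Algorithm on a Multi-core Graphics Processing Unit
Advisor:	陳人豪
Oral examination committee:
Degree:	Master
Department:	Nanda Campus Department Adjustment Affairs Center - Department of Applied Mathematics
Year of publication:	2014
Academic year of graduation:	102 (ROC calendar)
Language:	Chinese
Chinese keywords:	圖形處理器 (graphics processing unit), 雅可比-大衛森演算法 (Jacobi-Davidson algorithm)
Foreign-language keywords:	Graphics Processing Unit, Jacobi-Davidson Method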

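Abstract (translated from the Chinese)
　　Although the Jacobi-Davidson method converges very well when solving large sparse eigenvalue problems, data sizes have grown steadily in recent years, so that even with such good convergence the computation still incurs a large research cost. Using a graphics processing unit (GPU) as a co-processor to reduce that cost has therefore become more important.
　　This thesis examines how to accelerate the Jacobi-Davidson method with a GPU. The work covers basic linear algebra operations such as matrix multiplication, vector inner products, and the solution of large sparse linear systems, and it analyzes performance before and after GPU acceleration.
　　The results show that the GPU computations are correct. For the basic linear algebra operations, the GPU is 1.95 to 4.638 times faster than the CPU. The overall computation time of the Jacobi-Davidson method, however, is close to that of the CPU version. Possible causes are the time spent on memory transfers and the relatively low clock rate of the GPU used in the experiments.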

Accelerating Jacobi-Davidson Method using Multi-core Graphics Processing Unit

Abstract
　　The Jacobi-Davidson method (JDM) converges rapidly when
solving large sparse eigenvalue problems. However, because the
matrices involved are very large, we still incur substantial research costs.
This motivates us to employ the graphics processing unit (GPU) to
accelerate the JDM. Under the Compute Unified Device Architecture
(CUDA) framework, several linear algebra operations, including
matrix-matrix multiplication, the vector inner product, and the solution
of sparse linear systems, are accelerated on the GPU. To evaluate the
performance of our code, we run these operations and the overall
JDM both with and without the GPU. The results show that the solutions
computed on the GPU are correct. Moreover, the linear algebra
operations gain a 1.95 to 4.63 times speedup on the GPU relative to the
CPU version. However, the performance of the overall JDM on the GPU is
comparable to that on the CPU. This may be due to the extra work
spent on memory transfers in our GPU code, or to the slower clock rate
of our GPU.
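
The abstract does not say how the sparse linear systems are solved. The table of contents lists a "Jacobi Method" experiment (section 4.6), so the following is a minimal CPU-only sketch that assumes Jacobi iteration is the solver meant. It is not the thesis's CUDA code. The function names `mat_vec`, `dot`, and `jacobi_solve`, the dense list-of-lists storage, and the tolerance are all illustrative assumptions. The sketch uses the matrix-vector product and the inner product named in the abstract, and each loop over `i` consists of independent per-element work, which is the kind of loop a GPU kernel would typically assign to separate threads.

```python
def mat_vec(A, x):
    """Dense matrix-vector product y = A x (each row is independent work)."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def dot(u, v):
    """Vector inner product: a reduction over elementwise products."""
    return sum(u_i * v_i for u_i, v_i in zip(u, v))


def jacobi_solve(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b by Jacobi iteration; returns (x, iterations used).

    Assumption: A has a nonzero diagonal and is, for example, strictly
    diagonally dominant, so that the iteration converges.
    """
    n = len(b)
    x = [0.0] * n
    for k in range(1, max_iter + 1):
        Ax = mat_vec(A, x)
        # Jacobi update: x_new[i] = x[i] + (b[i] - (A x)[i]) / A[i][i].
        # Every component i depends only on the previous iterate.
        x_new = [x[i] + (b[i] - Ax[i]) / A[i][i] for i in range(n)]
        diff = [x_new[i] - x[i] for i in range(n)]
        x = x_new
        # Stop when the 2-norm of the update falls below the tolerance.
        if dot(diff, diff) ** 0.5 < tol:
            return x, k
    return x, max_iter
```

For example, `jacobi_solve([[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]], [5.0, 6.0, 5.0])` iterates toward the exact solution `[1, 1, 1]` of that diagonally dominant system.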

Table of Contents
Chapter 1  Introduction
    1.1  Research Background
    1.2  Research Motivation
    1.3  Research Objectives
    1.4  Thesis Organization
Chapter 2  CUDA Background
    2.1  CUDA
        2.1.1  CUDA Parallel Programming Model
        2.1.2  CUDA Memory Model
        2.1.3  NVIDIA GeForce GT 740M Hardware Overview
        2.1.4  CUDA Kernel
        2.1.5  CUDA Runtime API
        2.1.6  The __syncthreads() Function
    2.2  CUDA Parallelization Methods
Chapter 3  Parallelizing the Jacobi-Davidson Method
    3.1  Introduction to the Jacobi-Davidson Method
    3.2  Jacobi-Davidson Method
    3.3  Parallelizing the Jacobi-Davidson Method with CUDA
Chapter 4  Experimental Results
    4.1  Experimental Environment
    4.2  Experimental Procedure
    4.3  Test Problem
    4.4  Matrix-Vector Multiplication
    4.5  Vector Inner Product
    4.6  Jacobi Method
    4.7  Jacobi-Davidson Method
Chapter 5  Conclusion
References

List of Figures
Figure 1-1  Comparison of GPU and CPU peak floating-point performance
Figure 1-2  Comparison of GPU and CPU memory bandwidth
Figure 1-3  Large number of execution units
Figure 2-1  Parallel programming model
Figure 2-2  CUDA memory model
Figure 2-3  Parallel computation process
Figure 4-1  Line chart of GPU and CPU matrix-vector multiplication times
Figure 4-2  Line chart of GPU and CPU vector inner product times
Figure 4-3  Line chart of GPU and CPU Jacobi Method times
Figure 4-4  Line chart of GPU and CPU Jacobi-Davidson Method times

List of Tables
Table 2-1  Comparison of GPU and CPU times before and after optimization
Table 4-1  Overview of the CPU and GPU architectures
Table 4-2  Eigenvalue results
Table 4-3  Comparison of CPU and GPU matrix-vector multiplication performance
Table 4-4  Comparison of CPU and GPU vector inner product performance
Table 4-5  Comparison of CPU and GPU Jacobi Method performance
Table 4-6  Comparison of CPU and GPU Jacobi-Davidson Method performance

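Full-text release date: full text not authorized for public release (on-campus network)
Full-text release date: full text not authorized for public release (off-campus network)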
