
Author: Shi, Xiaolei (史曉磊)
Thesis title: An explicit multigrid scheme using artificial compressibility method for the simulation of unsteady incompressible flows on multi-GPU cluster
Chinese title: 應用GPU叢集與顯式多重網格方法求解不可壓縮流動問題
Advisor: Lin, Chao-An (林昭安)
Committee members: Niu, Yang-Yao (牛仰堯); Wu, Jong-Shinn (吳宗信); Chen, Ching-Yao (陳慶耀)
Degree: Master
Department: College of Engineering - Department of Power Mechanical Engineering
Year of publication: 2019
Academic year of graduation: 107 (ROC calendar, 2018-2019)
Language: English
Number of pages: 41
Keywords: artificial compressibility method, multigrid method, multi-GPU, incompressible flow
    An explicit solver for the unsteady incompressible Navier-Stokes equations, based on the artificial compressibility method and running on a multi-GPU cluster, is presented. The numerical procedure is validated by computing three-dimensional (3D) laminar and turbulent flows in a lid-driven cubic cavity (LDC). The predictions agree well with benchmark solutions and experimental measurements for both mean quantities and turbulence statistics. For the turbulent LDC flow at Reynolds number 3200, a detailed analysis of the Taylor-Görtler-like (TGL) vortices is conducted; in this case there are complex interactions between the downstream secondary corner eddy (DSE) and the TGL vortices.
    The computation is accelerated by an explicit FAS multigrid scheme. Numerical tests show that a 7-level multigrid (FAS Lev. 7) gives a speedup of up to 250 times over a single-level grid. The influence of the grid number and of the number of multigrid levels on the performance of the FMG-FAS scheme is also investigated. For a resolution of 256 x 256 x 256 with a 4-level multigrid (FAS Lev. 4), the GPU code is 141 times faster than the serial CPU code running on one core.
    Fine-grained strategies that overlap execution and hide communication are designed to assist the multi-GPU computations. Weak-scaling tests show a parallel efficiency of about 75% for FAS Lev. 4 on 32 GPUs.
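    The explicit FAS (full approximation scheme) multigrid idea named in the abstract can be illustrated on a much smaller problem. The sketch below is not the thesis solver: it assumes a 1D nonlinear model equation, -u'' + u^3 = f with zero Dirichlet boundaries, a manufactured solution u = sin(pi x), a pointwise explicit relaxation with a local pseudo-time step, injection of the solution, full-weighting restriction of the residual, and linear prolongation. All function names are hypothetical and chosen only for this illustration.

```python
import math

def apply_operator(u, h):
    """N(u) = -u'' + u**3 at interior points; boundary entries stay 0."""
    out = [0.0] * len(u)
    for i in range(1, len(u) - 1):
        out[i] = (2.0 * u[i] - u[i - 1] - u[i + 1]) / (h * h) + u[i] ** 3
    return out

def residual(u, f, h):
    """r = f - N(u) at interior points; 0 on the Dirichlet boundaries."""
    nu = apply_operator(u, h)
    return [0.0 if i in (0, len(u) - 1) else f[i] - nu[i] for i in range(len(u))]

def smooth(u, f, h, sweeps, omega=2.0 / 3.0):
    """Explicit pointwise relaxation: u_i += dtau_i * r_i (local pseudo-time step)."""
    for _ in range(sweeps):
        r = residual(u, f, h)
        u = [u[i] + omega * r[i] / (2.0 / (h * h) + 3.0 * u[i] ** 2)
             for i in range(len(u))]
    return u

def restrict_full_weighting(r):
    """Fine-to-coarse transfer of the residual with weights 1/4, 1/2, 1/4."""
    nc = (len(r) - 1) // 2
    rc = [0.0] * (nc + 1)
    for j in range(1, nc):
        rc[j] = 0.25 * r[2 * j - 1] + 0.5 * r[2 * j] + 0.25 * r[2 * j + 1]
    return rc

def prolong_linear(ec):
    """Coarse-to-fine transfer of the correction by linear interpolation."""
    ef = [0.0] * (2 * (len(ec) - 1) + 1)
    for j in range(len(ec)):
        ef[2 * j] = ec[j]
    for j in range(len(ec) - 1):
        ef[2 * j + 1] = 0.5 * (ec[j] + ec[j + 1])
    return ef

def fas_vcycle(u, f, h, pre=3, post=3):
    """One FAS V-cycle; the coarse grid solves for the full solution, not an error."""
    if len(u) - 1 <= 2:                      # coarsest grid: one interior unknown
        return smooth(u, f, h, 50)
    u = smooth(u, f, h, pre)
    r = residual(u, f, h)
    uc0 = u[::2]                             # injection of the fine solution
    nc = apply_operator(uc0, 2.0 * h)
    rc = restrict_full_weighting(r)
    fc = [a + b for a, b in zip(nc, rc)]     # FAS right-hand side: N_c(I u) + R r
    uc = fas_vcycle(list(uc0), fc, 2.0 * h, pre, post)
    corr = prolong_linear([a - b for a, b in zip(uc, uc0)])
    u = [a + b for a, b in zip(u, corr)]
    return smooth(u, f, h, post)

def solve_demo(n=128, cycles=10):
    """Run a few V-cycles; return initial residual, final residual, max error."""
    h = 1.0 / n
    exact = [math.sin(math.pi * i * h) for i in range(n + 1)]
    f = [math.pi ** 2 * e + e ** 3 for e in exact]
    u = [0.0] * (n + 1)
    r0 = max(abs(v) for v in residual(u, f, h))
    for _ in range(cycles):
        u = fas_vcycle(u, f, h)
    r1 = max(abs(v) for v in residual(u, f, h))
    err = max(abs(a - b) for a, b in zip(u, exact))
    return r0, r1, err

if __name__ == "__main__":
    print(solve_demo())
```

    With n = 128 the recursion visits seven grid levels (128, 64, 32, 16, 8, 4 and 2 intervals). Because FAS transfers the full solution rather than only an error, the same nonlinear operator is applied on every level, which is why it combines naturally with an explicit smoother.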

    Abstract
    Dedication
    Acknowledgments
    List of Figures
    List of Tables
    1 Introduction
    2 Numerical method
      2.1 Governing equations
      2.2 Discretization
      2.3 Multigrid methodology
      2.4 Inter-grid transfer operation
      2.5 Summary
    3 Multi-GPU implementation
      3.1 Domain decomposition and communication
      3.2 Overlapping strategy
      3.3 Summary
    4 Numerical results
      4.1 Laminar lid-driven cavity flows
      4.2 Turbulent lid-driven cavity flows
      4.3 Summary
    5 Performance
      5.1 Performance of FMG-FAS
      5.2 Single node performance
      5.3 Multiple node performance
      5.4 Summary
    6 Conclusion
      6.1 Conclusion
      6.2 Future work
        6.2.1 Inter-node communication
        6.2.2 Multigrid implementation
    Bibliography
    Research achievement

