| Field | Value |
|---|---|
| Author | Luis Herrera (路易斯) |
| Thesis Title | A KVM-Based GPGPU Virtualization Technique for Windows (基於KVM、支援Windows的GPGPU虛擬化技術) |
| Advisor | Chung, Yeh Ching (鍾葉青) |
| Committee Members | King, Chung Ta (金仲達); Hsu, Wei Chung (徐慰中); Hung, Shih Hao (洪士灝) |
| Degree | Master |
| Department | Institute of Information Systems and Applications, College of Electrical Engineering and Computer Science |
| Year of Publication | 2016 |
| Academic Year of Graduation | 104 |
| Language | English |
| Number of Pages | 51 |
| Keywords (Chinese) | CUDA, GPGPU, 高性能運算 (high performance computing) |
| Keywords (English) | CUDA, GPGPU virtualization, High Performance Computing |
Virtualization technology accelerates the sharing of resources by abstracting the underlying hardware and improving hardware utilization across applications. In recent years, general-purpose graphics processing units (GPGPUs) have become critical in the field of high performance computing; however, the lack of open source GPU programming modules and the architectural design of the GPU itself make it very challenging to use GPGPUs in virtualized environments.
The goal of this work is to improve the performance of high performance computing applications that use CUDA on a virtual machine. We use QEMU-KVM as the virtual machine monitor and Windows 8.1 as the guest operating system. A shared intermediary library is built into the guest OS; it intercepts legitimate CUDA function calls and forwards them to an intermediary driver. After performing initialization, memory allocation, and validation, the intermediary driver sends the request to the virtual device. The virtual device inside the virtual machine monitor dequeues the request from its queue and services it with the appropriate CUDA API. Once the device completes the request, the result is returned to the intermediary driver in the guest and finally delivered to the user application, as if CUDA were installed on the virtual machine.
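The host-side half of this flow can be pictured as a small dispatch loop. The sketch below is a minimal illustration, not the thesis implementation: the `vgpu_request` layout and the `dequeue_request()`/`complete_request()` helpers are hypothetical stand-ins for the virtual device's queue, while `cuInit`, `cuDeviceGet`, `cuCtxCreate`, `cuMemAlloc`, and `cuMemFree` are the real NVIDIA CUDA driver API calls the host would ultimately invoke.

```c
#include <cuda.h>
#include <stdint.h>

typedef struct vgpu_request {
    uint32_t op;        /* e.g. VGPU_OP_MEM_ALLOC or VGPU_OP_MEM_FREE (hypothetical) */
    uint64_t size;      /* allocation size for VGPU_OP_MEM_ALLOC                     */
    uint64_t dptr;      /* device pointer handed back to the guest                   */
    int32_t  status;    /* CUresult propagated back to the guest                     */
} vgpu_request;

enum { VGPU_OP_MEM_ALLOC = 1, VGPU_OP_MEM_FREE = 2 };

/* Hypothetical helpers assumed to be provided by the virtual device's queue. */
vgpu_request *dequeue_request(void);
void complete_request(vgpu_request *req);

void vgpu_dispatch_loop(void)
{
    CUdevice  dev;
    CUcontext ctx;

    /* Initialise the real driver once and create a context on GPU 0. */
    cuInit(0);
    cuDeviceGet(&dev, 0);
    cuCtxCreate(&ctx, 0, dev);

    for (;;) {
        vgpu_request *req = dequeue_request();   /* blocks until the guest enqueues work */

        switch (req->op) {
        case VGPU_OP_MEM_ALLOC: {
            CUdeviceptr dptr = 0;
            req->status = (int32_t)cuMemAlloc(&dptr, (size_t)req->size);
            req->dptr   = (uint64_t)dptr;
            break;
        }
        case VGPU_OP_MEM_FREE:
            req->status = (int32_t)cuMemFree((CUdeviceptr)req->dptr);
            break;
        default:
            req->status = (int32_t)CUDA_ERROR_INVALID_VALUE;
            break;
        }

        complete_request(req);   /* result channelled back to the guest driver */
    }
}
```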
The results show that when the guest operating system uses 2 MB large pages, TLB translations and the cost of translating between virtual and physical page addresses are reduced, giving a significant performance improvement, and page-locked memory achieves performance close to that of the physical machine.
Virtualization technology facilitates the sharing of resources by abstracting the underlying hardware and improving utilization across applications. In recent years, General Purpose Graphics Processing Units (GPGPUs) have become critical in high performance computing (HPC). Yet, the lack of open source programming models for GPUs and their architectural design pose significant challenges in using them in virtualized environments.
The purpose of this study is to enhance the performance of HPC applications that use the Compute Unified Device Architecture (CUDA) while executing on a guest OS. We used QEMU-KVM as the Virtual Machine Monitor (VMM) and Windows 8.1 as the guest OS. A shared imposter library was created in the guest OS; it intercepts legitimate CUDA function calls and relays each request to the imposter driver in the guest. The imposter driver performs initialization, memory allocation, and validation, and then sends the packaged request to the virtual device. The virtual device in the VMM dequeues the request and executes it using the legitimate CUDA driver API. When the virtual device completes the request, the results are channeled back to the imposter driver in the guest OS. Ultimately, the results are presented to the user application as if CUDA were installed in the guest VM.
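On the guest side, interception amounts to exporting functions with the same names and signatures as the CUDA runtime and forwarding their arguments to the imposter driver. The C sketch below illustrates one such call under stated assumptions: the device name `\\.\VCuda`, the IOCTL code, and the `vcuda_request` layout are hypothetical placeholders, while `CreateFileA` and `DeviceIoControl` are the standard Win32 calls through which a Windows guest driver would be reached.

```c
#include <windows.h>
#include <winioctl.h>
#include <stdint.h>
#include <string.h>

/* These mirror the CUDA runtime's error type and two of its values; the
 * imposter library defines them itself because the real runtime is not
 * present in the guest.                                                  */
typedef int cudaError_t;
#define cudaSuccess               0
#define cudaErrorMemoryAllocation 2

/* Hypothetical IOCTL code and wire format shared with the imposter driver. */
#define IOCTL_VCUDA_CALL \
    CTL_CODE(FILE_DEVICE_UNKNOWN, 0x800, METHOD_BUFFERED, FILE_ANY_ACCESS)

typedef struct vcuda_request {
    uint32_t op;       /* 1 = cudaMalloc in this sketch             */
    uint64_t size;     /* requested allocation size                 */
    uint64_t dptr;     /* filled in by the virtual device           */
    int32_t  status;   /* result code from the host-side CUDA call  */
} vcuda_request;

/* Exported with the same name and signature as the real runtime call, so
 * the user application links against the imposter library transparently. */
__declspec(dllexport) cudaError_t cudaMalloc(void **devPtr, size_t size)
{
    HANDLE dev = CreateFileA("\\\\.\\VCuda", GENERIC_READ | GENERIC_WRITE,
                             0, NULL, OPEN_EXISTING, 0, NULL);
    if (dev == INVALID_HANDLE_VALUE)
        return cudaErrorMemoryAllocation;

    vcuda_request req;
    memset(&req, 0, sizeof(req));
    req.op   = 1;
    req.size = (uint64_t)size;

    /* Relay the packaged request to the imposter driver, which validates it
     * and forwards it to the virtual device in the VMM.                    */
    DWORD returned = 0;
    BOOL ok = DeviceIoControl(dev, IOCTL_VCUDA_CALL,
                              &req, sizeof(req), &req, sizeof(req),
                              &returned, NULL);
    CloseHandle(dev);

    if (!ok || req.status != cudaSuccess)
        return cudaErrorMemoryAllocation;

    *devPtr = (void *)(uintptr_t)req.dptr;   /* device pointer as seen by the app */
    return cudaSuccess;
}
```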
Results show near-host performance for page-locked memory and a substantial improvement for pageable memory when 2 MB large pages are used in the guest OS, owing to reduced TLB translation overhead and a lower cost of translating pages' virtual addresses to physical addresses and back.
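For reference, the sketch below shows how a buffer backed by 2 MB pages can be requested in a Windows guest through the documented large-page interface (`VirtualAlloc` with `MEM_LARGE_PAGES`). It is a minimal illustration rather than the thesis implementation, and it assumes the running account has been granted the "Lock pages in memory" (`SeLockMemoryPrivilege`) right, which Windows requires before large-page allocations succeed.

```c
#include <windows.h>
#include <stdio.h>

/* Enable the "Lock pages in memory" privilege required by MEM_LARGE_PAGES. */
static BOOL enable_lock_memory_privilege(void)
{
    HANDLE token;
    TOKEN_PRIVILEGES tp;

    if (!OpenProcessToken(GetCurrentProcess(),
                          TOKEN_ADJUST_PRIVILEGES | TOKEN_QUERY, &token))
        return FALSE;

    tp.PrivilegeCount = 1;
    tp.Privileges[0].Attributes = SE_PRIVILEGE_ENABLED;
    if (!LookupPrivilegeValue(NULL, SE_LOCK_MEMORY_NAME,
                              &tp.Privileges[0].Luid)) {
        CloseHandle(token);
        return FALSE;
    }

    BOOL ok = AdjustTokenPrivileges(token, FALSE, &tp, 0, NULL, NULL)
              && GetLastError() == ERROR_SUCCESS;
    CloseHandle(token);
    return ok;
}

int main(void)
{
    SIZE_T large_page = GetLargePageMinimum();  /* typically 2 MB on x86-64 */
    SIZE_T size       = 16 * large_page;        /* must be a multiple of it */

    if (large_page == 0 || !enable_lock_memory_privilege()) {
        fprintf(stderr, "large pages unavailable\n");
        return 1;
    }

    /* One 2 MB page maps what would otherwise take 512 ordinary 4 KB pages,
     * so far fewer TLB entries are needed to cover the same buffer.        */
    void *buf = VirtualAlloc(NULL, size,
                             MEM_RESERVE | MEM_COMMIT | MEM_LARGE_PAGES,
                             PAGE_READWRITE);
    if (buf == NULL) {
        fprintf(stderr, "VirtualAlloc failed: %lu\n", GetLastError());
        return 1;
    }

    VirtualFree(buf, 0, MEM_RELEASE);
    return 0;
}
```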