qCUDA-ARM: Virtualization for Embedded GPU Architectures | National Tsing Hua University Master's and Doctoral Theses Repository


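Author:	Huang, Bo-Yu (黃柏瑀)
Thesis title:	qCUDA-ARM: Virtualization for Embedded GPU Architectures (qCUDA-ARM：嵌入式 GPU 架構虛擬化解決方案)
Advisor:	Lee, Che-Rung (李哲榮)
Committee members:	Lin, Yu-Shiang (林郁翔); Chung, Wu-Chun (鍾武君)
Degree:	Master
Department:	College of Electrical Engineering and Computer Science, Department of Computer Science
Year of publication:	2019
Academic year of graduation:	107 (ROC calendar)
Language:	English
Number of pages:	42
Keywords (Chinese):	Virtualization, Graphics Processing Unit, General-Purpose Computing on GPUs, Embedded Systems
Keywords (English):	Virtualization, GPU, GPGPU, Embedded Architecture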

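The emergence of smart objects is comprehensively changing how computing resources
are acquired, from traditional centralized cloud data centers to today's distributed
edge computing. To cope with the diversity brought by smart objects and their
applications, heterogeneous computing and virtualization have become the two main
research trends in the architectural design of edge nodes.
In this thesis, we take both trends into account and propose qCUDA-ARM, a
virtualization system for embedded GPU architectures. Because of the fundamental
differences between the x86 and ARM architectures, qCUDA-ARM redesigns several
subsystems, including memory management. Our experiments show that for
compute-intensive GPGPU applications, qCUDA-ARM achieves the same performance as
the physical machine, and for memory-intensive applications it achieves nearly 90%
of the physical machine's performance.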

The emergence of the Internet of Things (IoT) is changing the way computing
resources are acquired, from centralized cloud data centers to distributed, pervasive
edge nodes. To cope with the diversity of IoT devices and
applications, two research trends are being investigated for the system design of edge
nodes: heterogeneity and virtualization. In this thesis, we consider the integration of
these two trends and present a virtualization system for embedded GPU
architectures, called qCUDA-ARM. The design of qCUDA-ARM is based on the
framework of qCUDA, a virtualization system for x86 servers. Because of the
architectural differences between x86 servers and ARM-based embedded systems, many
subsystems of qCUDA-ARM, such as memory management, had to be redesigned.
We evaluated the performance of qCUDA-ARM with three CUDA benchmarks and
two real-world applications. For compute-intensive jobs, qCUDA-ARM
achieves performance similar to that of the native system; for memory-bound programs,
qCUDA-ARM reaches up to 90% of native performance.
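
As a rough illustration of what a figure such as "up to 90% of native performance" means, the sketch below computes relative performance as the ratio of native run time to virtualized run time. The function name and the timings are hypothetical and are not taken from the thesis; the thesis may define or measure the ratio differently (for example, via bandwidth rather than elapsed time).

```python
def relative_performance(native_seconds: float, virtual_seconds: float) -> float:
    """Return virtualized performance as a fraction of native performance.

    Assumes performance is inversely proportional to elapsed time for the
    same workload, so the ratio is native time divided by virtualized time.
    """
    if native_seconds <= 0 or virtual_seconds <= 0:
        raise ValueError("timings must be positive")
    return native_seconds / virtual_seconds


# Hypothetical timings for one workload (not measurements from the thesis).
native_seconds = 0.90    # run directly on the host
virtual_seconds = 1.00   # run inside the virtual machine

ratio = relative_performance(native_seconds, virtual_seconds)
print(f"{ratio:.0%} of native performance")
```

A ratio of 1.0 would mean the virtualized run is as fast as the native one; values below 1.0 reflect virtualization overhead.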

Introduction
Background
1 Virtualization
2 GPGPU Virtualization
2.1 Direct passthrough
2.2 Mediated passthrough
2.3 Full virtualization
2.4 API remoting
Design and Implementation
1 qCUDA System Architecture
2 Memory Allocation in qCUDA-ARM
3 Kernel Modifications
Experiments
1 Benchmarks
1.1 Memory Bandwidth
1.2 Matrix Multiplication
1.3 Vector Addition
2 Scalability
2.1 Memory Bandwidth
2.2 Matrix Multiplication
2.3 Vector Addition
3 Real Applications
3.1 Sobel Edge Detection
3.2 Cryptocurrency Miner
Conclusion
Bibliography
                                

