
Author: Huang, Bo-Yu (黃柏瑀)
Title: qCUDA-ARM: Virtualization for Embedded GPU Architectures (qCUDA-ARM:嵌入式 GPU 架構虛擬化解決方案)
Advisor: Lee, Che-Rung (李哲榮)
Committee members: Lin, Yu-Shiang (林郁翔); Chung, Wu-Chun (鍾武君)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science
Year of publication: 2019
Academic year of graduation: 107
Language: English
Number of pages: 42
Chinese keywords: 虛擬化、圖形處理器、圖形處理器通用計算、嵌入式系統
Keywords: Virtualization, GPU, GPGPU, Embedded Architecture
Abstract (Chinese, translated):
The emergence of smart objects is changing how computing resources are acquired, from traditional centralized cloud data centers to today's distributed edge computing. To cope with the diversity problems brought by smart objects and their applications, heterogeneous computing and virtualization have become two major research trends in the architectural design of edge nodes.
In this thesis, we take both trends into account and propose qCUDA-ARM, a virtualization system for embedded GPU architectures. Because of the fundamental differences between the x86 and ARM architectures, qCUDA-ARM redesigns several subsystems, including memory management. Experiments show that for compute-intensive GPGPU applications, qCUDA-ARM achieves performance on par with the native machine, while for memory-intensive applications it reaches nearly 90% of native performance.


Abstract:
The emergence of the Internet of Things (IoT) is changing the way computing resources are acquired, from centralized cloud data centers to distributed pervasive edge nodes. To cope with the diversity problems of IoT devices and applications, two research trends are investigated for the system design of edge nodes: heterogeneity and virtualization. In this thesis, we consider the integration of these two trends and present a virtualization system for embedded GPU architectures, called qCUDA-ARM. The design of qCUDA-ARM is based on the framework of qCUDA, a virtualization system for x86 servers. Because of the architectural differences between x86 servers and ARM-based embedded systems, many subsystems of qCUDA-ARM, such as memory management, had to be redesigned. We evaluated the performance of qCUDA-ARM with three CUDA benchmarks and two real-world applications. For compute-intensive jobs, qCUDA-ARM reaches performance similar to that of the native system; for memory-bound programs, it achieves up to 90% of native performance.

Table of Contents:
1 Introduction  1
2 Background  4
  2.1 Virtualization  4
  2.2 GPGPU Virtualization  5
    2.2.1 Direct passthrough  5
    2.2.2 Mediated passthrough  6
    2.2.3 Full virtualization  7
    2.2.4 API remoting  7
3 Design and Implementation  9
  3.1 qCUDA System Architecture  9
  3.2 Memory Allocation in qCUDA-ARM  12
  3.3 Kernel Modifications  16
4 Experiments  18
  4.1 Benchmarks  19
    4.1.1 Memory Bandwidth  19
    4.1.2 Matrix Multiplication  22
    4.1.3 Vector Addition  23
  4.2 Scalability  27
    4.2.1 Memory Bandwidth  27
    4.2.2 Matrix Multiplication  27
    4.2.3 Vector Addition  27
  4.3 Real Applications  32
    4.3.1 Sobel Edge Detection  32
    4.3.2 Cryptocurrency Miner  33
5 Conclusion  39
Bibliography  40

