
Graduate Student: 白信佑 (Pai, Hsin-Yu)
Thesis Title: 實現容器間的GPU多租戶解決方法提升雲平台之資源使用率
(Enabling Multi-Tenant GPU for Maximizing Resource Utilization in a Containerized Cloud Environment)
Advisor: 周志遠 (Chou, Jerry)
Committee Members: 李哲榮 (Lee, Che-Rung); 賴冠州 (Lai, Kuan-Chou)
Degree: Master
Department: Department of Computer Science, College of Electrical Engineering and Computer Science
Year of Publication: 2019
Academic Year of Graduation: 107
Language: English
Number of Pages: 24
Keywords (Chinese): 圖形處理器, 多租戶, 虛擬化, 容器, 排程
Keywords (English): GPU, Multi-tenancy, Virtualization, Container, Scheduling

    Abstract: In modern computing systems, GPUs and containers play important roles. Because of its powerful parallel computing capability, a GPU can shorten the time needed to process large amounts of data. Compared to VMs, containers demand fewer resources, so more instances can be created and boot times are shorter.

    However, applications do not always fully utilize the GPU, which leads to the problem of GPU under-utilization. To increase utilization, multiple applications must execute on the GPU concurrently.

    Sharing a GPU is not easy, however, because the system cannot yet control GPU resources directly. To solve this problem, we designed a software solution that manages GPU resources. The solution consists of a Frontend and a Backend. All applications run in the Frontend, and the Backend is responsible for scheduling them. When an application launches a GPU function, the call is intercepted by the Frontend's intercepting library, which sends a request to the Backend. Only after the Backend has scheduled the request and the execution signal has been received does the Frontend application resume execution. In this way, we can manage the GPU usage of each container.
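    To make the interception flow concrete, the sketch below shows one common way such a Frontend library can be realized on Linux: an LD_PRELOAD shared library that shadows the CUDA runtime call cudaLaunchKernel, asks the Backend scheduler for permission, and only then forwards the launch to the real runtime. cudaLaunchKernel, dlsym, and RTLD_NEXT are real APIs; backend_request_and_wait is a hypothetical placeholder for the Frontend-Backend communication, which the abstract does not specify. This is a minimal sketch under those assumptions, not the thesis's actual implementation.

        /* frontend_intercept.c -- minimal interception sketch.
         * Build: gcc -shared -fPIC frontend_intercept.c -o libintercept.so -ldl
         * Use:   LD_PRELOAD=./libintercept.so ./cuda_application
         */
        #define _GNU_SOURCE
        #include <dlfcn.h>
        #include <stddef.h>
        #include <cuda_runtime_api.h>   /* CUDA runtime API declarations */

        /* Hypothetical stand-in for the Frontend-Backend IPC: the real
         * system would send a request to the Backend scheduler here and
         * block until the execution signal arrives. */
        static void backend_request_and_wait(void)
        {
            /* e.g. write a request to a socket, then block on the reply */
        }

        typedef cudaError_t (*launch_fn_t)(const void *, dim3, dim3,
                                           void **, size_t, cudaStream_t);

        /* Shadow the real cudaLaunchKernel: every kernel launch made by
         * the containerized application passes through this wrapper. */
        cudaError_t cudaLaunchKernel(const void *func, dim3 gridDim,
                                     dim3 blockDim, void **args,
                                     size_t sharedMem, cudaStream_t stream)
        {
            static launch_fn_t real_launch = NULL;
            if (real_launch == NULL)    /* resolve the real symbol once */
                real_launch = (launch_fn_t)dlsym(RTLD_NEXT, "cudaLaunchKernel");

            backend_request_and_wait();   /* block until the scheduler grants GPU time */
            return real_launch(func, gridDim, blockDim, args, sharedMem, stream);
        }

    The same shadowing pattern extends to any other CUDA runtime entry points the scheduler needs to observe, such as memory copies.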

    In addition to increasing utilization, we also achieve fairness. Note that the fairness we define in this paper means keeping each user's GPU time usage between a user-defined minimum and maximum utilization, where utilization is the percentage of time within a time interval that the user's program may use the GPU.
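    In symbols, writing T for the scheduling interval, t_i for the GPU time granted to user i within that interval, and U_i^min, U_i^max for the user-defined bounds (this notation is ours, chosen for illustration; the thesis may use different symbols), the fairness target can be stated as:

        U_i = \frac{t_i}{T}, \qquad
        U_i^{\min} \;\le\; U_i \;\le\; U_i^{\max}, \qquad
        \sum_i U_i \le 1

    The last constraint records that the granted shares cannot exceed the whole interval, so the per-user minima are satisfiable only when they sum to at most 1.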

    Table of Contents:
    1 Introduction
    2 Related Work
    3 Methodology
      3.1 Resource Requirement
        3.1.1 Frontend
        3.1.2 Backend
      3.2 Implementation
        3.2.1 Intercept function call
        3.2.2 Evaluate the time usage
        3.2.3 Communication between Frontend & Backend
    4 Scheduling Algorithm
      4.1 Utilization Evaluation
      4.2 Selection Policy
      4.3 Time Quota Assignment
    5 Experimental Setup
    6 Experimental Evaluation
      6.1 Tenancy Control
        6.1.1 Fully utilized by minimum requirements
        6.1.2 Fully utilized by maximum requirements
        6.1.3 Under utilized by maximum requirements
      6.2 Time Quota Evaluation
    7 Conclusion
    References

