
Graduate Student: 白信佑 (Pai, Hsin-Yu)
Thesis Title: 實現容器間的GPU多租戶解決方法提升雲平台之資源使用率
(Enabling Multi-Tenant GPU for Maximizing Resource Utilization in a Containerized Cloud Environment)
Advisor: 周志遠 (Chou, Jerry)
Committee Members: 李哲榮 (Lee, Che-Rung); 賴冠州 (Lai, Kuan-Chou)
Degree: Master
Department: Department of Computer Science, College of Electrical Engineering and Computer Science
Year of Publication: 2019
Academic Year of Graduation: 107
Language: English
Number of Pages: 24
Keywords (Chinese): 圖形處理器, 多租戶, 虛擬化, 容器, 排程
Keywords (English): GPU, Multi-tenancy, Virtualization, Container, Scheduling

    Abstract: In modern computing systems, GPUs and containers play important roles. Because of its powerful parallel computing capability, a GPU can shorten the time needed to process large amounts of data. Compared to VMs, containers demand fewer resources, so more instances can be created and boot times are shorter.

    However, applications do not always fully utilize the GPU, which leads to the problem of GPU under-utilization. To increase utilization, multiple applications must execute on the GPU concurrently.

    Sharing a GPU is not easy, however, because the system cannot yet control GPU resources directly. To solve this problem, we designed a software solution that manages GPU resources. The solution consists of a Frontend and a Backend. All applications run in the Frontend, and the Backend is responsible for scheduling them. When an application launches a GPU function, the call is intercepted by the Frontend's intercepting library, which sends a request to the Backend. Only after the Backend has scheduled the request and the execution signal has been received does the Frontend application resume execution. In this way, we can manage the GPU usage of each container.
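    To make the interception flow concrete, the sketch below shows one common way such a Frontend library can be realized on Linux: an LD_PRELOAD shared library that shadows the CUDA runtime call cudaLaunchKernel, asks the Backend scheduler for permission, and only then forwards the launch to the real runtime. cudaLaunchKernel, dlsym, and RTLD_NEXT are real APIs; backend_request_and_wait is a hypothetical placeholder for the Frontend-Backend communication, which the abstract does not specify. This is a minimal sketch under those assumptions, not the thesis's actual implementation.

        /* frontend_intercept.c -- minimal interception sketch.
         * Build: gcc -shared -fPIC frontend_intercept.c -o libintercept.so -ldl
         * Use:   LD_PRELOAD=./libintercept.so ./cuda_application
         */
        #define _GNU_SOURCE
        #include <dlfcn.h>
        #include <stddef.h>
        #include <cuda_runtime_api.h>   /* CUDA runtime API declarations */

        /* Hypothetical stand-in for the Frontend-Backend IPC: the real
         * system would send a request to the Backend scheduler here and
         * block until the execution signal arrives. */
        static void backend_request_and_wait(void)
        {
            /* e.g. write a request to a socket, then block on the reply */
        }

        typedef cudaError_t (*launch_fn_t)(const void *, dim3, dim3,
                                           void **, size_t, cudaStream_t);

        /* Shadow the real cudaLaunchKernel: every kernel launch made by
         * the containerized application passes through this wrapper. */
        cudaError_t cudaLaunchKernel(const void *func, dim3 gridDim,
                                     dim3 blockDim, void **args,
                                     size_t sharedMem, cudaStream_t stream)
        {
            static launch_fn_t real_launch = NULL;
            if (real_launch == NULL)    /* resolve the real symbol once */
                real_launch = (launch_fn_t)dlsym(RTLD_NEXT, "cudaLaunchKernel");

            backend_request_and_wait();   /* block until the scheduler grants GPU time */
            return real_launch(func, gridDim, blockDim, args, sharedMem, stream);
        }

    The same shadowing pattern extends to any other CUDA runtime entry points the scheduler needs to observe, such as memory copies.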

    In addition to increasing utilization, we also achieve fairness. Note that the fairness we define in this paper means keeping each user's GPU time usage between a user-defined minimum and maximum utilization, where utilization is the percentage of time within a time interval that the user's program may use the GPU.
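    In symbols, writing T for the scheduling interval, t_i for the GPU time granted to user i within that interval, and U_i^min, U_i^max for the user-defined bounds (this notation is ours, chosen for illustration; the thesis may use different symbols), the fairness target can be stated as:

        U_i = \frac{t_i}{T}, \qquad
        U_i^{\min} \;\le\; U_i \;\le\; U_i^{\max}, \qquad
        \sum_i U_i \le 1

    The last constraint records that the granted shares cannot exceed the whole interval, so the per-user minima are satisfiable only when they sum to at most 1.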

    Table of Contents:
    1 Introduction
    2 Related Work
    3 Methodology
      3.1 Resource Requirement
        3.1.1 Frontend
        3.1.2 Backend
      3.2 Implementation
        3.2.1 Intercept function call
        3.2.2 Evaluate the time usage
        3.2.3 Communication between Frontend & Backend
    4 Scheduling Algorithm
      4.1 Utilization Evaluation
      4.2 Selection Policy
      4.3 Time Quota Assignment
    5 Experimental Setup
    6 Experimental Evaluation
      6.1 Tenancy Control
        6.1.1 Fully utilized by minimum requirements
        6.1.2 Fully utilized by maximum requirements
        6.1.3 Under utilized by maximum requirements
      6.2 Time Quota Evaluation
    7 Conclusion
    References

