
Author: Tang, Yan-Mei (唐晏湄)
Thesis Title: PCIe Bandwidth Aware Scheduling for Multi-Instance GPU
Advisor: Chou, Jerry (周志遠)
Committee Members: Lai, Kuan-Chou (賴冠州); Lee, Che-Rung (李哲榮)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science
Year of Publication: 2024
Graduation Academic Year: 112 (ROC calendar)
Language: English
Pages: 26
Keywords: Multi-Instance GPU, Resource Sharing, Multi-Tenancy
    As GPU technology continues to advance, it has driven rapid growth in AI-related fields. However, when the pace of improvement in GPU computational power far exceeds the needs of many common workloads, resources become underutilized and wasted. NVIDIA therefore introduced the Multi-Instance GPU (MIG) feature, which partitions a single physical GPU into multiple MIG slices at the hardware level. This allows users to run several different jobs on the same GPU concurrently, reclaiming resources that would otherwise be wasted. Although MIG guarantees resource isolation between MIG slices, we find that these slices still share the same PCIe bandwidth, which can lead to bandwidth contention, a problem that becomes especially severe when the running jobs place heavy demands on PCIe bandwidth. We are the first study to identify this problem, and we also propose a solution for the contention. We design a Multi-Instance GPU (MIG) scheduler that takes the PCIe bandwidth limit into account to manage how PCIe bandwidth is shared among multiple MIG slices. Our experimental results confirm that, compared with the baseline method, our scheduler significantly reduces the aggregated job completion time, and these results are validated both on a physical NVIDIA A100 GPU and in a simulated environment.


    As GPU technology advances, it significantly propels progress in the field of AI. However, the rapid growth in GPU computational power often outpaces the requirements of many existing workloads, resulting in underutilization of GPU resources. The Multi-Instance GPU (MIG) feature introduced on NVIDIA A100 GPUs enables the partitioning of physical GPU resources, allowing multiple tasks to share an entire GPU simultaneously and thereby enhancing resource utilization. Despite the resource isolation provided by MIG, we observe that PCIe bandwidth remains shared among MIG slices, potentially leading to contention, especially during periods of intense PCIe bandwidth utilization. Our research identifies and addresses this issue, being among the first to demonstrate the occurrence of PCIe bandwidth contention across MIG slices in tasks with high bandwidth requirements. To mitigate this contention, we propose a PCIe bandwidth aware scheduler designed to effectively manage PCIe bandwidth sharing across MIG slices. Our experimental results show that the scheduler outperforms the baseline approach in reducing aggregated job completion time, as demonstrated both on NVIDIA A100 GPU hardware and in simulation.
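    The thesis body is not included in this record, so the scheduler's actual algorithm is not shown here. As a minimal, hypothetical sketch of the idea the abstract describes, a PCIe-bandwidth-aware scheduler might admit jobs onto free MIG slices only while their combined estimated PCIe demand stays within the shared link's capacity; all names, the greedy policy, and the capacity figure below are illustrative assumptions, not the thesis's method:

    ```python
    # Hypothetical sketch (not the thesis's algorithm): greedily dispatch
    # pending jobs to free MIG slices while the sum of their estimated PCIe
    # transfer rates stays under the GPU's shared PCIe link capacity.
    from dataclasses import dataclass

    PCIE_CAPACITY_GBPS = 25.0  # assumed usable PCIe bandwidth per GPU

    @dataclass
    class Job:
        name: str
        pcie_demand_gbps: float  # estimated host-to-device transfer rate

    def schedule(pending, free_slices, running):
        """Admit jobs (smallest demand first) that fit the bandwidth budget."""
        used = sum(j.pcie_demand_gbps for j in running)
        admitted = []
        for job in sorted(pending, key=lambda j: j.pcie_demand_gbps):
            if free_slices == 0:
                break
            if used + job.pcie_demand_gbps <= PCIE_CAPACITY_GBPS:
                admitted.append(job)
                used += job.pcie_demand_gbps
                free_slices -= 1
        return admitted

    jobs = [Job("train-a", 18.0), Job("infer-b", 4.0), Job("etl-c", 10.0)]
    print([j.name for j in schedule(jobs, free_slices=3, running=[])])
    # → ['infer-b', 'etl-c']  (train-a would push the link past capacity)
    ```

    A scheduler like this trades some slice-level parallelism for reduced PCIe contention; the thesis's evaluation reports the resulting reduction in aggregated job completion time on A100 hardware and in simulation.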

    1. Introduction 1
    2. Related Work 4
    3. Preliminary Experiments 7
    4. Methodology 11
    5. Evaluation 18
    6. Conclusion 24
    Reference 25

