Graduate Student: 董子睿 Tung, Tzu-Ruei
Thesis Title: DRL: A Deep Reinforcement Learning Scheduling Algorithm for Minimizing the Average Completion Time of Moldable Jobs (Chinese title: 以深度強化學習方法解決具可調式平行度工作的排程最佳化問題)
Advisor: 周志遠 Chou, Jerry
Committee Members: 李端興 Lee, Duan-Shin; 李哲榮 Lee, Che-Rung
Degree: Master (碩士)
Department: Department of Computer Science, College of Electrical Engineering and Computer Science
Year of Publication: 2019
Academic Year of Graduation: 107 (ROC calendar)
Language: English
Number of Pages: 31
Keywords: distributed system, cloud computing, deep learning, reinforcement learning, job scheduling, moldable job
Abstract:
In the rapidly expanding field of HPC (High Performance Computing), job schedulers act as the operating system of distributed and cloud systems: they manage computing resources and control the execution of jobs. For moldable jobs, that is, jobs that can run at different degrees of parallelism, we find that the current popular scheduler frameworks (Slurm, Kubernetes, and Mesos) are usually not responsible for deciding a job's scale; instead, the user determines the parallelism of the job. However, users are not familiar with the status and characteristics of the system, so they cannot make appropriate scaling decisions. Thus, we propose DRL, a scheduler that handles both job selection and job scaling. We use deep reinforcement learning to optimize the DRL scheduler's policy, which allows it to automatically learn the characteristics of the system and the workload. Our evaluation shows that our approach reduces average job completion time by about 20% compared with the optimal threshold-based method.
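The sketch below is a minimal, self-contained illustration (not the implementation evaluated in this thesis) of how scheduling moldable jobs can be framed as a reinforcement learning problem: the agent observes a summary of the cluster state and the next queued job, picks a degree of parallelism, and is rewarded with the negative average job completion time. The cluster size, speedup model, feature encoding, candidate parallelism levels, and the simple linear-softmax policy trained with REINFORCE are all illustrative assumptions.

```python
import numpy as np

TOTAL_NODES = 16                       # assumed cluster size
P_CHOICES = [1, 2, 4, 8]               # assumed candidate degrees of parallelism
ALPHA = 0.7                            # assumed sublinear speedup exponent
LR = 0.01
rng = np.random.default_rng(0)

theta = np.zeros((len(P_CHOICES), 3))  # linear softmax policy parameters
baseline_ct = None                     # running baseline to reduce gradient variance

def run_time(work, p):
    # Assumed moldable-job model: running on p nodes gives a speedup of p ** ALPHA.
    return work / (p ** ALPHA)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def features(cluster_load, work):
    # State features: normalized cluster load, normalized job size, bias term.
    return np.array([cluster_load / 20.0, work / 10.0, 1.0])

def run_episode(train=True):
    """Schedule one batch of moldable jobs and (optionally) update the policy."""
    global theta, baseline_ct
    jobs = rng.uniform(1.0, 10.0, size=8)   # per-job work amounts (assumed)
    node_free_at = np.zeros(TOTAL_NODES)    # time at which each node becomes free
    completions, grads = [], []
    for work in jobs:                       # FIFO job selection for simplicity
        phi = features(node_free_at.mean(), work)
        probs = softmax(theta @ phi)
        a = rng.choice(len(P_CHOICES), p=probs)
        p = P_CHOICES[a]
        # Run the job on the p nodes that become free the earliest.
        chosen = np.argsort(node_free_at)[:p]
        start = node_free_at[chosen].max()
        finish = start + run_time(work, p)
        node_free_at[chosen] = finish
        completions.append(finish)          # all jobs arrive at time 0
        # Gradient of log pi(a | s) for a linear softmax policy (REINFORCE).
        grad = -np.outer(probs, phi)
        grad[a] += phi
        grads.append(grad)
    avg_ct = float(np.mean(completions))
    if train:
        if baseline_ct is None:
            baseline_ct = avg_ct
        advantage = baseline_ct - avg_ct    # reward is the negative avg completion time
        baseline_ct = 0.95 * baseline_ct + 0.05 * avg_ct
        for grad in grads:
            theta += LR * advantage * grad
    return avg_ct

if __name__ == "__main__":
    before = np.mean([run_episode(train=False) for _ in range(100)])
    for _ in range(5000):
        run_episode(train=True)
    after = np.mean([run_episode(train=False) for _ in range(100)])
    print(f"average completion time: untrained={before:.2f}  trained={after:.2f}")
```

Matching the setting described in the abstract, where the scheduler handles both job selection and job scaling, would require a deep network in place of the linear policy, a richer state that exposes the whole waiting queue so the agent can also choose which job to dispatch, and a stronger policy-gradient method.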