
Author: Lin, Bor-Jiun (林柏均)
Thesis Title: Transfermer: A Transferable Transformer-based Multi-Agent Reinforcement Learning Framework
Advisors: Chen, Hwann-Tzong (陳煥宗); Lee, Chun-Yi (李濬屹)
Committee Members: Chiu, Wei-Chen (邱維辰); Liu, Yu-Lun (劉育綸)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science
Publication Year: 2024
Graduation Academic Year: 113
Language: English
Number of Pages: 51
Keywords: Transfer Learning, Multi-Agent Reinforcement Learning
    Transfer learning is a widely employed technique across various deep learning domains. However, its application in multi-agent reinforcement learning (MARL) is complicated by factors such as varying numbers of agents, diverse action spaces, and different combinations of entities. To tackle these challenges, we introduce Transfermer, a transferable Transformer-based reinforcement learning framework designed to accommodate a range of input sizes, entity types, and action spaces. Transfermer incorporates three types of functional embeddings, a flexible agent network that adapts to the inputs of different environments, and a scalable mixing network, all of which aim to enhance both training performance and the ability to transfer learned knowledge. In addition, we propose a padding scheme that generalizes readily to existing mixing network architectures and facilitates the retention of collaborative policies when transferring to novel environments. To validate the applicability of our method across different benchmark scenarios, we evaluate Transfermer on three well-known MARL benchmarks: Particle Environments, SMAC, and SMACv2. Our experimental results and ablation analyses demonstrate that Transfermer outperforms multiple value-based methods in both performance and transferability.
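    The abstract describes an agent network that must cope with a variable number of entities and a padding scheme that lets learned weights carry over when that number changes. The sketch below is only a rough illustration of this general idea, not the thesis's actual implementation: it shows one common way a Transformer-style agent network can accept a varying number of entity observations by zero-padding them to a fixed maximum and masking the padded slots. All class names, dimensions, and parameters here are hypothetical.

```python
# Minimal sketch (not the thesis's code): a Transformer-style agent network that
# handles a variable number of entity observations by padding to a fixed maximum
# length and masking the padded slots, so the same weights can be reused when the
# entity count changes across environments.
import torch
import torch.nn as nn


class EntityTransformerAgent(nn.Module):
    """Hypothetical agent network: per-entity embedding followed by self-attention."""

    def __init__(self, obs_dim: int, embed_dim: int = 64, n_heads: int = 4,
                 n_actions: int = 10):
        super().__init__()
        self.embed = nn.Linear(obs_dim, embed_dim)            # per-entity feature embedding
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)
        self.q_head = nn.Linear(embed_dim, n_actions)         # per-agent action values

    def forward(self, entity_obs: torch.Tensor, pad_mask: torch.Tensor) -> torch.Tensor:
        # entity_obs: (batch, max_entities, obs_dim), zero-padded past the real entities
        # pad_mask:   (batch, max_entities), True where the slot is padding
        x = self.embed(entity_obs)
        x = self.encoder(x, src_key_padding_mask=pad_mask)    # padded slots are ignored
        return self.q_head(x[:, 0])                           # Q-values from the agent's own slot


# Usage: two scenarios with different entity counts share the same network weights.
if __name__ == "__main__":
    net = EntityTransformerAgent(obs_dim=8)
    max_entities = 12
    for n_entities in (5, 9):                                 # e.g., transfer from 5-entity to 9-entity maps
        obs = torch.zeros(2, max_entities, 8)
        obs[:, :n_entities] = torch.randn(2, n_entities, 8)
        mask = torch.arange(max_entities).expand(2, -1) >= n_entities
        q_values = net(obs, mask)
        print(q_values.shape)                                  # torch.Size([2, 10])
```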

    Table of Contents
    Abstract (Chinese)
    Acknowledgements (Chinese)
    Abstract
    Acknowledgements
    Contents
    List of Figures
    List of Tables
    1 Introduction
    2 Preliminaries
      2.0.1 Cooperative MARL
      2.0.2 MARL Transfer Learning
    3 Methodology
      3.0.1 Overview of the Transfermer Framework
      3.0.2 The Flexible Agent Network
      3.0.3 The Embedding Schemes
      3.0.4 The Scalable Mixing Network
      3.0.5 Loss Function
    4 Experimental Results
      4.0.1 Experimental Setups
      4.0.2 Transfermer Agent Network Comparison
      4.0.3 Transfer Learning Performance
      4.0.4 Ablation Study
    5 Conclusions
    6 Appendix
      6.0.1 Notation Table
      6.0.2 Background of the Value Factorization Methods
      6.0.3 Background of Transformer
      6.0.4 Hyperparameter Settings
      6.0.5 Additional Experimental Results
      6.0.6 Limitations and Potential Future Avenues
    Bibliography

