
Graduate Student: 吳仲晏 (Wu, Chung-Yen)
Thesis Title: NWQMIX - 帶有負權重的競技型QMIX擴展版本
NWQMIX: An Extension of QMIX with Negative Weights for Competitive Play
Advisor: 李端興 (Lee, Duan-Shin)
Oral Examination Committee: 張正尚 (Chang, Cheng-Shang); 易志偉 (Yi, Chih-Wei)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science
Year of Publication: 2024
Graduation Academic Year: 112
Language: English
Number of Pages: 27
Keywords (Chinese): 多智能體強化學習、中心化訓練、分散式學習、QMIX、競爭
Keywords (English): multi-agent reinforcement learning, centralized training, decentralized execution, QMIX, competition
Abstract (Chinese): In this thesis, we investigate and mitigate the challenges posed by the large state and action spaces of multi-agent reinforcement learning (MARL) in cooperative-competitive environments. Our focus is the iterative improvement of agents as they learn and adapt to their opponents' strategies. We propose an algorithm built on techniques such as centralized training with decentralized execution. Building on the QMIX framework, we take into account not only teammates' information but also opponents' information, and exploit negative-weight mixing in the mixing network, thereby improving learning efficiency and strategic depth in environments characterized by adversarial interaction. By advancing these algorithmic techniques, our approach not only accelerates the learning process but also promotes robust decision-making under competition. Experiments show that our algorithm clearly outperforms existing MARL methods in a cooperative-competitive predator-prey environment.


Abstract (English): In this paper, we study and mitigate the challenges posed by the large state and action spaces of Multi-Agent Reinforcement Learning (MARL) in cooperative-competitive scenarios. Our focus is on the iterative improvement of agents as they learn from and adapt to their opponents' strategies. We propose an algorithm based on techniques such as centralized training with decentralized execution. Building on the QMIX framework, our approach incorporates opponent information and utilizes negative-weight mixing in the mixing network, which enhances learning efficiency and strategic depth in environments characterized by adversarial interactions. By advancing these algorithmic techniques, our approach not only accelerates the learning process but also fosters robust decision-making under competition. Through experiments, we demonstrate that our algorithm significantly outperforms existing MARL methods in Predator-Prey cooperative-competitive settings.
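The following is a minimal, illustrative PyTorch sketch of the idea described in the abstract: a QMIX-style mixing network whose weights are produced by hypernetworks conditioned on the global state [3, 10], extended so that opponents' utility estimates enter the mixer through a negatively signed weight path. The class name SignedMixer, the layer sizes, and the exact way opponent information is injected are assumptions made for illustration; they are not taken from the thesis itself.

```python
import torch
import torch.nn as nn


class SignedMixer(nn.Module):
    """Illustrative QMIX-style mixer with a negative-weight path for opponents.

    Teammate Q-values are mixed with non-negative weights (monotonic, as in
    QMIX); opponent Q-values are mixed with negated weights so that a higher
    opponent value lowers the joint value Q_tot. This is a sketch of the
    general idea only, not the exact architecture used in the thesis.
    """

    def __init__(self, n_allies, n_opponents, state_dim, embed_dim=32):
        super().__init__()
        self.n_allies = n_allies
        self.n_opponents = n_opponents
        self.embed_dim = embed_dim
        # Hypernetworks generate state-dependent mixing weights and biases.
        self.hyper_w_ally = nn.Linear(state_dim, n_allies * embed_dim)
        self.hyper_w_opp = nn.Linear(state_dim, n_opponents * embed_dim)
        self.hyper_b1 = nn.Linear(state_dim, embed_dim)
        self.hyper_w_final = nn.Linear(state_dim, embed_dim)
        self.hyper_b_final = nn.Sequential(
            nn.Linear(state_dim, embed_dim), nn.ReLU(), nn.Linear(embed_dim, 1)
        )

    def forward(self, q_allies, q_opponents, state):
        # q_allies: (batch, n_allies), q_opponents: (batch, n_opponents),
        # state: (batch, state_dim)
        bs = state.size(0)
        # Non-negative weights for teammates keep Q_tot monotonic in each ally's Q.
        w_ally = torch.abs(self.hyper_w_ally(state)).view(bs, self.n_allies, self.embed_dim)
        # Negated weights for opponents: a rise in their value pushes Q_tot down.
        w_opp = -torch.abs(self.hyper_w_opp(state)).view(bs, self.n_opponents, self.embed_dim)
        b1 = self.hyper_b1(state).view(bs, 1, self.embed_dim)

        hidden = nn.functional.elu(
            torch.bmm(q_allies.unsqueeze(1), w_ally)
            + torch.bmm(q_opponents.unsqueeze(1), w_opp)
            + b1
        )
        w_final = torch.abs(self.hyper_w_final(state)).view(bs, self.embed_dim, 1)
        b_final = self.hyper_b_final(state).view(bs, 1, 1)
        q_tot = torch.bmm(hidden, w_final) + b_final
        return q_tot.view(bs, 1)


# Hypothetical example: 4 predators (allies), 2 prey (opponents), 48-dim global state.
mixer = SignedMixer(n_allies=4, n_opponents=2, state_dim=48)
q_tot = mixer(torch.randn(8, 4), torch.randn(8, 2), torch.randn(8, 48))  # shape (8, 1)
```

Keeping the teammate path non-negative preserves the QMIX property that each agent can act greedily on its own Q-values, while the sign flip on the opponent path encodes the competitive coupling the abstract refers to.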

Abstract (Chinese)  I
Abstract  II
Acknowledgements (Chinese)  III
Contents  IV
List of Figures  VI
List of Tables  VII
1 Introduction  1
2 Background  4
2.1 VDN, QMIX, and QTRAN  5
2.2 Deep Q-Learning  7
2.3 Independent Q-Learning  7
3 Methodology  9
4 Experimental Setup  15
4.1 Decentralized Predator-Prey  15
4.2 Ablations  17
5 Results  18
5.1 Main Results  18
5.2 Ablation Results  19
6 Conclusion  23
Bibliography  24

[1] Peter Fletcher, Hughes Hoyle, and C. Wayne Patty. Foundations of Discrete Mathematics. PWS-KENT Publishing Company, 1991.
[2] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial networks. Advances in Neural Information Processing Systems, 27, 2014.
[3] David Ha, Andrew M. Dai, and Quoc V. Le. Hypernetworks. CoRR, abs/1609.09106, 2016.
[4] Landon Kraemer and Bikramjit Banerjee. Multi-agent reinforcement learning as a rehearsal for decentralized planning. Neurocomputing, 190:82–94, 2016.
[5] Michael L. Littman. Friend-or-foe Q-learning in general-sum games. In ICML'01, pages 322–328, 2001.
[6] Ryan Lowe, Yi Wu, Aviv Tamar, Jean Harb, Pieter Abbeel, and Igor Mordatch. Multi-agent actor-critic for mixed cooperative-competitive environments. Advances in Neural Information Processing Systems, 30, 2017.
[7] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin A. Riedmiller. Playing Atari with deep reinforcement learning. CoRR, abs/1312.5602, 2013.
[8] Frans A. Oliehoek, Christopher Amato, et al. A Concise Introduction to Decentralized POMDPs, volume 1. Springer, 2016.
[9] Frans A. Oliehoek, Matthijs T. J. Spaan, and Nikos Vlassis. Optimal and approximate Q-value functions for decentralized POMDPs. Journal of Artificial Intelligence Research, 32:289–353, 2008.
[10] Tabish Rashid, Mikayel Samvelyan, Christian Schröder de Witt, Gregory Farquhar, Jakob N. Foerster, and Shimon Whiteson. QMIX: Monotonic value function factorisation for deep multi-agent reinforcement learning. CoRR, abs/1803.11485, 2018.
[11] G. Rummery and Mahesan Niranjan. On-line Q-learning using connectionist systems. Technical Report CUED/F-INFENG/TR 166, November 1994.
[12] Ahmad EL Sallab, Mohammed Abdou, Etienne Perot, and Senthil Yogamani. Deep reinforcement learning framework for autonomous driving. Electronic Imaging, 29(19):70–76, January 2017.
[13] Mikayel Samvelyan, Tabish Rashid, Christian Schröder de Witt, Gregory Farquhar, Nantas Nardelli, Tim G. J. Rudner, Chia-Man Hung, Philip H. S. Torr, Jakob N. Foerster, and Shimon Whiteson. The StarCraft multi-agent challenge. CoRR, abs/1902.04043, 2019.
[14] Yoav Shoham and Kevin Leyton-Brown. Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations. Cambridge University Press, New York, December 2008.
[15] David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel, and Demis Hassabis. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484–489, January 2016.
[16] David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, Yutian Chen, Timothy Lillicrap, Fan Hui, Laurent Sifre, George van den Driessche, Thore Graepel, and Demis Hassabis. Mastering the game of Go without human knowledge. Nature, 550(7676):354–359, October 2017.
[17] Kyunghwan Son, Daewoo Kim, Wan Ju Kang, David Hostallero, and Yung Yi. QTRAN: Learning to factorize with transformation for cooperative multi-agent reinforcement learning. CoRR, abs/1905.05408, 2019.
[18] Peter Sunehag, Guy Lever, Audrunas Gruslys, Wojciech Marian Czarnecki, Vinícius Flores Zambaldi, Max Jaderberg, Marc Lanctot, Nicolas Sonnerat, Joel Z. Leibo, Karl Tuyls, and Thore Graepel. Value-decomposition networks for cooperative multi-agent learning. CoRR, abs/1706.05296, 2017.
[19] Richard S. Sutton. Learning to predict by the methods of temporal differences. Machine Learning, 3(1):9–44, August 1988.
[20] Ming Tan. Multi-agent reinforcement learning: Independent vs. cooperative agents. In Proceedings of the Tenth International Conference on Machine Learning, pages 330–337, 1993.
[21] J. v. Neumann. Zur Theorie der Gesellschaftsspiele. Mathematische Annalen, 100(1):295–320, December 1928.
[22] Christopher J. C. H. Watkins and Peter Dayan. Q-learning. Machine Learning, 8(3):279–292, May 1992.
[23] Hao Ye, Geoffrey Ye Li, and Biing-Hwang Fred Juang. Deep reinforcement learning based resource allocation for V2V communications. IEEE Transactions on Vehicular Technology, 68(4):3163–3173, 2019.
