
Graduate Student: 吳仲晏 (Wu, Chung-Yen)
Thesis Title: NWQMIX - 帶有負權重的競技型QMIX擴展版本
NWQMIX: An Extension of QMIX with Negative Weights for Competitive Play
Advisor: 李端興 (Lee, Duan-Shin)
Oral Examination Committee: 張正尚 (Chang, Cheng-Shang); 易志偉 (Yi, Chih-Wei)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science
Year of Publication: 2024
Graduation Academic Year: 112
Language: English
Number of Pages: 27
Keywords (Chinese): 多智能體強化學習、中心化訓練、分散式學習、QMIX、競爭
Keywords (English): multi-agent reinforcement learning, centralized training, decentralized execution, QMIX, competition
Abstract (Chinese): In this thesis, we investigate and mitigate the challenges posed by the large state and action spaces of multi-agent reinforcement learning (MARL) in cooperative-competitive environments. Our focus is the iterative improvement of agents as they learn and adapt to their opponents' strategies. We propose an algorithm built on techniques such as centralized training with decentralized execution. Building on the QMIX framework, we take into account not only teammates' information but also opponents' information, and exploit negative-weight mixing in the mixing network, thereby improving learning efficiency and strategic depth in environments characterized by adversarial interaction. By advancing these algorithmic techniques, our approach not only accelerates the learning process but also promotes robust decision-making under competition. Experiments show that our algorithm clearly outperforms existing MARL methods in a cooperative-competitive predator-prey environment.


Abstract (English): In this paper, we study and mitigate the challenges posed by the large state and action spaces of Multi-Agent Reinforcement Learning (MARL) in cooperative-competitive scenarios. Our focus is on the iterative improvement of agents as they learn from and adapt to their opponents' strategies. We propose an algorithm based on techniques such as centralized training with decentralized execution. Building on the QMIX framework, our approach incorporates opponent information and utilizes negative-weight mixing in the mixing network, which enhances learning efficiency and strategic depth in environments characterized by adversarial interactions. By advancing these algorithmic techniques, our approach not only accelerates the learning process but also fosters robust decision-making under competition. Through experiments, we demonstrate that our algorithm significantly outperforms existing MARL methods in Predator-Prey cooperative-competitive settings.
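The following is a minimal, illustrative PyTorch sketch of the idea described in the abstract: a QMIX-style mixing network whose weights are produced by hypernetworks conditioned on the global state [3, 10], extended so that opponents' utility estimates enter the mixer through a negatively signed weight path. The class name SignedMixer, the layer sizes, and the exact way opponent information is injected are assumptions made for illustration; they are not taken from the thesis itself.

```python
import torch
import torch.nn as nn


class SignedMixer(nn.Module):
    """Illustrative QMIX-style mixer with a negative-weight path for opponents.

    Teammate Q-values are mixed with non-negative weights (monotonic, as in
    QMIX); opponent Q-values are mixed with negated weights so that a higher
    opponent value lowers the joint value Q_tot. This is a sketch of the
    general idea only, not the exact architecture used in the thesis.
    """

    def __init__(self, n_allies, n_opponents, state_dim, embed_dim=32):
        super().__init__()
        self.n_allies = n_allies
        self.n_opponents = n_opponents
        self.embed_dim = embed_dim
        # Hypernetworks generate state-dependent mixing weights and biases.
        self.hyper_w_ally = nn.Linear(state_dim, n_allies * embed_dim)
        self.hyper_w_opp = nn.Linear(state_dim, n_opponents * embed_dim)
        self.hyper_b1 = nn.Linear(state_dim, embed_dim)
        self.hyper_w_final = nn.Linear(state_dim, embed_dim)
        self.hyper_b_final = nn.Sequential(
            nn.Linear(state_dim, embed_dim), nn.ReLU(), nn.Linear(embed_dim, 1)
        )

    def forward(self, q_allies, q_opponents, state):
        # q_allies: (batch, n_allies), q_opponents: (batch, n_opponents),
        # state: (batch, state_dim)
        bs = state.size(0)
        # Non-negative weights for teammates keep Q_tot monotonic in each ally's Q.
        w_ally = torch.abs(self.hyper_w_ally(state)).view(bs, self.n_allies, self.embed_dim)
        # Negated weights for opponents: a rise in their value pushes Q_tot down.
        w_opp = -torch.abs(self.hyper_w_opp(state)).view(bs, self.n_opponents, self.embed_dim)
        b1 = self.hyper_b1(state).view(bs, 1, self.embed_dim)

        hidden = nn.functional.elu(
            torch.bmm(q_allies.unsqueeze(1), w_ally)
            + torch.bmm(q_opponents.unsqueeze(1), w_opp)
            + b1
        )
        w_final = torch.abs(self.hyper_w_final(state)).view(bs, self.embed_dim, 1)
        b_final = self.hyper_b_final(state).view(bs, 1, 1)
        q_tot = torch.bmm(hidden, w_final) + b_final
        return q_tot.view(bs, 1)


# Hypothetical example: 4 predators (allies), 2 prey (opponents), 48-dim global state.
mixer = SignedMixer(n_allies=4, n_opponents=2, state_dim=48)
q_tot = mixer(torch.randn(8, 4), torch.randn(8, 2), torch.randn(8, 48))  # shape (8, 1)
```

Keeping the teammate path non-negative preserves the QMIX property that each agent can act greedily on its own Q-values, while the sign flip on the opponent path encodes the competitive coupling the abstract refers to.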

Abstract (Chinese)  I
Abstract  II
Acknowledgements (Chinese)  III
Contents  IV
List of Figures  VI
List of Tables  VII
1 Introduction  1
2 Background  4
2.1 VDN, QMIX, and QTRAN  5
2.2 Deep Q-Learning  7
2.3 Independent Q-Learning  7
3 Methodology  9
4 Experimental Setup  15
4.1 Decentralized Predator-Prey  15
4.2 Ablations  17
5 Results  18
5.1 Main Results  18
5.2 Ablation Results  19
6 Conclusion  23
Bibliography  24

[1] Peter Fletcher, Hughes Hoyle, and C. Wayne Patty. Foundations of Discrete Mathematics. PWS-KENT Publishing Company, 1991.
[2] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial networks. Advances in Neural Information Processing Systems, 27, 2014.
[3] David Ha, Andrew M. Dai, and Quoc V. Le. Hypernetworks. CoRR, abs/1609.09106, 2016.
[4] Landon Kraemer and Bikramjit Banerjee. Multi-agent reinforcement learning as a rehearsal for decentralized planning. Neurocomputing, 190:82–94, 2016.
[5] Michael L. Littman. Friend-or-foe Q-learning in general-sum games. In ICML'01, pages 322–328, 2001.
[6] Ryan Lowe, Yi Wu, Aviv Tamar, Jean Harb, Pieter Abbeel, and Igor Mordatch. Multi-agent actor-critic for mixed cooperative-competitive environments. Advances in Neural Information Processing Systems, 30, 2017.
[7] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin A. Riedmiller. Playing Atari with deep reinforcement learning. CoRR, abs/1312.5602, 2013.
[8] Frans A. Oliehoek, Christopher Amato, et al. A Concise Introduction to Decentralized POMDPs, volume 1. Springer, 2016.
[9] Frans A. Oliehoek, Matthijs T. J. Spaan, and Nikos Vlassis. Optimal and approximate Q-value functions for decentralized POMDPs. Journal of Artificial Intelligence Research, 32:289–353, 2008.
[10] Tabish Rashid, Mikayel Samvelyan, Christian Schröder de Witt, Gregory Farquhar, Jakob N. Foerster, and Shimon Whiteson. QMIX: Monotonic value function factorisation for deep multi-agent reinforcement learning. CoRR, abs/1803.11485, 2018.
[11] G. Rummery and Mahesan Niranjan. On-line Q-learning using connectionist systems. Technical Report CUED/F-INFENG/TR 166, November 1994.
[12] Ahmad EL Sallab, Mohammed Abdou, Etienne Perot, and Senthil Yogamani. Deep reinforcement learning framework for autonomous driving. Electronic Imaging, 29(19):70–76, January 2017.
[13] Mikayel Samvelyan, Tabish Rashid, Christian Schröder de Witt, Gregory Farquhar, Nantas Nardelli, Tim G. J. Rudner, Chia-Man Hung, Philip H. S. Torr, Jakob N. Foerster, and Shimon Whiteson. The StarCraft multi-agent challenge. CoRR, abs/1902.04043, 2019.
[14] Yoav Shoham and Kevin Leyton-Brown. Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations. Cambridge University Press, New York, December 2008.
[15] David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel, and Demis Hassabis. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484–489, January 2016.
[16] David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, Yutian Chen, Timothy Lillicrap, Fan Hui, Laurent Sifre, George van den Driessche, Thore Graepel, and Demis Hassabis. Mastering the game of Go without human knowledge. Nature, 550(7676):354–359, October 2017.
[17] Kyunghwan Son, Daewoo Kim, Wan Ju Kang, David Hostallero, and Yung Yi. QTRAN: Learning to factorize with transformation for cooperative multi-agent reinforcement learning. CoRR, abs/1905.05408, 2019.
[18] Peter Sunehag, Guy Lever, Audrunas Gruslys, Wojciech Marian Czarnecki, Vinícius Flores Zambaldi, Max Jaderberg, Marc Lanctot, Nicolas Sonnerat, Joel Z. Leibo, Karl Tuyls, and Thore Graepel. Value-decomposition networks for cooperative multi-agent learning. CoRR, abs/1706.05296, 2017.
[19] Richard S. Sutton. Learning to predict by the methods of temporal differences. Machine Learning, 3(1):9–44, August 1988.
[20] Ming Tan. Multi-agent reinforcement learning: Independent vs. cooperative agents. In Proceedings of the Tenth International Conference on Machine Learning, pages 330–337, 1993.
[21] J. v. Neumann. Zur Theorie der Gesellschaftsspiele. Mathematische Annalen, 100(1):295–320, December 1928.
[22] Christopher J. C. H. Watkins and Peter Dayan. Q-learning. Machine Learning, 8(3):279–292, May 1992.
[23] Hao Ye, Geoffrey Ye Li, and Biing-Hwang Fred Juang. Deep reinforcement learning based resource allocation for V2V communications. IEEE Transactions on Vehicular Technology, 68(4):3163–3173, 2019.
