
Student: Liu, Hsu-Shen (劉緒紳)
Title: Language-Guided Pattern Formation for Swarm Robotics with Multi-Agent Reinforcement Learning (語言引導之集群機器人圖樣形成與多智能強化學習)
Advisor: Lee, Chun-Yi (李濬屹)
Committee Members: Sun, Min (孫民); Yang, Yuan-Fu (楊元福)
Degree: Master
Department: Computer Science, College of Electrical Engineering and Computer Science
Year of Publication: 2024
Academic Year: 112 (ROC calendar, 2023-2024)
Language: English
Pages: 32
Keywords: Large Language Model, Intelligent Robotics, Swarm Robotics, Reinforcement Learning, Multi-Agent Reinforcement Learning, Pattern Formation


Abstract: This thesis explores leveraging the vast knowledge encoded in Large Language Models (LLMs) to tackle pattern formation challenges in swarm robotic systems. A new framework, named LGPF (Language-Guided Pattern Formation), is proposed to address these challenges. The framework decomposes pattern formation into two key components: pattern synthesis and swarm robot control. For the former, this study utilizes the exceptional few-shot generalizability of LLMs to translate high-level natural language descriptions into the desired spatial pattern coordinates, overcoming previous limitations in representing and designing complex patterns. The framework further employs a centralized training with decentralized execution (CTDE) based multi-agent reinforcement learning (MARL) approach to control the swarm robots in forming the specified pattern while avoiding collisions. The decentralized policies learned with the CTDE-based MARL algorithm account for coordination among robots without direct communication in a partially observable setting. To validate the effectiveness of our framework, we perform extensive experiments in both simulated and real-world environments, which confirm that LGPF accurately and safely forms diverse user-specified patterns.
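To make the pattern-synthesis step concrete, the sketch below shows one way the few-shot translation from language to coordinates could look. It is an illustration only, not the thesis's implementation: the prompt wording, the hypothetical `query_llm` callable, and the JSON coordinate format are all assumptions made for this example.

```python
import json

# Few-shot examples pairing a natural-language description with target
# coordinates. The actual prompt used in LGPF is not given in the abstract;
# this wording is a guess for illustration only.
FEW_SHOT = """\
Description: four robots forming a unit square
Coordinates: [[0, 0], [1, 0], [1, 1], [0, 1]]

Description: three robots forming a horizontal line, spaced 1 m apart
Coordinates: [[0, 0], [1, 0], [2, 0]]
"""

def build_prompt(description: str) -> str:
    """Prepend few-shot examples so the LLM answers in the same format."""
    return FEW_SHOT + f"\nDescription: {description}\nCoordinates:"

def synthesize_pattern(description: str, query_llm) -> list[list[float]]:
    """Turn a high-level description into 2-D goal coordinates.

    `query_llm` is a hypothetical callable (prompt -> completion string),
    standing in for whatever LLM backend is actually used.
    """
    completion = query_llm(build_prompt(description))
    return json.loads(completion)  # expects e.g. "[[0, 0], [1, 0], ...]"

# Usage, with a stubbed backend that returns a fixed answer:
if __name__ == "__main__":
    stub = lambda prompt: "[[0, 0], [2, 0], [1, 1.7]]"
    print(synthesize_pattern("three robots forming a triangle", stub))
```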

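The control side can be sketched similarly. The following minimal PyTorch example illustrates the CTDE structure the abstract refers to: each robot's actor consumes only its own local observation, so it can be executed decentrally, while a critic that sees the joint observations and actions of all agents is used only during training. The network sizes, dimensions, and the actor-critic formulation itself (in the spirit of MADDPG-style methods) are illustrative assumptions, not the specific algorithm used in the thesis.

```python
import torch
import torch.nn as nn

N_AGENTS, OBS_DIM, ACT_DIM = 4, 8, 2  # illustrative sizes only

class Actor(nn.Module):
    """Decentralized policy: maps one agent's local observation to its action."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM, 64), nn.ReLU(),
            nn.Linear(64, ACT_DIM), nn.Tanh())  # e.g. a bounded velocity command

    def forward(self, local_obs):
        return self.net(local_obs)

class CentralizedCritic(nn.Module):
    """Training-time critic: scores the joint observation-action of all agents."""
    def __init__(self):
        super().__init__()
        joint_dim = N_AGENTS * (OBS_DIM + ACT_DIM)
        self.net = nn.Sequential(
            nn.Linear(joint_dim, 128), nn.ReLU(),
            nn.Linear(128, 1))

    def forward(self, joint_obs, joint_act):
        return self.net(torch.cat([joint_obs, joint_act], dim=-1))

# Decentralized execution: each robot queries only its own actor.
actors = [Actor() for _ in range(N_AGENTS)]
obs = torch.randn(N_AGENTS, OBS_DIM)              # one local observation per robot
acts = torch.stack([a(o) for a, o in zip(actors, obs)])

# Centralized training: the critic sees everything (never needed at run time).
critic = CentralizedCritic()
q_value = critic(obs.flatten(), acts.flatten())
```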
Contents:
Abstract (Chinese) ... I
Acknowledgements (Chinese) ... II
Abstract ... III
Acknowledgements ... IV
Contents ... V
List of Figures ... VII
List of Tables ... VIII
1 Introduction ... 1
2 Related Works ... 6
3 Preliminary ... 9
4 Methodology ... 13
5 Experimental Results ... 19
6 Conclusions ... 27
Bibliography ... 28

