
Student: Liu, Hsu-Shen (劉緒紳)
Title: Language-Guided Pattern Formation for Swarm Robotics with Multi-Agent Reinforcement Learning (語言引導之集群機器人圖樣形成與多智能強化學習)
Advisor: Lee, Chun-Yi (李濬屹)
Committee Members: Sun, Min (孫民); Yang, Yuan-Fu (楊元福)
Degree: Master
Department: Computer Science, College of Electrical Engineering and Computer Science
Year of Publication: 2024
Academic Year: 112 (ROC calendar, 2023-2024)
Language: English
Pages: 32
Keywords: Large Language Model, Intelligent Robotics, Swarm Robotics, Reinforcement Learning, Multi-Agent Reinforcement Learning, Pattern Formation


Abstract: This thesis explores leveraging the vast knowledge encoded in Large Language Models (LLMs) to tackle pattern formation challenges in swarm robotic systems. A new framework, named LGPF (Language-Guided Pattern Formation), is proposed to address these challenges. The framework decomposes pattern formation into two key components: pattern synthesis and swarm robot control. For the former, this study utilizes the exceptional few-shot generalizability of LLMs to translate high-level natural language descriptions into the desired spatial pattern coordinates, overcoming previous limitations in representing and designing complex patterns. The framework further employs a centralized training with decentralized execution (CTDE) based multi-agent reinforcement learning (MARL) approach to control the swarm robots in forming the specified pattern while avoiding collisions. The decentralized policies learned with the CTDE-based MARL algorithm account for coordination among robots without direct communication in a partially observable setting. To validate the effectiveness of our framework, we perform extensive experiments in both simulated and real-world environments, which confirm that LGPF accurately and safely forms diverse user-specified patterns.
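To make the pattern-synthesis step concrete, the sketch below shows one way the few-shot translation from language to coordinates could look. It is an illustration only, not the thesis's implementation: the prompt wording, the hypothetical `query_llm` callable, and the JSON coordinate format are all assumptions made for this example.

```python
import json

# Few-shot examples pairing a natural-language description with target
# coordinates. The actual prompt used in LGPF is not given in the abstract;
# this wording is a guess for illustration only.
FEW_SHOT = """\
Description: four robots forming a unit square
Coordinates: [[0, 0], [1, 0], [1, 1], [0, 1]]

Description: three robots forming a horizontal line, spaced 1 m apart
Coordinates: [[0, 0], [1, 0], [2, 0]]
"""

def build_prompt(description: str) -> str:
    """Prepend few-shot examples so the LLM answers in the same format."""
    return FEW_SHOT + f"\nDescription: {description}\nCoordinates:"

def synthesize_pattern(description: str, query_llm) -> list[list[float]]:
    """Turn a high-level description into 2-D goal coordinates.

    `query_llm` is a hypothetical callable (prompt -> completion string),
    standing in for whatever LLM backend is actually used.
    """
    completion = query_llm(build_prompt(description))
    return json.loads(completion)  # expects e.g. "[[0, 0], [1, 0], ...]"

# Usage, with a stubbed backend that returns a fixed answer:
if __name__ == "__main__":
    stub = lambda prompt: "[[0, 0], [2, 0], [1, 1.7]]"
    print(synthesize_pattern("three robots forming a triangle", stub))
```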

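The control side can be sketched similarly. The following minimal PyTorch example illustrates the CTDE structure the abstract refers to: each robot's actor consumes only its own local observation, so it can be executed decentrally, while a critic that sees the joint observations and actions of all agents is used only during training. The network sizes, dimensions, and the actor-critic formulation itself (in the spirit of MADDPG-style methods) are illustrative assumptions, not the specific algorithm used in the thesis.

```python
import torch
import torch.nn as nn

N_AGENTS, OBS_DIM, ACT_DIM = 4, 8, 2  # illustrative sizes only

class Actor(nn.Module):
    """Decentralized policy: maps one agent's local observation to its action."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM, 64), nn.ReLU(),
            nn.Linear(64, ACT_DIM), nn.Tanh())  # e.g. a bounded velocity command

    def forward(self, local_obs):
        return self.net(local_obs)

class CentralizedCritic(nn.Module):
    """Training-time critic: scores the joint observation-action of all agents."""
    def __init__(self):
        super().__init__()
        joint_dim = N_AGENTS * (OBS_DIM + ACT_DIM)
        self.net = nn.Sequential(
            nn.Linear(joint_dim, 128), nn.ReLU(),
            nn.Linear(128, 1))

    def forward(self, joint_obs, joint_act):
        return self.net(torch.cat([joint_obs, joint_act], dim=-1))

# Decentralized execution: each robot queries only its own actor.
actors = [Actor() for _ in range(N_AGENTS)]
obs = torch.randn(N_AGENTS, OBS_DIM)              # one local observation per robot
acts = torch.stack([a(o) for a, o in zip(actors, obs)])

# Centralized training: the critic sees everything (never needed at run time).
critic = CentralizedCritic()
q_value = critic(obs.flatten(), acts.flatten())
```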
Contents:
Abstract (Chinese) ... I
Acknowledgements (Chinese) ... II
Abstract ... III
Acknowledgements ... IV
Contents ... V
List of Figures ... VII
List of Tables ... VIII
1 Introduction ... 1
2 Related Works ... 6
3 Preliminary ... 9
4 Methodology ... 13
5 Experimental Results ... 19
6 Conclusions ... 27
Bibliography ... 28

