
Author: 蕭景文
Hsiao, Ching-Wen
Thesis title: 基於深度強化學習之多機器人合作任務分配系統
Multi-Robot Cooperative Task Allocation System Based on Deep Reinforcement Learning
Advisor: 陳榮順
Chen, Rong-Shun
Committee members: 黃浚鋒
Huang, Chun-Feng
張禎元
Chang, Jen-Yuan
Degree: Master
Department: College of Engineering - Department of Power Mechanical Engineering
Year of publication: 2022
Academic year of graduation: 111
Language: Chinese
Number of pages: 81
Keywords (translated from Chinese): Multi-robot system, Deep reinforcement learning, Robot Operating System, Camera pose estimation
Keywords (English): Multi-robot system, Deep reinforcement learning, ROS, Camera pose estimation
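  • Existing methods for multi-robot, multi-task allocation are typically heuristic algorithms. As the numbers of robots and tasks grow, the allocation performance of heuristic algorithms drops sharply when computation time is limited. This research proposes a multi-robot cooperation algorithm based on deep reinforcement learning to solve the multi-robot, multi-task allocation problem. During task execution, in addition to cooperative task allocation among the robots, the concept of an "opponent" is introduced to provide interference in the task field. To avoid collisions and to keep robots from heading to task targets that an opponent has already taken, the proposed task allocation algorithm supports dynamic recomputation, so that robots can recompute the allocation once they learn that a target task has disappeared. In addition, this research applies techniques such as reward shaping and curriculum learning. With difficult tasks and disappearing task targets, rewards are hard to obtain during training, which can cause training to fail and leave the model unable to allocate tasks effectively; these techniques are used to avoid that outcome. Compared with heuristic algorithms, the proposed method finds better task allocations more quickly and still allocates tasks effectively when executing more complex tasks.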
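The abstract names reward shaping but this record does not say which form the thesis uses. As a rough illustration only, the sketch below shows one common variant, potential-based shaping, where a robot gets a small extra reward for moving closer to its target task. The function names, the distance-based potential, and the `gamma` value are all assumptions made for this example, not details from the thesis.

```python
import math


def potential(robot_xy, task_xy):
    """Hypothetical potential: negative Euclidean distance to the target task."""
    return -math.dist(robot_xy, task_xy)


def shaped_reward(env_reward, prev_xy, next_xy, task_xy, gamma=0.99):
    """Add the potential-based shaping term gamma * phi(s') - phi(s) to the raw reward."""
    shaping = gamma * potential(next_xy, task_xy) - potential(prev_xy, task_xy)
    return env_reward + shaping


if __name__ == "__main__":
    # A step toward the task yields a positive bonus even when the raw reward is zero.
    print(shaped_reward(0.0, (0.0, 0.0), (1.0, 0.0), (3.0, 0.0), gamma=1.0))
```

A dense signal of this kind is one way to address the sparse-reward problem the abstract describes, where a model that rarely reaches a task receives almost no feedback during early training.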


    Existing methods for multi-robot task allocation are mostly market-based methods and heuristic algorithms. As the numbers of robots and tasks grow, heuristic algorithms perform poorly under limited computing time. This research proposes a novel algorithm based on deep reinforcement learning to solve multi-robot task allocation for a group of robots performing a number of different tasks within a specific field. In addition to the task allocation stage, the concept of an "opponent" is introduced to impede the cooperative robots during task execution. To handle this, the proposed task allocation can dynamically reallocate tasks across the whole multi-robot system (MRS) when cooperative robots lose their original tasks to opponent robots. Finally, simulation results show that the proposed method finds better task allocations faster than the heuristic algorithms it is compared against, and can still assign tasks effectively for complex task types.
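To make the dynamic reallocation idea concrete, here is a minimal sketch that uses a simple nearest-task greedy heuristic in place of the thesis's learned policy, which this record does not describe in detail. The robot and task identifiers, the coordinates, and both function names are hypothetical.

```python
import math


def greedy_allocate(robots, tasks):
    """Assign each robot, in order, the nearest task not yet claimed.

    robots and tasks map an id to an (x, y) position; returns {robot_id: task_id}.
    """
    assignment = {}
    free = dict(tasks)  # copy so the caller's task table is not modified
    for robot_id, robot_xy in robots.items():
        if not free:
            break  # more robots than tasks: the remaining robots stay idle
        task_id = min(free, key=lambda t: math.dist(robot_xy, free[t]))
        assignment[robot_id] = task_id
        del free[task_id]
    return assignment


def reallocate_on_loss(robots, tasks, lost_task):
    """Remove a task taken by an opponent and recompute the whole allocation."""
    remaining = {t: xy for t, xy in tasks.items() if t != lost_task}
    return remaining, greedy_allocate(robots, remaining)


if __name__ == "__main__":
    robots = {"r1": (0.0, 0.0), "r2": (10.0, 0.0)}
    tasks = {"a": (1.0, 0.0), "b": (9.0, 0.0), "c": (5.0, 0.0)}
    print(greedy_allocate(robots, tasks))
    # An opponent takes task "a", so r1 is redirected to another task.
    tasks, plan = reallocate_on_loss(robots, tasks, "a")
    print(plan)
```

The sketch recomputes the full assignment whenever a task disappears, which mirrors the reallocation behaviour the abstract describes. A greedy pass like this is cheap but order-dependent, which is one reason such heuristics can produce poor allocations as the problem grows.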

    Abstract (Chinese)
    Abstract
    Acknowledgements
    List of Figures
    List of Tables
    Chapter 1 Introduction
    1.1 Research Motivation and Objectives
    1.2 Literature Review
    1.3 Thesis Organization
    Chapter 2 Related Theory and Problem Description
    2.1 Related Theory
    2.1.1 Reinforcement Learning
    2.1.2 Deep Reinforcement Learning
    2.2 Problem Description
    2.3 Problem Modeling
    Chapter 3 Multi-Robot Task Allocation System
    3.1 Heuristic Algorithms Applied to MRTA
    3.1.1 Genetic Algorithm
    3.1.2 Greedy Algorithm
    3.1.3 Random Selection
    3.2 Applying Reinforcement Learning to MRTA
    3.2.1 Model Architecture
    3.2.2 System State and Settings
    3.2.3 Improving Model Training Speed and Feasibility
    3.3 Comparison of Heuristic Algorithms and Reinforcement Learning
    Chapter 4 Simulation and Experimental Results
    4.1 Simulation Results
    4.1.1 Single-Robot Environment
    4.1.2 Single-Robot Environment with an Opponent Robot
    4.1.3 Multi-Robot Environment
    4.1.4 Multi-Robot Environment with Opponent Robots
    4.2 Experimental Method and Results
    4.2.1 Experimental Procedure
    4.2.2 Robot System Design
    4.2.3 Kinematic Model of the Omnidirectional Chassis
    4.2.4 Robot Localization with the ArUco Recognition System
    4.3 Experimental Results
    Chapter 5 Conclusions and Future Work
    5.1 Conclusions
    5.2 Future Work
    References

