| Student | 姚瀚程 Yao, Han-Cheng |
|---|---|
| Thesis title | 在機器人運動規劃任務中學習多樣化且高效的目標可達策略 (Learning Diverse and Efficient Goal-reaching Policies for Motion Planning) |
| Advisor | 金仲達 King, Chung-Ta |
| Committee members | 曾國師 Tseng, Kuo-Shih; 劉靖家 Liou, Jing-Jia |
| Degree | Master |
| Department | College of Electrical Engineering and Computer Science, Department of Computer Science (資訊工程學系) |
| Year of publication | 2022 |
| Graduating academic year | 111 |
| Language | English |
| Number of pages | 34 |
| Keywords (Chinese) | 強化學習、機器人運動規劃、多樣化 |
| Keywords (English) | Reinforcement Learning, Robotic Motion Planning, Diversity |
在機器人運動規劃任務中，提供多種不同的軌跡來引導機器人到達給定目標，對其適應環境變化的能力是很重要的。近期強化學習領域中的一些研究已提出可以學習多樣化策略的方法，並在運動規劃任務中展示了良好的學習成果。這些方法使用由兩部分組成的獎勵函數，分別為 goal-reaching reward 以及 diversity reward。由於 distance reward 能在訓練過程中提供豐富的獎勵資訊，驅使 agent 前往任務目標，因此經常被用來作為 goal-reaching reward。但 distance reward 會隨著 agent 與目標之間的距離不同而變化，因此在與 diversity reward 結合時，可能會與後者相斥，影響學習表現。在這篇論文中，我們提出使用二元獎勵 (binary reward) 作為 goal-reaching reward。Binary reward 的穩定性及可預測性使我們可以協調 goal-reaching reward 和 diversity reward。我們進而在方法中加入 Hindsight Experience Replay (HER) 技巧來避免稀疏獎勵問題。實驗結果顯示，我們的方法可以在運動規劃任務中學習高效的目標可達策略；與先前的多樣化策略學習方法相比，所學到的策略具有更高的多樣性及對環境變化的穩健性。
In robot motion planning, providing multiple diverse trajectories to guide the robot to a given target is important for resilience to environmental changes. Recent advances in reinforcement learning (RL) have shown promising results in enabling agents to learn diverse trajectories for motion planning. Most of these methods employ a two-part reward function consisting of a goal-reaching reward and a diversity reward. For the former, distance rewards are commonly adopted because they provide rich signals that drive the agent toward the target. The problem is that distance rewards change continually as the agent progresses toward the target; when combined with the diversity rewards, they may conflict with the latter and degrade learning. In this thesis, we propose to use binary rewards as the goal-reaching rewards. Binary rewards are stable and predictable, which makes it possible to balance the goal-reaching and diversity rewards. We resolve the sparsity of binary rewards with Hindsight Experience Replay (HER). Experimental results show that the proposed method enables agents to learn efficient goal-reaching policies that are more diverse and more robust to environmental changes than those learned by prior approaches for learning diverse policies.
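To make the reward design concrete, below is a minimal Python sketch of the two-part reward and the HER-style goal relabeling described above. It is illustrative only: the success radius `GOAL_RADIUS`, the weight `lam`, the `diversity_bonus` input, and the `her_relabel` helper (using the common "future" relabeling strategy) are assumptions for this sketch, not the thesis's actual implementation.

```python
import numpy as np

GOAL_RADIUS = 0.05  # assumed success threshold; not specified in the thesis


def distance_reward(state, goal):
    """Dense goal-reaching reward: negative Euclidean distance to the goal.
    Its magnitude shrinks as the agent approaches the goal, so its balance
    against a diversity term shifts over the course of an episode."""
    return -float(np.linalg.norm(state - goal))


def binary_reward(state, goal, radius=GOAL_RADIUS):
    """Sparse goal-reaching reward: 0 when the goal is reached, -1 otherwise."""
    return 0.0 if np.linalg.norm(state - goal) <= radius else -1.0


def combined_reward(state, goal, diversity_bonus, lam=0.5):
    """Two-part reward: binary goal-reaching term plus a weighted diversity
    term. `diversity_bonus` stands in for whatever diversity objective the
    learner uses (e.g., how identifiable the active skill or latent is)."""
    return binary_reward(state, goal) + lam * diversity_bonus


def her_relabel(episode, k=4, rng=None):
    """HER with the 'future' strategy: besides the original transition, store
    copies whose goal is replaced by a state actually reached later in the
    same episode, so the sparse binary reward becomes informative.

    `episode` is a list of (state, action, next_state, goal) tuples."""
    rng = rng or np.random.default_rng(0)
    relabeled = []
    for t, (s, a, s_next, goal) in enumerate(episode):
        relabeled.append((s, a, s_next, goal, binary_reward(s_next, goal)))
        future = rng.integers(t, len(episode), size=min(k, len(episode) - t))
        for i in future:
            new_goal = episode[i][2]  # an achieved future state becomes the goal
            relabeled.append((s, a, s_next, new_goal,
                              binary_reward(s_next, new_goal)))
    return relabeled
```

Because the binary term is bounded, a single fixed weight such as `lam` can trade it off against the diversity term throughout training, whereas the scale of a distance reward drifts as the agent approaches the goal.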
References
[1] C. Voss, M. Moll, and L. E. Kavraki, "A heuristic approach to finding diverse short paths," International Conference on Robotics and Automation (ICRA), 2015.
[2] A. Upadhyay, B. Goldfarb, and C. Ekenna, "A topological approach to finding coarsely diverse paths," International Conference on Intelligent Robots and Systems (IROS), 2021.
[3] V. Vonásek, R. Pěnička, and B. Kozlíková, "Computing multiple guiding paths for sampling-based motion planning," International Conference on Advanced Robotics (ICAR), 2019.
[4] V. Vonásek and M. Saska, "Increasing diversity of solutions in sampling-based path planning," Proceedings of the 4th International Conference on Robotics and Artificial Intelligence, 2018.
[5] Y. Lee, J. Yang, and J. J. Lim, "Learning to coordinate manipulation skills via skill behavior diversification," International Conference on Learning Representations, 2019.
[6] S. Kumar, A. Kumar, S. Levine, and C. Finn, "One solution is not all you need: Few-shot extrapolation via structured MaxEnt RL," Advances in Neural Information Processing Systems, 2020.
[7] H. Ha, J. Xu, and S. Song, "Learning a decentralized multi-arm motion planner," arXiv preprint arXiv:2011.02608, 2020.
[8] M. Andrychowicz, F. Wolski, A. Ray, J. Schneider, R. Fong, P. Welinder, B. McGrew, J. Tobin, P. Abbeel, and W. Zaremba, "Hindsight experience replay," Advances in Neural Information Processing Systems, 2017.
[9] S. M. LaValle, "Rapidly-exploring random trees: A new tool for path planning," Technical Report 98-11, 1998.
[10] L. E. Kavraki, P. Svestka, J.-C. Latombe, and M. H. Overmars, "Probabilistic roadmaps for path planning in high-dimensional configuration spaces," IEEE Transactions on Robotics and Automation, 1996.
[11] C. Florensa, Y. Duan, and P. Abbeel, "Stochastic neural networks for hierarchical reinforcement learning," arXiv preprint arXiv:1704.03012, 2017.
[12] Z.-W. Hong, T.-Y. Shann, S.-Y. Su, Y.-H. Chang, T.-J. Fu, and C.-Y. Lee, "Diversity-driven exploration strategy for deep reinforcement learning," Advances in Neural Information Processing Systems, 2018.
[13] K. Hausman, J. T. Springenberg, Z. Wang, N. Heess, and M. Riedmiller, "Learning an embedding space for transferable robot skills," International Conference on Learning Representations, 2018.
[14] S. Mohamed and D. Jimenez Rezende, "Variational information maximisation for intrinsically motivated reinforcement learning," Advances in Neural Information Processing Systems, 2015.
[15] B. Eysenbach, A. Gupta, J. Ibarz, and S. Levine, "Diversity is all you need: Learning skills without a reward function," International Conference on Learning Representations, 2018.
[16] T. Haarnoja, A. Zhou, P. Abbeel, and S. Levine, "Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor," International Conference on Machine Learning, 2018.
[17] T. Schaul, D. Horgan, K. Gregor, and D. Silver, "Universal value function approximators," International Conference on Machine Learning, 2015.
[18] O. Michel, "Cyberbotics Ltd. Webots™: Professional mobile robot simulation," International Journal of Advanced Robotic Systems, 2004.