Graduate student: 黃浩軒 Huang, Hao-Hsuan
Thesis title: 基於強化學習建立機械手臂之速度規劃 (The Robotic Arm Velocity Planning Based on Reinforcement Learning)
Advisor: 蔡宏營 Tsai, Hung-Yin
Committee members: 丁川康 Ting, Chuan-Kang; 徐秋田 Hsu, Chiu-Tien
Degree: Master
Department: Department of Power Mechanical Engineering, College of Engineering
Year of publication: 2019
Academic year of graduation: 107
Language: Chinese
Number of pages: 88
Keywords (Chinese): 強化學習、速度曲線規劃、機械手臂
Keywords (English): Reinforcement learning, Velocity planning, Robotic arm
With the development of Industry 4.0, the robotic arm plays a pivotal role. In robotic arm velocity planning, a well-designed velocity profile can effectively reduce the motion time and improve the stability of the movement; however, current velocity planning for robotic arms cannot take dynamic factors into account, which degrades the accuracy. To allow the arm to reach its target under the influence of dynamic factors, this study builds a robotic arm velocity planning model developed with artificial intelligence in a simulation system; the model considers the dynamic factors of the robotic arm and can effectively improve its performance.
This study proceeds in three parts. The first part is the construction of the simulation environment: V-REP is used as the simulation platform, and to better approximate the motion of a real robotic arm, the Vortex physics engine is introduced, which accounts for factors such as friction between objects, kinematics, and inertia; the IRB 140 six-axis multipurpose robotic arm is selected as the verification model in the simulation environment. The second part is the implementation of the artificial intelligence: the Deep Deterministic Policy Gradient (DDPG) reinforcement learning algorithm is written in Python, and a connection between V-REP and Python is established to exchange data so that the network can learn effectively. The third part is the design of the reward function: rewards and penalties are assigned according to the state of the arm during its motion, and the reward function is divided into a position term, a velocity term, and a stability term, so that the arm reaches the required accuracy and the learning process converges faster; a minimal sketch of this decomposition is given below.
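As a concrete illustration of the position/velocity/stability decomposition described above, the following is a minimal Python sketch. The weights, the accuracy tolerance, the terminal bonus, and the function name are illustrative assumptions, not the exact formulation used in the thesis.

```python
# Hypothetical weights and tolerance; the thesis does not specify exact values here.
W_POS, W_VEL, W_STAB = 1.0, 0.1, 0.1
ANGLE_TOLERANCE = 0.05  # degrees, illustrative accuracy requirement

def reward(joint_angle, target_angle, joint_velocity, joint_acceleration):
    """Combine position, velocity, and stability terms into one scalar reward."""
    # Position term: penalize the remaining angular error to the target.
    position_error = abs(target_angle - joint_angle)
    r_pos = -W_POS * position_error

    # Velocity term: penalize residual speed near the target so the arm settles.
    r_vel = -W_VEL * abs(joint_velocity) if position_error < ANGLE_TOLERANCE else 0.0

    # Stability term: penalize large accelerations (jerky motion) during the move.
    r_stab = -W_STAB * abs(joint_acceleration)

    # Bonus when the arm stops inside the accuracy window.
    bonus = 10.0 if (position_error < ANGLE_TOLERANCE
                     and abs(joint_velocity) < 1e-3) else 0.0
    return r_pos + r_vel + r_stab + bonus
```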
The velocity planning system established in this study can be trained under different customized conditions such as machining accuracy and rotation angle. Compared with traditional velocity planning, the motion time of the trained velocity planning strategy differs by only about 0.03 seconds, while the motion error is reduced by about 0.05 degrees; a suitable velocity planning strategy can be obtained after about one hour of training, which is believed to meet the training-time cost requirements of industry. Because the influence of dynamic factors is considered in the strategy, the robotic arm achieves better motion performance. In the future, the feasibility of this velocity planning system can be demonstrated by deploying the strategy on an actual robotic arm.
With the development of Industry 4.0, the robotic arm plays a vital role in industry. In velocity planning for a robotic arm, a well-designed velocity profile can effectively reduce the motion time and improve the stability of the movement. However, dynamic factors cannot be taken into account in traditional velocity planning, so the precision of the robotic arm is degraded. In order to reach the target position under the influence of dynamic factors and effectively improve the performance of the robotic arm, this study established a robotic arm velocity planning model developed with artificial intelligence in a simulation environment; the model takes the dynamic factors of the robotic arm into account.
The study can be divided into three parts. The first is the construction of the simulation environment. V-REP was used as the simulation platform, and to approximate the behavior of a real robotic arm, it was equipped with the Vortex physics engine, which accounts for friction, kinematics, and inertia during the movement. The IRB 140 six-axis multipurpose industrial robot was selected as the verification model in the simulation environment. The second part is the implementation of the learning algorithm. The model was developed in a Python environment using Deep Deterministic Policy Gradient (DDPG), a reinforcement learning method. By establishing a connection between V-REP and Python, data could be exchanged and the model trained efficiently; a sketch of this coupling is shown below. The last part is the design of an appropriate reward function. The reward function was based on the state of the robotic arm during the simulation and was divided into a position reward, a velocity reward, and a stability reward, so that the robotic arm could effectively reach the target position and the learning process could converge faster.
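To make the V-REP/Python coupling concrete, the following is a minimal sketch that assumes the legacy `vrep` remote API binding shipped with V-REP; the joint name and the single-joint velocity command are illustrative assumptions, not the exact interface used in the thesis. A DDPG agent would call such a step function once per control step to apply an action and observe the resulting joint state.

```python
# Minimal V-REP (legacy remote API) <-> Python coupling sketch.
import vrep

vrep.simxFinish(-1)                                   # close any stale connections
client = vrep.simxStart('127.0.0.1', 19997, True, True, 5000, 5)
assert client != -1, 'Could not connect to the V-REP remote API server'

vrep.simxSynchronous(client, True)                    # step the physics engine from Python
vrep.simxStartSimulation(client, vrep.simx_opmode_blocking)

# Handle of one arm joint (object name assumed from the V-REP IRB140 model).
_, joint = vrep.simxGetObjectHandle(client, 'IRB140_joint1', vrep.simx_opmode_blocking)

def step(velocity_command):
    """Apply one velocity command, advance one simulation step, read the joint back."""
    vrep.simxSetJointTargetVelocity(client, joint, velocity_command,
                                    vrep.simx_opmode_oneshot)
    vrep.simxSynchronousTrigger(client)               # advance Vortex by one time step
    _, angle = vrep.simxGetJointPosition(client, joint, vrep.simx_opmode_blocking)
    return angle                                      # fed to the agent's observation/reward
```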
The velocity planning system established in this study can be trained under different customized conditions such as machining accuracy and rotation angle. Compared with traditional velocity planning, the motion time of the trained velocity planning strategy was only about 0.03 seconds longer, while the motion error was reduced by about 0.05 degrees. In addition, a suitable velocity planning strategy could be obtained after about one hour of training, which is expected to meet the training-time cost requirements of industry. Because the dynamic factors of the robotic arm are considered in the proposed strategy, the arm achieves better performance in motion. In the future, the feasibility of the velocity planning system can be validated by deploying the strategy on an actual robotic arm.