Author: Chiu, Li-Wei (邱笠維)
Title: Optimization of Walking Trajectory for Bipedal Robot under Hybrid Control using Reinforcement Learning (應用強化學習於混成控制雙足機器人之步態規劃)
Advisor: Yeh, Ting-Jen (葉廷仁)
Committee Members: Liu, Cheng-Hsien (劉承賢); Chen, Kuo-Shen (陳國聲)
Degree: Master
Department: Department of Power Mechanical Engineering, College of Engineering
Year of Publication: 2022
Graduation Academic Year: 110 (ROC calendar)
Language: Chinese
Pages: 84
Keywords: Bipedal robot; Dual-mass inverted pendulum model; Zero moment point; Deep reinforcement learning; Walking pattern planning
Abstract: This thesis proposes a hierarchical control structure combined with reinforcement learning to achieve stable walking for a bipedal robot. The hierarchical structure divides control into center-of-mass (CoM) trajectory planning and posture control. The trajectory planner uses preview control to generate the CoM trajectory from a zero-moment-point (ZMP) reference trajectory. Posture control blends kinematic and dynamic control: the kinematic controller anchors the robot's dynamics to a dual-mass inverted pendulum (DMIP) model, while the dynamic controller tracks the planned CoM trajectory by regulating ankle torque through series elastic actuators (SEAs). On top of the preview controller, a reinforcement learning agent generates the ZMP reference trajectory. The agent is trained on a physical model built in MATLAB Simscape Multibody; by incorporating walking stability into the reward function, it optimizes the parameters of the ZMP reference trajectory to improve walking performance. The structure is implemented on a bipedal robot developed in-house, and its performance is verified through simulations and experiments.
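For reference, the planner described in the abstract follows the standard ZMP preview-control formulation from the literature (Kajita et al.); the sketch below restates it in LaTeX in its textbook form, which is not necessarily the exact notation used in the thesis. The linear inverted pendulum ("cart-table") model relates the CoM position x at constant height z_c to the ZMP p_x, and the preview controller drives the ZMP to a reference using integral, state-feedback, and preview gains G_i, G_x, G_p:

    % LIPM ("cart-table") relation between the CoM x at constant
    % height z_c and the ZMP p_x:
    p_x \;=\; x \;-\; \frac{z_c}{g}\,\ddot{x}

    % Preview control law with the CoM jerk u_k as input: integral
    % action on the ZMP tracking error, state feedback on the state
    % \mathbf{x}_k = (x_k, \dot{x}_k, \ddot{x}_k)^\top, and a weighted
    % preview of the next N samples of the ZMP reference:
    u_k \;=\; -\,G_i \sum_{j=0}^{k} \bigl(p_{x,j} - p^{\mathrm{ref}}_{x,j}\bigr)
              \;-\; G_x\,\mathbf{x}_k
              \;-\; \sum_{j=1}^{N} G_p(j)\,p^{\mathrm{ref}}_{x,k+j}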

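The abstract states only that walking stability is incorporated into the reward function; the Python sketch below is a hypothetical illustration of such a reward for scoring a single walking step. Every name, weight, and term here is an assumption made for illustration, not the thesis's actual design (the thesis itself trains in MATLAB Simscape Multibody):

    def walking_reward(zmp_x, support_x_min, support_x_max,
                       com_vel_x, target_vel_x, fell_over,
                       w_stab=1.0, w_track=0.5, fall_penalty=10.0):
        """Hypothetical reward for optimizing ZMP-trajectory parameters.

        Stability is scored by how far the measured ZMP stays from the
        edges of the support polygon (1.0 at the center, 0.0 at an edge);
        a tracking term rewards keeping the CoM velocity near a target.
        """
        half_width = 0.5 * (support_x_max - support_x_min)
        # Distance from the ZMP to the nearest support-polygon edge,
        # normalized so the polygon center scores 1.0, then clamped.
        edge_margin = min(zmp_x - support_x_min, support_x_max - zmp_x)
        r_stab = w_stab * max(0.0, min(1.0, edge_margin / half_width))

        # Penalize deviation of the CoM velocity from the commanded value.
        r_track = -w_track * abs(com_vel_x - target_vel_x)

        # Large terminal penalty if the robot falls during the step.
        return r_stab + r_track - (fall_penalty if fell_over else 0.0)

    # Example: ZMP centered in a 10 cm support region, near target speed.
    print(walking_reward(zmp_x=0.05, support_x_min=0.0, support_x_max=0.10,
                         com_vel_x=0.28, target_vel_x=0.30, fell_over=False))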
Table of Contents:
    Abstract (Chinese) i
    Abstract (English) ii
    Acknowledgements iii
    Table of Contents iv
    List of Figures vi
    List of Tables ix
    Nomenclature x
    1. Introduction 1
        1.1 Motivation and Objectives 1
        1.2 Literature Review 3
        1.3 Thesis Organization 6
    2. Hardware Architecture 7
        2.1 Mechanism Design 7
        2.2 Mechatronic Architecture 9
    3. Trajectory Planning and Hierarchical Control 11
        3.1 Trajectory Planning 12
            3.1.1 Linear Inverted Pendulum Model (LIPM) 12
            3.1.2 Dual-Mass Inverted Pendulum Model (DMIP) 13
            3.1.3 ZMP Reference Trajectory and Swing-Foot Trajectory 14
            3.1.4 Preview Control 16
            3.1.5 Trajectory-Planning Procedure 20
        3.2 Kinematic Control 21
            3.2.1 Forward Kinematics 21
            3.2.2 Kinematic Controller Design 22
        3.3 Dynamic Control 26
        3.4 CoM Velocity Estimation 29
    4. Reinforcement Learning 33
        4.1 Reinforcement Learning Architecture 33
        4.2 Model Construction 39
        4.3 Reward Design for Reinforcement Learning 41
        4.4 Target-State Design 44
            4.4.1 Fixed Target 44
            4.4.2 Discretized Model 44
        4.5 Training Results 48
            4.5.1 Fixed Target 48
            4.5.2 Discretized Model 51
        4.6 Multi-Step Walking Simulation 54
    5. Experimental Results 56
        5.1 Low-Level Control Experiments 56
            5.1.1 Double-Support Y-Direction Controller Test 56
            5.1.2 Double-Support X-Direction Controller Test 57
        5.2 Offline Walking Experiments 58
        5.3 Reinforcement-Learning Walking Experiments 62
            5.3.1 Experimental Setup 62
            5.3.2 Communication-Architecture Test 62
            5.3.3 Stability Analysis of Online and Offline Trajectories 63
            5.3.4 Walking Experiments with a Fixed Target State 66
            5.3.5 External-Disturbance Experiments 68
            5.3.6 Walking Experiments with the Discretized Model 71
    6. Conclusions and Future Work 74
        6.1 Conclusions 74
        6.2 Analysis and Comparison of Strengths and Weaknesses 76
        6.3 Future Work 78
    7. References 81

