
Graduate Student: 陳令臻 (CHEN, LING-CHEN)
Thesis Title: 針對機器人控制以關鍵點資訊改善基於畫面的強化學習
KeyState: Improving Image-based Reinforcement Learning with Keypoint Information for Robot Control
Advisor: 金仲達 (King, Chung-Ta)
Committee Members: 江振瑞 (Jiang, Jehn-Ruey), 許秋婷 (Hsu, Chiou-Ting)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science
Year of Publication: 2022
Graduation Academic Year: 111
Language: English
Number of Pages: 25
Chinese Keywords: 強化學習、機器人控制、基於畫面的強化學習
Keywords: Reinforcement Learning, Robot Control, Image-based Reinforcement Learning

    Learning from high-dimensional images is essential for reinforcement learning (RL) to train autonomous agents that interact directly with the environment through visual observations. A common strategy is to extract task-relevant information from the images and learn a representation that characterizes the system state. Although existing methods can find good representations of static states efficiently, they still take a long time to learn the system dynamics, which are critical for applications such as robot control. For robot control, the most important part of the system dynamics is the motion of the robot itself, so image-based RL can be more efficient if the state of the robot is known. In this thesis, we propose to extract the keypoints of the robot as auxiliary information to improve the data efficiency of RL from image pixels. The proposed method, called KeyState, is evaluated on the DeepMind Control Suite, a common benchmark for the data efficiency and performance of RL agents. The experimental results show that the performance of KeyState is, on average, 1.65 times better than that of prior pixel-based methods.
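    The thesis body is not reproduced on this page, so the following is only a minimal sketch of the general idea stated in the abstract: a pre-trained keypoint discovery model is kept frozen, and the keypoints it extracts from rendered frames serve as auxiliary supervision for the image encoder of the RL agent. All names (KeypointDetector, KeyStateEncoder, aux_keypoint_loss), network shapes, and the particular auxiliary loss are illustrative assumptions, not KeyState's actual implementation.

    # Illustrative sketch only; not the thesis implementation.
    import torch
    import torch.nn as nn


    class KeypointDetector(nn.Module):
        """Stand-in for a pre-trained keypoint discovery model (frozen at RL time)."""

        def __init__(self, num_keypoints: int = 8):
            super().__init__()
            self.backbone = nn.Sequential(
                nn.Conv2d(3, 32, 4, stride=2), nn.ReLU(),
                nn.Conv2d(32, 32, 4, stride=2), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(32, 2 * num_keypoints),
            )

        @torch.no_grad()
        def forward(self, frames: torch.Tensor) -> torch.Tensor:
            # frames: (B, 3, H, W) -> keypoints: (B, K, 2) in normalized image coordinates
            return torch.tanh(self.backbone(frames)).view(frames.shape[0], -1, 2)


    class KeyStateEncoder(nn.Module):
        """Image encoder for the RL agent with an auxiliary keypoint-prediction head."""

        def __init__(self, latent_dim: int = 50, num_keypoints: int = 8):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv2d(3, 32, 4, stride=2), nn.ReLU(),
                nn.Conv2d(32, 32, 4, stride=2), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(32, latent_dim),
            )
            self.keypoint_head = nn.Linear(latent_dim, 2 * num_keypoints)

        def forward(self, frames: torch.Tensor):
            z = self.encoder(frames)                      # latent state fed to actor/critic
            kp_pred = self.keypoint_head(z).view(frames.shape[0], -1, 2)
            return z, kp_pred


    def aux_keypoint_loss(encoder, detector, frames):
        """Auxiliary loss: make the RL latent predict the frozen detector's keypoints."""
        _, kp_pred = encoder(frames)
        kp_target = detector(frames)                      # treated as a fixed target, no gradient
        return nn.functional.mse_loss(kp_pred, kp_target)


    if __name__ == "__main__":
        frames = torch.rand(4, 3, 84, 84)                 # a batch of rendered observations
        detector = KeypointDetector().eval()
        encoder = KeyStateEncoder()
        loss = aux_keypoint_loss(encoder, detector, frames)
        loss.backward()                                   # gradients flow only into the encoder
        print(loss.item())

    In this sketch the auxiliary loss would simply be added to the usual RL objective, so the encoder is shaped by both reward-driven learning and the robot-centric keypoints; how KeyState actually combines the two is described in Chapter 3 of the thesis.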

    Table of Contents
    Acknowledgements
    摘要 (Chinese Abstract)
    Abstract
    1 Introduction
    2 Related Work
    3 Method
      3.1 Pre-trained Keypoint Discovery Model
      3.2 Keypoint-based Auxiliary Information
    4 Experiments
      4.1 Experimental Setup
      4.2 Experimental Results in Rich Reward Environments
      4.3 Experimental Results in Sparse Reward Environments
      4.4 Experimental Results in Complex Environments
      4.5 Correlation between Physical States and Keypoint-based Information
      4.6 Physical States Ablations
      4.7 The Necessity of Approximating Angles with Keypoints
    5 Conclusion and Future Work
    References

