
Author: Zhang, Yao-Ming (張耀明)
Thesis Title: Development of a Vision System for Humanoid Robots with Single- and Two-Hand Grasping Capabilities for Industrial Applications (具備單雙手抓取能力的工業應用人形機器人視覺系統開發)
Advisor: Ma, Hsi-Pin (馬席彬)
Committee Members: Huang, Chih-Tsun (黃稚存); Chang, Tian-Sheuan (張添烜)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2024
Graduation Academic Year: 113 (ROC calendar)
Language: English
Number of Pages: 55
Keywords: Deep Learning, Humanoid Robots, Grasp Generation


    In today's rapidly evolving world, humanoid robot systems have drawn significant attention due to their human-like capabilities. In object grasping tasks, humanoid robots with two hands exhibit greater potential. However, previous research has mostly focused on single-hand grasping and has not considered the functionalities a robotic system should possess from a systems-level perspective.

    This thesis aims to explore the essential functionalities a humanoid robot system should include. Building on the Keypoint-GraspNet (KGN) approach, this study proposes a deep-learning-based vision system for humanoid robots that integrates object detection and grasp generation modules and specifically targets industrial objects. The proposed system comprises modules for object detection, single-hand grasp generation, and two-hand grasp generation, providing both single-hand and two-hand grasp candidates for the humanoid robot. The system takes RGB-D images as input; the network model identifies the positions of objects in the image and generates single-hand and two-hand grasp candidates. A grasp-object matching module proposed in this study then pairs each grasp candidate with a detected object, and these pairs form the system output. Additionally, to train the network, a dataset focused on wafer fabrication objects was developed for training and testing.
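
    The abstract names the modules and the overall data flow but not their interfaces. The following is a minimal illustrative sketch, in Python, of how such a pipeline could be wired together; the class and function names, the nearest-center grasp-object matching rule, and the use of separate networks for single- and two-hand candidates are assumptions made here for illustration, not the thesis implementation.

```python
# Illustrative sketch of the described pipeline (hypothetical names, not the thesis code).
# Assumptions: each module is a callable returning a list, objects and grasps carry
# 3D positions in the camera frame, and matching assigns each grasp to the nearest object.
from dataclasses import dataclass
import numpy as np

@dataclass
class Detection:
    label: str          # detected object class, e.g. a wafer-fab item
    center: np.ndarray  # estimated 3D center in the camera frame

@dataclass
class Grasp:
    pose: np.ndarray    # 4x4 gripper pose (6-DoF grasp candidate)
    hands: int          # 1 = single-hand candidate, 2 = two-hand candidate
    score: float        # network confidence

def match_grasps_to_objects(grasps, detections):
    """Pair each grasp candidate with the closest detected object (assumed rule)."""
    pairs = []
    for g in grasps:
        grasp_center = g.pose[:3, 3]
        nearest = min(detections, key=lambda d: np.linalg.norm(d.center - grasp_center))
        pairs.append((nearest, g))
    return pairs

def run_vision_system(rgb, depth, detector, single_hand_net, two_hand_net):
    """RGB-D in, (object, grasp) pairs out, mirroring the modules named in the abstract."""
    detections = detector(rgb, depth)
    grasps = single_hand_net(rgb, depth) + two_hand_net(rgb, depth)
    return match_grasps_to_objects(grasps, detections)
```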

    Compared to previous studies, the proposed method adds object detection and two-hand grasp generation functionalities. To validate the feasibility of this multi-task architecture, I used the dataset developed in this study to compare the proposed method with KGN on single-hand grasp generation. The results show that, relative to KGN, the proposed method increases the grasp success rate (GSR) by 2% while reducing the grasp coverage rate (GCR) by 2%; both GSR and GCR remain above 70%. This demonstrates that, despite the slight trade-off in accuracy, the proposed vision system remains competitive while adding object detection and two-hand grasp generation on top of single-hand grasping, confirming the feasibility of the approach and its potential for future research.
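
    The abstract reports the grasp success rate (GSR) and grasp coverage rate (GCR) without restating their definitions. A common reading, assumed here, is that a predicted grasp counts as correct when it lies within translation and rotation thresholds of some ground-truth grasp; GSR is then the fraction of correct predictions and GCR the fraction of ground-truth grasps covered by at least one prediction. The sketch below encodes that assumed definition only; the 2 cm and 30° thresholds are placeholders, not values taken from the thesis or from KGN.

```python
# Assumed GSR/GCR computation over 4x4 grasp poses (thresholds are placeholders).
import numpy as np

def poses_match(pred, gt, trans_thresh=0.02, rot_thresh_deg=30.0):
    """True if two 4x4 grasp poses agree within the given translation/rotation bounds."""
    if np.linalg.norm(pred[:3, 3] - gt[:3, 3]) > trans_thresh:
        return False
    # Relative rotation angle from the trace of R_pred^T R_gt.
    cos_angle = (np.trace(pred[:3, :3].T @ gt[:3, :3]) - 1.0) / 2.0
    angle = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
    return angle <= rot_thresh_deg

def gsr_gcr(predicted, ground_truth):
    """GSR: fraction of predictions matching some ground-truth grasp.
    GCR: fraction of ground-truth grasps matched by some prediction."""
    correct_pred = sum(any(poses_match(p, g) for g in ground_truth) for p in predicted)
    covered_gt = sum(any(poses_match(p, g) for p in predicted) for g in ground_truth)
    gsr = correct_pred / len(predicted) if predicted else 0.0
    gcr = covered_gt / len(ground_truth) if ground_truth else 0.0
    return gsr, gcr
```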

    Table of Contents:
    摘要 (Abstract in Chinese) ---------------------- i
    誌謝 (Acknowledgements) ------------------------- iii
    Abstract ---------------------------------------- v
    1. Introduction --------------------------------- 1
    2. Background Knowledge and Literature Survey --- 5
    3. Proposed Algorithm --------------------------- 15
    4. Implementation Results and Comparison -------- 35
    5. Conclusion and Future Works ------------------ 51
    References -------------------------------------- 53

    1. S. Saeedvand, M. Jafari, H. S. Aghdasi, and J. Baltes, “A comprehensive survey on humanoid robot development,” The Knowledge Engineering Review, vol. 34, p. e20, 2019.

    2. Y. Tong, H. Liu, and Z. Zhang, “Advancements in humanoid robots: A comprehensive review and future prospects,” IEEE/CAA Journal of Automatica Sinica, vol. 11, no. 2, pp. 301–328, 2024.

    3. R. Newbury, M. Gu, L. Chumbley, A. Mousavian, C. Eppner, J. Leitner, J. Bohg, A. Morales, T. Asfour, D. Kragic, D. Fox, and A. Cosgun, “Deep learning approaches to grasp synthesis: A review,” IEEE Transactions on Robotics, vol. 39, no. 5, pp. 3994–4015, 2023.

    4. Y. Cong, R. Chen, B. Ma, H. Liu, D. Hou, and C. Yang, “A comprehensive study of 3d vision-based robot manipulation,” IEEE Transactions on Cybernetics, vol. 53, no. 3, pp. 1682–1698, 2021.

    5. E. Chisari, N. Heppert, T. Welschehold, W. Burgard, and A. Valada, “Centergrasp: Object-aware implicit representation learning for simultaneous shape reconstruction and 6-dof grasp estimation,” IEEE Robotics and Automation Letters, vol. 9, no. 6, pp. 5094–5101, 2024.

    6. J. Rojas-Quintero and M. Rodríguez-Liñán, “A literature review of sensor heads for humanoid robots,” Robotics and Autonomous Systems, vol. 143, p. 103834, 2021.

    7. X. Zhou, D. Wang, and P. Krähenbühl, “Objects as points,” arXiv preprint arXiv:1904.07850, 2019.

    8. J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Unified, real-time object detection,” in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 779–788, 2016.

    9. T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, “Focal loss for dense object detection,” in Proceedings of the IEEE international conference on computer vision, pp. 2980–2988, 2017.

    10. G. Zhai, D. Huang, S.-C. Wu, H. Jung, Y. Di, F. Manhardt, F. Tombari, N. Navab, and B. Busam, “Monograspnet: 6-dof grasping with a single rgb image,” in 2023 IEEE International Conference on Robotics and Automation (ICRA), pp. 1708–1714, IEEE, 2023.

    11. H. Cheng, Y. Wang, and M. Q.-H. Meng, “A vision-based robot grasping system,” IEEE Sensors Journal, vol. 22, no. 10, pp. 9610–9620, 2022.

    12. R. Xu, F.-J. Chu, and P. A. Vela, “Gknet: Grasp keypoint network for grasp candidates detection,” The International Journal of Robotics Research, vol. 41, no. 4, pp. 361–389, 2022.

    13. A. Depierre, E. Dellandréa, and L. Chen, “Jacquard: A large scale dataset for robotic grasp detection,” in 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 3511–3516, 2018.

    14. Y. Chen, Y. Lin, R. Xu, and P. A. Vela, “Keypoint-graspnet: Keypoint-based 6-dof grasp generation from the monocular rgb-d input,” in 2023 IEEE International Conference on Robotics and Automation (ICRA), pp. 7988–7995, 2023.

    15. P. Wang, H. Jung, Y. Li, S. Shen, R. P. Srikanth, L. Garattoni, S. Meier, N. Navab, and B. Busam, “Phocal: A multi-modal dataset for category-level object pose estimation with photometrically challenging objects,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 21222–21231, 2022.

    16. H. Jung, P. Ruhkamp, G. Zhai, N. Brasch, Y. Li, Y. Verdie, J. Song, Y. Zhou, A. Armagan, S. Ilic, et al., “On the importance of accurate geometry data for dense 3d vision tasks,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 780–791, 2023.

    17. M. Z. Irshad, T. Kollar, M. Laskey, K. Stone, and Z. Kira, “Centersnap: Single-shot multi-object 3d shape reconstruction and categorical 6d pose and size estimation,” in 2022 International Conference on Robotics and Automation (ICRA), pp. 10632–10640, IEEE, 2022.

    18. Z. Jiang, Y. Zhu, M. Svetlik, K. Fang, and Y. Zhu, “Synergies between affordance and geometry: 6-dof grasp detection via implicit representations,” arXiv preprint arXiv:2104.01542, 2021.

    19. M. Sundermeyer, A. Mousavian, R. Triebel, and D. Fox, “Contact-graspnet: Efficient 6-dof grasp generation in cluttered scenes,” in 2021 IEEE International Conference on Robotics and Automation (ICRA), pp. 13438–13444, IEEE, 2021.

    20. A. Mousavian, C. Eppner, and D. Fox, “6-dof graspnet: Variational grasp generation for object manipulation,” in Proceedings of the IEEE/CVF international conference on computer vision, pp. 2901–2910, 2019.

    21. G. S. Inc., “Gyro humanoid robot,” https://www.gyro.com.tw/, 2024.

    22. F. Yu, D. Wang, E. Shelhamer, and T. Darrell, “Deep layer aggregation,” in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2403–2412, 2018.

    23. J. Dai, H. Qi, Y. Xiong, Y. Li, G. Zhang, H. Hu, and Y. Wei, “Deformable convolutional networks,” in Proceedings of the IEEE international conference on computer vision, pp. 764–773, 2017.

    24. Y. Lin, J. Tremblay, S. Tyree, P. A. Vela, and S. Birchfield, “Single-stage keypoint-based category-level object pose estimation from an rgb image,” in 2022 International Conference on Robotics and Automation (ICRA), pp. 1547–1553, IEEE, 2022.

    25. A. Ahmadyan, L. Zhang, A. Ablavatski, J. Wei, and M. Grundmann, “Objectron: A large scale dataset of object-centric videos in the wild with pose annotations,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 7822–7831, 2021.
