
Author: Chen, Po-Lin
Title: Bridging AI and Robotics: Insights from Implementing ChatGPT as a Robotic Brain
Advisor: Chang, Cheng-Shang
Committee members: Sheu, Jang-Ping; Lee, Duan-Shin; Young, Kuu-Young
Degree: Master
Department: College of Electrical Engineering and Computer Science - Institute of Communications Engineering
Year of publication: 2023
Graduation academic year: 111
Language: English
Number of pages: 45
Chinese keywords (translated): ChatGPT, Robotics, AlfWorld, Drone, IoT
Keywords: ChatGPT, AlfWorld, Smart home, AirSim, Robotics


    This thesis examines the incorporation of OpenAI’s ChatGPT into robotic systems, presenting adaptable prompts that convert natural language instructions into executable robot commands. We showcase the versatility of ChatGPT across several robotics domains, ranging from IoT applications to robot simulations and embodied agents in simulated household environments. In the initial sections, we investigate ChatGPT’s ability to organize textual input, generate code, and extract information from text to trigger functions and actions.
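The conversion step described above, from a natural language instruction to an executable command, can be sketched as a constrained prompt plus a small parser. This is a minimal illustration only, not the thesis's actual prompt or parser: the prompt wording, the JSON command format, and the function names (`send_email`, `set_plug`) are assumptions introduced for the example.

```python
import json
import re

# Hypothetical system prompt (an assumption, not the thesis's prompt):
# constrain the model to answer with a single JSON command object so a
# parser can dispatch it to device functions.
SYSTEM_PROMPT = (
    "You control a smart home. Reply ONLY with JSON of the form "
    '{"function": <name>, "args": {...}}. '
    "Available functions: send_email(to, subject, body), set_plug(on)."
)

def parse_command(reply: str) -> dict:
    """Extract the first JSON object from a model reply and validate it."""
    match = re.search(r"\{.*\}", reply, re.DOTALL)
    if match is None:
        raise ValueError("no command found in reply")
    cmd = json.loads(match.group(0))
    if "function" not in cmd or "args" not in cmd:
        raise ValueError("malformed command")
    return cmd

# A reply the model might produce for "turn on the lamp":
reply = 'Sure. {"function": "set_plug", "args": {"on": true}}'
cmd = parse_command(reply)
```

Constraining the output format and parsing it defensively is what makes the natural language interface dispatchable to concrete device functions.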
    Furthermore, we employ multiple instances of ChatGPT in interactive decision-making benchmarks such as AlfWorld, achieving an impressive 98% success rate. Overall, this study underscores the importance of prompt engineering and of natural language instructions for establishing user-friendly interaction. By providing practical knowledge and valuable insights, this research contributes to the robotics community and fosters progress in human-robot collaboration. The outcomes highlight ChatGPT’s potential to comprehend and execute complex tasks effectively in real-world environments, paving the way for further advances in robotics.

    Contents
    List of Figures
    List of Tables
    1 Introduction
    2 Related work
    3 ChatGPT's aptitude and experiments
      3.1 Simple task planning: smart home demo
        3.1.1 Sending email
        3.1.2 Smart plug
      3.2 Complex reasoning task: robot simulation
        3.2.1 Robot arm
        3.2.2 AirSim drone
      3.3 AlfWorld simulation
        3.3.1 AlfWorld dataset
        3.3.2 Model architecture
        3.3.3 Evaluation
    4 Discussion and Limitations
      4.1 Integrating APIs
      4.2 Automation control with ChatGPT
      4.3 Error assessment with supervisor
    5 Conclusion
    6 Appendix
      6.1 Basic prompts for smart home examples
        6.1.1 Sending email
        6.1.2 Smart plug
      6.2 Parser content for smart home examples
        6.2.1 Sending email
        6.2.2 Smart plug
      6.3 AlfWorld experiment details

