
Author: Chen, Po-Lin
Title: Bridging AI and Robotics: Insights from Implementing ChatGPT as a Robotic Brain
Advisor: Chang, Cheng-Shang
Committee members: Sheu, Jang-Ping; Lee, Duan-Shin; Young, Kuu-Young
Degree: Master
Department: College of Electrical Engineering and Computer Science - Institute of Communications Engineering
Year of publication: 2023
Graduation academic year: 111
Language: English
Number of pages: 45
Chinese keywords (translated): ChatGPT, Robotics, AlfWorld, Drone, IoT
Keywords: ChatGPT, AlfWorld, Smart home, AirSim, Robotics


    This thesis examines the incorporation of OpenAI’s ChatGPT into robotic systems, presenting adaptable prompts that convert natural language instructions into executable robot commands. We showcase the versatility of ChatGPT across several robotics domains, ranging from IoT applications to robot simulations and embodied agents in simulated household environments. In the initial sections, we investigate ChatGPT’s ability to organize textual input, generate code, and extract information from text to trigger functions and actions.
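The conversion step described above, from a natural language instruction to an executable command, can be sketched as a constrained prompt plus a small parser. This is a minimal illustration only, not the thesis's actual prompt or parser: the prompt wording, the JSON command format, and the function names (`send_email`, `set_plug`) are assumptions introduced for the example.

```python
import json
import re

# Hypothetical system prompt (an assumption, not the thesis's prompt):
# constrain the model to answer with a single JSON command object so a
# parser can dispatch it to device functions.
SYSTEM_PROMPT = (
    "You control a smart home. Reply ONLY with JSON of the form "
    '{"function": <name>, "args": {...}}. '
    "Available functions: send_email(to, subject, body), set_plug(on)."
)

def parse_command(reply: str) -> dict:
    """Extract the first JSON object from a model reply and validate it."""
    match = re.search(r"\{.*\}", reply, re.DOTALL)
    if match is None:
        raise ValueError("no command found in reply")
    cmd = json.loads(match.group(0))
    if "function" not in cmd or "args" not in cmd:
        raise ValueError("malformed command")
    return cmd

# A reply the model might produce for "turn on the lamp":
reply = 'Sure. {"function": "set_plug", "args": {"on": true}}'
cmd = parse_command(reply)
```

Constraining the output format and parsing it defensively is what makes the natural language interface dispatchable to concrete device functions.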
    Furthermore, we employ multiple instances of ChatGPT in interactive decision-making benchmarks such as AlfWorld, achieving an impressive 98% success rate. Overall, this study underscores the importance of prompt engineering and of natural language instructions for establishing user-friendly interaction. By providing practical knowledge and valuable insights, this research contributes to the robotics community and fosters progress in human-robot collaboration. The outcomes highlight ChatGPT’s potential to comprehend and execute complex tasks effectively in real-world environments, paving the way for further advances in robotics.

    Contents
    List of Figures
    List of Tables
    1 Introduction
    2 Related work
    3 ChatGPT's aptitude and experiments
      3.1 Simple task planning: smart home demo
        3.1.1 Sending email
        3.1.2 Smart plug
      3.2 Complex reasoning task: robot simulation
        3.2.1 Robot arm
        3.2.2 AirSim drone
      3.3 AlfWorld simulation
        3.3.1 AlfWorld dataset
        3.3.2 Model architecture
        3.3.3 Evaluation
    4 Discussion and Limitations
      4.1 Integrating APIs
      4.2 Automation control with ChatGPT
      4.3 Error assessment with supervisor
    5 Conclusion
    6 Appendix
      6.1 Basic prompts for smart home examples
        6.1.1 Sending email
        6.1.2 Smart plug
      6.2 Parser content for smart home examples
        6.2.1 Sending email
        6.2.2 Smart plug
      6.3 AlfWorld experiment details

