
Graduate Student: Hsu, Shih-Lung (許世龍)
Thesis Title: Team Partition Using Modular Reinforcement Learning in Hunter-Prey Problems (在獵人與獵物問題中利用模組化增強式學習法學習團隊分組)
Advisor: Soo, Von-Wun (蘇豐文)
Oral Defense Committee:
Degree: Master
Department: Department of Computer Science, College of Electrical Engineering and Computer Science
Year of Publication: 2009
Academic Year of Graduation: 97 (ROC calendar)
Language: English
Number of Pages: 31
Chinese Keywords: 多代理人系統、獵人獵物問題、模組化增強式學習、分組、合作
English Keywords: Multi-agent System, Hunter Prey Problem, Modular Reinforcement Learning, Team Partition, Cooperative
Abstract (Chinese): On the question of how learning can be used to coordinate cooperative behavior among agents in a multi-agent environment, many studies have effectively applied reinforcement learning to let agents learn the policies needed to accomplish their tasks. However, when the environment contains several different tasks to be completed, a policy produced by plain reinforcement learning does not automatically learn a team partition that lets the agents complete different tasks simultaneously; instead, all agents first complete one task together and then complete the remaining tasks together in sequence. In this thesis, we use modular reinforcement learning so that, in a multi-agent environment with multiple tasks, agents learn not only action selection but also target selection, and we use target selection to achieve effective team partition. We apply our method to the hunter-prey problem, and the results show that, after learning team partition, a group of hunters can effectively shorten both the time needed to capture all the prey and the convergence time of learning; the team partition among the agents can also be observed during the hunting process.


Abstract (English): Several studies have shown how multiple agents can, through reinforcement learning, effectively synthesize the coordinated decision policies needed to accomplish a common goal. When the environment contains multiple goals, however, reinforcement learning alone does not give the agents a policy that lets them split into teams and pursue the goals simultaneously; instead, they attempt to achieve one goal together and then move on to the next. In this thesis, we use modular reinforcement learning to let agents learn not only action selection but also target selection in a multi-goal cooperative multi-agent system, and we achieve team partition through target selection. We apply our method to hunter-prey problems, and the results show that our method improves both the average number of capturing steps and the convergence speed of learning. We also demonstrate the resulting team partition with a scenario from the hunting process.
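The following is a minimal sketch, not the thesis's implementation, of how a modular Q-learning hunter with a target-selection module could be organized. Everything concrete here is an illustrative assumption: the ModularHunter class, the torus grid of size GRID, the relative-offset state encoding, the greedy target rule over module values, the step cost of -1, and the constants ALPHA, GAMMA, EPS.

```python
# Sketch (assumed design, not the thesis code): each hunter keeps one Q-module
# per prey. It first picks a target prey by comparing the modules' best Q-values
# (target selection), then picks a move from that module's Q-table (action selection).
import random
from collections import defaultdict

ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # up, down, right, left
GRID = 10                                      # assumed torus grid size
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1              # assumed learning constants

class ModularHunter:
    def __init__(self, n_prey):
        # one Q-table per prey (module); state = relative offset to that prey
        self.q = [defaultdict(lambda: [0.0] * len(ACTIONS)) for _ in range(n_prey)]

    def rel_state(self, me, prey):
        # relative position on a torus, keeping each module's state space small
        return ((prey[0] - me[0]) % GRID, (prey[1] - me[1]) % GRID)

    def select_target(self, me, preys):
        # target selection: prefer the module whose best Q-value is largest
        values = [max(self.q[i][self.rel_state(me, p)]) for i, p in enumerate(preys)]
        return max(range(len(preys)), key=lambda i: values[i])

    def select_action(self, me, prey, module):
        # epsilon-greedy action selection within the chosen module
        s = self.rel_state(me, prey)
        if random.random() < EPS:
            return random.randrange(len(ACTIONS))
        qs = self.q[module][s]
        return max(range(len(ACTIONS)), key=lambda a: qs[a])

    def update(self, module, s, a, r, s_next):
        # standard Q-learning update applied only to the active module
        best_next = max(self.q[module][s_next])
        self.q[module][s][a] += ALPHA * (r + GAMMA * best_next - self.q[module][s][a])

# one illustrative step: a hunter at (0, 0), two prey, a single learning update
hunter = ModularHunter(n_prey=2)
preys = [(3, 4), (7, 1)]
me = (0, 0)
t = hunter.select_target(me, preys)
s = hunter.rel_state(me, preys[t])
a = hunter.select_action(me, preys[t], t)
me_next = ((me[0] + ACTIONS[a][0]) % GRID, (me[1] + ACTIONS[a][1]) % GRID)
hunter.update(t, s, a, -1.0, hunter.rel_state(me_next, preys[t]))  # -1 step cost (assumed)
```

Under a scheme like this, team partition can emerge when different hunters' target modules converge toward different prey; the thesis itself examines how often targets are reselected and compares against typical modular Q-learning (Chapters 2 and 3 of the table of contents below).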

Table of Contents:
    摘要 (Abstract in Chinese) II
    ABSTRACT III
    ACKNOWLEDGEMENT IV
    TABLE OF CONTENTS V
    LIST OF TABLES VI
    LIST OF FIGURES VII
    CHAPTER 1 INTRODUCTION 1
        1.1 HUNTER-PREY PROBLEM 1
        1.2 REINFORCEMENT LEARNING 3
        1.3 RELATED WORK 6
        1.4 ORGANIZATION OF THE THESIS 7
    CHAPTER 2 METHODOLOGY 9
        2.1 MODULAR ARCHITECTURE 9
        2.2 MODULAR Q-LEARNING WITH TARGET MODULE 11
        2.3 MULTI-AGENT LEARNING ALGORITHM 14
    CHAPTER 3 EXPERIMENTS AND DISCUSSION 18
        3.1 COMPARISON OF DIFFERENT TARGET-RESELECT PERIODS 18
        3.2 COMPARISON WITH TYPICAL MODULAR Q-LEARNING 20
        3.3 RESULTS OF TEAM-PARTITION 24
    CHAPTER 4 CONCLUSION 28
    REFERENCES 29


Full-Text Availability: The full text has not been authorized for public access (campus or off-campus network); no download is available.
