Laplacian Based State-value Function Transfer in Discrete Reinforcement Learning

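Graduate student:	曹怡亭 Yi-Ting Tsao
Thesis title:	Laplacian Based State-value Function Transfer in Discrete Reinforcement Learning
Advisor:	蘇豐文 Von-Wun Soo
Oral examination committee:
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Department of Computer Science
Year of publication:	2008
Academic year of graduation:	96 (ROC calendar)
Language:	English
Number of pages:	35
Chinese keywords:	增強式學習、轉換學習、拉普拉斯運算
English keywords:	Reinforcement Learning, Transfer Learning, Laplacian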

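In learning, solving two similar problems separately can waste time, because the same subproblems are learned repeatedly. Transfer learning aims to shorten the time needed to learn a similar problem by reusing the knowledge acquired from another problem. Much prior work focuses on transferring knowledge through a transfer function, but designing such a function requires a thorough understanding of the problem's characteristics, which is difficult even for an expert in that problem. We therefore propose a Laplacian-based transfer method. We modify the original Laplacian operator so that it reflects not only the topological properties of the problem but also the rewards required in reinforcement learning. We also briefly describe the properties of this method and explain why it can speed up learning. With this method, knowledge can be transferred directly between two reinforcement learning problems without a transfer function. In this thesis we study three different types of transfer, and the experimental results show that this kind of transfer helps reduce learning time.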
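This record does not define the modified Laplacian, so the sketch below illustrates only the standard (unmodified) combinatorial graph Laplacian L = D - A that such methods start from. It assumes a hypothetical 3 x 3 grid world whose states are connected to their four neighbours; the grid size and the function names `grid_neighbours`, `laplacian_matrix`, and `apply_laplacian` are illustrative assumptions, not taken from the thesis. The reward-dependent modification described in the abstract is not reproduced here.

```python
def grid_neighbours(rows, cols):
    """Map each state index to its 4-connected neighbours in a rows x cols grid.

    Hypothetical state graph: states are numbered row by row.
    """
    nbrs = {}
    for r in range(rows):
        for c in range(cols):
            s = r * cols + c
            adj = []
            if r > 0:
                adj.append(s - cols)   # state above
            if r + 1 < rows:
                adj.append(s + cols)   # state below
            if c > 0:
                adj.append(s - 1)      # state to the left
            if c + 1 < cols:
                adj.append(s + 1)      # state to the right
            nbrs[s] = adj
    return nbrs


def laplacian_matrix(nbrs):
    """Standard combinatorial Laplacian L = D - A as a list of lists."""
    n = len(nbrs)
    L = [[0] * n for _ in range(n)]
    for s, adj in nbrs.items():
        L[s][s] = len(adj)             # degree on the diagonal
        for t in adj:
            L[s][t] = -1               # -1 for each edge (s, t)
    return L


def apply_laplacian(L, f):
    """Return the vector L f for a function f defined over the states."""
    return [sum(l_st * f_t for l_st, f_t in zip(row, f)) for row in L]


nbrs = grid_neighbours(3, 3)
L = laplacian_matrix(nbrs)

# A constant function has no variation across neighbouring states,
# so the Laplacian maps it to the zero vector.
result = apply_laplacian(L, [1.0] * 9)
```

Each entry of `L f` sums the differences between a state's value and its neighbours' values, which is why the Laplacian captures only the topology of the state graph. Reward information would have to be introduced separately, as the abstract says the modified operator does.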

Abstract
Acknowledgement
Contents
Chapter 1 Introduction
1.1 Problem Statement
1.2 Related Work
1.3 The Transfer Types
Chapter 2 Background
2.1 Markov Decision Process
2.2 Reinforcement Learning
2.3 Laplacian
2.4 Example
Chapter 3 Methodology
3.1 The Modified Laplacian
3.2 The Property
3.3 The Transfer Method
Chapter 4 Experiments
4.1 The Scaling Cases
4.2 The Topological Transfer Cases
4.3 The Reward Change Case
4.4 The Maze Case
Chapter 5 Discussions
Chapter 6 Future Work
Reference

                                

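Full-text release date: full text not authorized for public release (on-campus network)
Full-text release date: full text not authorized for public release (off-campus network)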
