
Author: 葉念昆 (Yeh, Nien-Kun)
Title: Dealing with Perceptual Aliasing by Using Pruning Suffix Tree Memory in Reinforcement Learning (Chinese title: 在增強式學習中用修剪字尾樹處理感知混淆現象)
Advisor: 蘇豐文 (Soo, Von-Wun)
Committee members:
Degree: Master
Department: Department of Computer Science, College of Electrical Engineering and Computer Science
Year of publication: 2009
Graduation academic year: 97 (ROC calendar, 2008-2009)
Language: English
Number of pages: 32
Keywords (Chinese): 部分可觀測馬可夫決策過程、感知混淆現象、字尾樹、增強式學習
Keywords (English): POMDP, Perceptual Aliasing, Suffix Tree, Reinforcement Learning
Abstract (Chinese):
In a partially observable Markov decision process, a reinforcement learning agent sometimes cannot distinguish between two different states of the problem because of the limitations of its sensing system, a phenomenon commonly known as perceptual aliasing. To solve this problem, some studies incorporate a memory of preceding events to tell perceptually aliased states apart. McCallum used Utile Suffix Memory (USM), an instance-based method that stores instances in a tree structure to represent states. He used the concept of a fringe (a subtree extended to a predefined depth below the real tree) and provided a bounded lookahead algorithm. However, using a fringe causes the whole tree to use an excessive number of nodes. We introduce a modified USM that solves this problem by replacing the fringe with a method for splitting leaf nodes that differs from USM's. Our experiments show that our method always produces smaller trees than USM and that the agent can learn an acceptable policy.


Abstract (English):
In a POMDP (Partially Observable Markov Decision Process) problem, a reinforcement learning agent may be unable to distinguish between two different states of the world because of the limitations of its sensory system, a situation called perceptual aliasing. To solve this problem, some researchers have incorporated a memory of preceding events to distinguish perceptually aliased states. McCallum proposed Utile Suffix Memory (USM) [7], an instance-based method that uses a tree to store instances and to represent states. His use of a fringe (an extension of the tree to a pre-specified depth below the real tree) provides the algorithm with a limited degree of lookahead capability. However, the fringe makes the tree hold many more nodes. We introduce a modification of USM that solves this issue without a fringe, using a criterion for splitting leaf nodes that differs from USM's. In our experiments, we show that our method always produces trees with fewer nodes than USM's and that the agent learns an applicable policy.
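
Reading both abstracts, the core mechanism is a suffix-tree memory that groups experiences by recent history and splits a leaf only when the extra history is worth distinguishing. The sketch below illustrates that idea in Python; it is not the thesis's algorithm. The names (SuffixTreeMemory, Node, insert, _maybe_split), the (action, observation) step keys, the mean-reward gap test, and the min_items / split_gap / max_depth thresholds are all assumptions made for this illustration, and the gap test stands in for USM's comparison of expected future discounted reward distributions, which is exactly the fringe-based machinery the thesis replaces with a different leaf-splitting criterion.

    # Illustrative sketch only: all names and thresholds are assumptions,
    # not the thesis's algorithm. Python 3 standard library only.
    from collections import defaultdict
    from dataclasses import dataclass, field
    from statistics import mean

    @dataclass
    class Node:
        # The path from the root encodes a suffix of recent (action, observation)
        # steps, newest first; deeper nodes remember more history.
        depth: int = 0
        children: dict = field(default_factory=dict)
        # Stored items are (history, reward) pairs; history is a tuple of
        # (action, observation) steps ordered newest-to-oldest.
        items: list = field(default_factory=list)

    class SuffixTreeMemory:
        """Group experiences under history suffixes so that perceptually aliased
        observations can end up in different leaves; deepen a leaf only when the
        extra history visibly changes the rewards stored there."""

        def __init__(self, min_items=20, split_gap=0.5, max_depth=4):
            self.root = Node()
            self.min_items = min_items    # do not split sparsely populated leaves
            self.split_gap = split_gap    # required spread in mean reward to split
            self.max_depth = max_depth    # hard cap on remembered history length

        def _leaf(self, history):
            # Follow the suffix of `history` as far as the tree currently goes.
            node = self.root
            for step in history:
                if step not in node.children:
                    break
                node = node.children[step]
            return node

        def insert(self, history, reward):
            """Store one experience at the leaf matching its history suffix."""
            # Always distinguish the most recent step, so aliased percepts with
            # different earlier history share a leaf that can later be split.
            latest = history[0]
            if latest not in self.root.children:
                self.root.children[latest] = Node(depth=1)
            leaf = self._leaf(history)
            leaf.items.append((tuple(history), reward))
            self._maybe_split(leaf)

        def _maybe_split(self, leaf):
            if len(leaf.items) < self.min_items or leaf.depth >= self.max_depth:
                return
            # Partition the leaf's experiences by the history step one deeper
            # than the suffix this leaf already encodes.
            groups, too_short = defaultdict(list), []
            for history, reward in leaf.items:
                if len(history) > leaf.depth:
                    groups[history[leaf.depth]].append((history, reward))
                else:
                    too_short.append((history, reward))
            if len(groups) < 2:
                return
            means = [mean(r for _, r in g) for g in groups.values()]
            # Stand-in utility test: split only if looking one step further back
            # changes the expected reward enough to matter.
            if max(means) - min(means) < self.split_gap:
                return
            for step, items in groups.items():
                child = Node(depth=leaf.depth + 1)
                child.items = items
                leaf.children[step] = child
            leaf.items = too_short   # experiences with too little history stay put

    # Hypothetical usage: two experiences whose latest observation is identical
    # (an aliased "corridor" percept) but whose preceding step differs become
    # separable once their shared leaf splits on the older step.
    memory = SuffixTreeMemory(min_items=2, split_gap=0.1)
    memory.insert([("forward", "corridor"), ("turn", "door_on_left")], reward=1.0)
    memory.insert([("forward", "corridor"), ("turn", "door_on_right")], reward=0.0)

Splitting on demand like this, rather than pre-growing a fringe of speculative subtrees, is what keeps the node count down, which is the size advantage both abstracts claim for the modified USM.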

1 Introduction (p. 1)
  1.1 Problem Statement (p. 1)
  1.2 Related works (p. 3)
    1.2.1 Memory-less Methods (p. 3)
    1.2.2 Memory-based Methods (p. 5)
2 Background (p. 7)
  2.1 Markov Decision Process (p. 7)
  2.2 Reinforcement Learning (p. 8)
  2.3 Perceptual Aliasing (p. 9)
3 Methodology (p. 11)
  3.1 Utile Suffix Memory (p. 11)
  3.2 Greedy Utile Suffix Memory (p. 14)
  3.3 Modified USM using pruning methods (p. 16)
    3.3.1 Shrink the PST (p. 19)
4 Results (p. 21)
5 Conclusion (p. 29)
Reference (p. 32)

[1] Sachiyo Arai and Katia Sycara. Credit assignment method for learning effective stochastic policies in uncertain domains.
[2] Leonard Breslow. Greedy utile suffix memory for reinforcement learning with perceptually-aliased states. Technical report, Naval Research Laboratory, Navy Center for Applied Research in Artificial Intelligence, 1996.
[3] Lonnie Chrisman. Reinforcement learning with perceptual aliasing: The perceptual distinctions approach. In Proceedings of the Tenth National Conference on Artificial Intelligence, pages 183–188. AAAI Press, 1992.
[4] Le Tien Dung, Takashi Komeda, and Motoki Takagi. Mixed reinforcement learning for partially observable Markov decision process. In Proceedings of the 2007 IEEE International Symposium on Computational Intelligence in Robotics and Automation, 2007.
[5] Long-Ji Lin and Tom M. Mitchell. Reinforcement learning with hidden states. In Proceedings of the Second International Conference on From Animals to Animats 2: Simulation of Adaptive Behavior, pages 271–280, Cambridge, MA, USA, 1993. MIT Press.
[6] Andrew Kachites McCallum. Reinforcement learning with selective perception and hidden state. PhD thesis, University of Rochester, 1996. Supervisor: Dana Ballard.
[7] R. Andrew McCallum. Instance-based utile distinctions for reinforcement learning with hidden state. In Proceedings of the Twelfth International Conference on Machine Learning, pages 387–395. Morgan Kaufmann, 1995.
[8] M. Ohta, Y. Kumada, and I. Noda. Using suitable action selection rule in reinforcement learning. In Proceedings of the 2003 IEEE International Conference on Systems, Man and Cybernetics, volume 5, pages 4358–4363, Oct. 2003.
[9] Dana Ron, Yoram Singer, and Naftali Tishby. Learning probabilistic automata with variable memory length. In Proceedings of the Seventh Annual ACM Conference on Computational Learning Theory, pages 35–46. ACM Press, 1994.
[10] Anton Maximilian Schäfer and Steffen Udluft. Solving partially observable reinforcement learning problems with recurrent neural networks. In Workshop Proceedings of the European Conference on Machine Learning, 2005.

Full text availability: not authorized for public release (campus network).
Full text availability: not authorized for public release (off-campus network).
