
Author: 葉念昆 (Yeh, Nien-Kun)
Title: Dealing with Perceptual Aliasing by Using Pruning Suffix Tree Memory in Reinforcement Learning (Chinese title: 在增強式學習中用修剪字尾樹處理感知混淆現象)
Advisor: 蘇豐文 (Soo, Von-Wun)
Committee members:
Degree: Master
Department: Department of Computer Science, College of Electrical Engineering and Computer Science
Year of publication: 2009
Graduation academic year: 97 (ROC calendar, 2008-2009)
Language: English
Number of pages: 32
Keywords (Chinese): 部分可觀測馬可夫決策過程、感知混淆現象、字尾樹、增強式學習
Keywords (English): POMDP, Perceptual Aliasing, Suffix Tree, Reinforcement Learning
Abstract (Chinese):
In a partially observable Markov decision process, a reinforcement learning agent sometimes cannot distinguish between two different states of the problem because of the limitations of its sensing system, a phenomenon commonly known as perceptual aliasing. To solve this problem, some studies incorporate a memory of preceding events to tell perceptually aliased states apart. McCallum used Utile Suffix Memory (USM), an instance-based method that stores instances in a tree structure to represent states. He used the concept of a fringe (a subtree extended to a predefined depth below the real tree) and provided a bounded lookahead algorithm. However, using a fringe causes the whole tree to use an excessive number of nodes. We introduce a modified USM that solves this problem by replacing the fringe with a method for splitting leaf nodes that differs from USM's. Our experiments show that our method always produces smaller trees than USM and that the agent can learn an acceptable policy.


Abstract (English):
In a POMDP (Partially Observable Markov Decision Process) problem, a reinforcement learning agent may be unable to distinguish between two different states of the world because of the limitations of its sensory system, a situation called perceptual aliasing. To solve this problem, some researchers have incorporated a memory of preceding events to distinguish perceptually aliased states. McCallum proposed Utile Suffix Memory (USM) [7], an instance-based method that uses a tree to store instances and to represent states. His use of a fringe (an extension of the tree to a pre-specified depth below the real tree) provides the algorithm with a limited degree of lookahead capability. However, the fringe makes the tree hold many more nodes. We introduce a modification of USM that solves this issue without a fringe, using a criterion for splitting leaf nodes that differs from USM's. In our experiments, we show that our method always produces trees with fewer nodes than USM's and that the agent learns an applicable policy.
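
Reading both abstracts, the core mechanism is a suffix-tree memory that groups experiences by recent history and splits a leaf only when the extra history is worth distinguishing. The sketch below illustrates that idea in Python; it is not the thesis's algorithm. The names (SuffixTreeMemory, Node, insert, _maybe_split), the (action, observation) step keys, the mean-reward gap test, and the min_items / split_gap / max_depth thresholds are all assumptions made for this illustration, and the gap test stands in for USM's comparison of expected future discounted reward distributions, which is exactly the fringe-based machinery the thesis replaces with a different leaf-splitting criterion.

    # Illustrative sketch only: all names and thresholds are assumptions,
    # not the thesis's algorithm. Python 3 standard library only.
    from collections import defaultdict
    from dataclasses import dataclass, field
    from statistics import mean

    @dataclass
    class Node:
        # The path from the root encodes a suffix of recent (action, observation)
        # steps, newest first; deeper nodes remember more history.
        depth: int = 0
        children: dict = field(default_factory=dict)
        # Stored items are (history, reward) pairs; history is a tuple of
        # (action, observation) steps ordered newest-to-oldest.
        items: list = field(default_factory=list)

    class SuffixTreeMemory:
        """Group experiences under history suffixes so that perceptually aliased
        observations can end up in different leaves; deepen a leaf only when the
        extra history visibly changes the rewards stored there."""

        def __init__(self, min_items=20, split_gap=0.5, max_depth=4):
            self.root = Node()
            self.min_items = min_items    # do not split sparsely populated leaves
            self.split_gap = split_gap    # required spread in mean reward to split
            self.max_depth = max_depth    # hard cap on remembered history length

        def _leaf(self, history):
            # Follow the suffix of `history` as far as the tree currently goes.
            node = self.root
            for step in history:
                if step not in node.children:
                    break
                node = node.children[step]
            return node

        def insert(self, history, reward):
            """Store one experience at the leaf matching its history suffix."""
            # Always distinguish the most recent step, so aliased percepts with
            # different earlier history share a leaf that can later be split.
            latest = history[0]
            if latest not in self.root.children:
                self.root.children[latest] = Node(depth=1)
            leaf = self._leaf(history)
            leaf.items.append((tuple(history), reward))
            self._maybe_split(leaf)

        def _maybe_split(self, leaf):
            if len(leaf.items) < self.min_items or leaf.depth >= self.max_depth:
                return
            # Partition the leaf's experiences by the history step one deeper
            # than the suffix this leaf already encodes.
            groups, too_short = defaultdict(list), []
            for history, reward in leaf.items:
                if len(history) > leaf.depth:
                    groups[history[leaf.depth]].append((history, reward))
                else:
                    too_short.append((history, reward))
            if len(groups) < 2:
                return
            means = [mean(r for _, r in g) for g in groups.values()]
            # Stand-in utility test: split only if looking one step further back
            # changes the expected reward enough to matter.
            if max(means) - min(means) < self.split_gap:
                return
            for step, items in groups.items():
                child = Node(depth=leaf.depth + 1)
                child.items = items
                leaf.children[step] = child
            leaf.items = too_short   # experiences with too little history stay put

    # Hypothetical usage: two experiences whose latest observation is identical
    # (an aliased "corridor" percept) but whose preceding step differs become
    # separable once their shared leaf splits on the older step.
    memory = SuffixTreeMemory(min_items=2, split_gap=0.1)
    memory.insert([("forward", "corridor"), ("turn", "door_on_left")], reward=1.0)
    memory.insert([("forward", "corridor"), ("turn", "door_on_right")], reward=0.0)

Splitting on demand like this, rather than pre-growing a fringe of speculative subtrees, is what keeps the node count down, which is the size advantage both abstracts claim for the modified USM.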

1 Introduction (p. 1)
  1.1 Problem Statement (p. 1)
  1.2 Related works (p. 3)
    1.2.1 Memory-less Methods (p. 3)
    1.2.2 Memory-based Methods (p. 5)
2 Background (p. 7)
  2.1 Markov Decision Process (p. 7)
  2.2 Reinforcement Learning (p. 8)
  2.3 Perceptual Aliasing (p. 9)
3 Methodology (p. 11)
  3.1 Utile Suffix Memory (p. 11)
  3.2 Greedy Utile Suffix Memory (p. 14)
  3.3 Modified USM using pruning methods (p. 16)
    3.3.1 Shrink the PST (p. 19)
4 Results (p. 21)
5 Conclusion (p. 29)
Reference (p. 32)

[1] Sachiyo Arai and Katia Sycara. Credit assignment method for learning effective stochastic policies in uncertain domains.
[2] Leonard Breslow. Greedy utile suffix memory for reinforcement learning with perceptually-aliased states. Technical report, Naval Research Laboratory, Navy Center for Applied Research in Artificial Intelligence, 1996.
[3] Lonnie Chrisman. Reinforcement learning with perceptual aliasing: The perceptual distinctions approach. In Proceedings of the Tenth National Conference on Artificial Intelligence, pages 183–188. AAAI Press, 1992.
[4] Le Tien Dung, Takashi Komeda, and Motoki Takagi. Mixed reinforcement learning for partially observable Markov decision process. In Proceedings of the 2007 IEEE International Symposium on Computational Intelligence in Robotics and Automation, 2007.
[5] Long-Ji Lin and Tom M. Mitchell. Reinforcement learning with hidden states. In Proceedings of the Second International Conference on From Animals to Animats 2: Simulation of Adaptive Behavior, pages 271–280, Cambridge, MA, USA, 1993. MIT Press.
[6] Andrew Kachites McCallum. Reinforcement learning with selective perception and hidden state. PhD thesis, University of Rochester, 1996. Supervisor: Dana Ballard.
[7] R. Andrew McCallum. Instance-based utile distinctions for reinforcement learning with hidden state. In Proceedings of the Twelfth International Conference on Machine Learning, pages 387–395. Morgan Kaufmann, 1995.
[8] M. Ohta, Y. Kumada, and I. Noda. Using suitable action selection rule in reinforcement learning. In Proceedings of the 2003 IEEE International Conference on Systems, Man and Cybernetics, volume 5, pages 4358–4363, Oct. 2003.
[9] Dana Ron, Yoram Singer, and Naftali Tishby. Learning probabilistic automata with variable memory length. In Proceedings of the Seventh Annual ACM Conference on Computational Learning Theory, pages 35–46. ACM Press, 1994.
[10] Anton Maximilian Schäfer and Steffen Udluft. Solving partially observable reinforcement learning problems with recurrent neural networks. In Workshop Proceedings of the European Conference on Machine Learning, 2005.

Full text availability: not authorized for public release (campus network).
Full text availability: not authorized for public release (off-campus network).
