| 研究生 (Student) | 邱中鎮 Chung-Cheng Chiu |
|---|---|
| 論文名稱 (Thesis Title) | 階層式解問題中的問題切割和狀態抽象化 Problem Decomposition and State Abstraction in Hierarchical Problem Solving |
| 指導教授 (Advisor) | 蘇豐文 Von-Wun Soo |
| 口試委員 (Committee Members) | |
| 學位類別 (Degree) | 碩士 Master |
| 系所名稱 (Department) | 電機資訊學院 - 資訊工程學系 Computer Science |
| 論文出版年 (Year of Publication) | 2007 |
| 畢業學年度 (Academic Year of Graduation, ROC) | 95 |
| 語文別 (Language) | 英文 English |
| 論文頁數 (Number of Pages) | 82 |
| 中文關鍵詞 (Keywords, Chinese) | 馬可夫決策過程、增強式學習、階層式增強式學習、圖光譜分析 |
| 外文關鍵詞 (Keywords, English) | Markov Decision Processes, Reinforcement Learning, Hierarchical Reinforcement Learning, Spectral Graph Theory |
這個研究的目的是要針對解問題的方法上,提出演算法將原有的問題做切割並做狀態抽象化以降低問題的複雜度。在維度降低的空間中,解問題的方法能夠花費更少的計算量來求出問題的解法。將問題做切割並降低維度,建出階層式架構以增進解問題的效率是一種常被採用的方法,而過去主要的研究多仰賴設計者手動執行這些分析。一些研究提出自動辨認的方法,但缺乏針對切割的問題做狀態抽象化的程序,原有複雜度並未被降低。這篇研究基於針對拉普拉斯圖的光譜分析提出新的問題切割演算法,並在切割過的子問題上分析參數相關度來施行狀態抽象化。每個子問題保留與解決該子目標相關的參數,再透過特徵比對方式將相同子問題合併。原有的完整問題被轉換成多個子問題的組合,在這維度降低的空間中求解將更有效率。其對解問題方法的幫助將在後面的實驗展示。
The purpose of this work is to propose an algorithm that decomposes a problem and performs state abstraction in order to reduce the complexity of problem solving. In the dimension-reduced state space, the computational cost of solving the problem is lower. Decomposing a problem and reducing its dimensionality to build a hierarchical structure is a common way to make problem solving more efficient, but previous work has largely relied on the designer to carry out this analysis by hand. Some approaches identify subproblems automatically, yet without a state-abstraction step on the decomposed subproblems the original complexity is not reduced. We propose an algorithm that decomposes the problem using spectral analysis of the graph Laplacian and performs parameter-relevance analysis on the resulting subproblems to provide state abstraction. Each subproblem retains only the parameters relevant to its subgoal in a projected state space, and identical subproblems are merged by comparing their features. The original problem is thus transformed into a combination of projected subproblems, and solving it in this reduced space is more efficient. The experiments demonstrate the resulting improvement in problem solving.
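As a rough illustration of the decomposition idea described above, the sketch below bipartitions a small state-transition graph by the sign of the Fiedler vector (the second eigenvector of the normalized graph Laplacian). It is a minimal spectral-cut sketch, not the algorithm from the thesis; the function name `fiedler_partition` and the toy two-clique graph are assumptions made for illustration.

```python
import numpy as np

def fiedler_partition(adjacency):
    """Bipartition a state-transition graph by the sign of the Fiedler
    vector of its normalized Laplacian (a minimal spectral-cut sketch,
    not the thesis's full decomposition algorithm)."""
    A = np.asarray(adjacency, dtype=float)
    deg = A.sum(axis=1)
    # Normalized Laplacian: L = I - D^{-1/2} A D^{-1/2}
    d_inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(deg), 0.0)
    L = np.eye(A.shape[0]) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order; column 1 is the Fiedler vector.
    _, vecs = np.linalg.eigh(L)
    fiedler = vecs[:, 1]
    return np.where(fiedler >= 0)[0], np.where(fiedler < 0)[0]

# Toy example (hypothetical): two 3-state cliques joined by one bottleneck edge.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
print(fiedler_partition(A))  # the two cliques fall on opposite sides of the cut
```

Cutting at the sign change of the Fiedler vector tends to separate densely connected regions at their bottleneck states, which is the intuition behind using the spectrum of the graph Laplacian to expose subproblem boundaries.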