| Field | Value |
|---|---|
| Graduate Student | 水谷英二 Eiji Mizutani |
| Thesis Title | Artificial Neural Networks Nonlinear Least Squares Learning (類神經網路非線性最小平方學習法) |
| Advisor | 張智星 |
| Oral Examination Committee | |
| Degree | Doctoral |
| Department | College of Electrical Engineering and Computer Science, Department of Computer Science |
| Year of Publication | 2006 |
| Academic Year of Graduation | 94 (ROC calendar) |
| Language | English |
| Number of Pages | 389 |
| Chinese Keywords | 類神經網路, 非線性最小平方學習法 |
| English Keywords | Neural Networks, Nonlinear Least Squares Learning |
Abstract (Chinese):
Machine learning is a complex function of many elements. Our ultimate goal is to realize an intelligent agent that can learn to do the right things through interaction with its environment. For this dissertation, we have narrowed the vast topic of agent learning down to a suitable subject, namely neural-network learning. Although its basic spirit is biologically inspired, the methods adopted are those of modern computers and of various information-science techniques.
The most important part of this dissertation consists of algorithmic and computational results applicable to neural-network learning, especially supervised learning, in which the neural-network model is optimized to produce designated output responses to given inputs. A straightforward formulation typically leads to what we call ``neural networks nonlinear least squares problems.'' We attack the posed problems with modern numerical linear algebra methods, especially those with a clear bearing on neural networks, characterized by data sparsity, symmetric stagewise architecture, and parameter separability. These key features are often neglected in the neural-network literature.
We explain why sparsity matters in multiple-response problems, which typically involve two characteristic sparse matrices; exploiting this data sparsity yields efficient learning algorithms applicable to machine learning and optimization problems.
We next address the symmetric architecture embedded in a multilayer perceptron. A general multilayer feed-forward neural network forms a (symmetric) cone in the parameter space, and the theory of discrete-stage optimal control leads to advanced learning strategies. Applied to a classical two-class classification problem to study sensitivity to initial values, the new learning strategy yields a noteworthy result.

Abstract (English):
Machine learning is a complicated function of many elements.
Our ultimate goal is to realize an intelligent agent that
can learn to do the right things from reinforcement through
interactions with the environment. As a dissertation theme,
we have narrowed down a vast scope of ``agent learning'' to
a small yet indispensable ``brain modeling'' subsidiary,
called ``artificial'' neural-network (NN) learning.
It is quite artificial because our approaches (described
in this dissertation) are tied to efficient implementations
on modern computers designed from engineering and computer
science perspectives, even though the underlying concept is
biologically inspired.
Our primary interest in this dissertation resides in
the development of algorithmic and computational results
applicable to NN-learning, especially to supervised
learning, where our NN model is optimized to learn
the designated outputs in response to certain input stimuli.
A straightforward formulation often gives rise to
what we call ``neural networks nonlinear least squares
problems.'' We attack the posed problems in conjunction with
modern numerical linear algebra techniques specially geared
to conspicuous characteristics arising in NN-learning;
specifically, we identify and exploit data sparsity,
symmetric stagewise architecture, and parameter separability.
These key features are often neglected in the NN literature.
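For concreteness, a generic statement of such a problem, written here in standard nonlinear least squares notation rather than the dissertation's own symbols, is a sum-of-squares minimization over the network parameters:

```latex
% Generic NN nonlinear least squares objective (illustrative notation):
% w -- all network parameters, (x_p, t_p) -- input/target training pairs,
% f -- the network mapping,    r(w) -- the stacked residual vector.
\min_{\mathbf{w}} \; E(\mathbf{w})
   \;=\; \tfrac{1}{2} \sum_{p=1}^{P}
         \bigl\| \mathbf{t}_p - \mathbf{f}(\mathbf{x}_p; \mathbf{w}) \bigr\|^2
   \;=\; \tfrac{1}{2}\, \mathbf{r}(\mathbf{w})^{\mathsf{T}} \mathbf{r}(\mathbf{w}).
```

The Jacobian of r with respect to w and the associated (Gauss-Newton or full) Hessian are the objects whose sparsity and stagewise structure are exploited below.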
We begin by explaining how sparsity issues arise in
multiple-response problems, which commonly entail two
typical sparse matrix formats: a block-arrow Hessian matrix,
and a block-angular Jacobian matrix of the residual vector.
Exploiting the data sparsity leads to very efficient learning
algorithms suitable for a wide variety of machine learning
and optimization problems: In small- or medium-scale
optimization problems, the sparsity exploitation
enables an efficient matrix factorization of the Hessian
matrix, while in large-scale problems it supports a sparse
matrix-vector multiply for extracting Hessian information
in the Krylov subspace. The latter comes into play
as a new learning mode, ``iterative batch learning,''
implementable in either full-batch or
mini-batch (i.e., block) mode.
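The matrix-free flavor of this learning mode can be conveyed in a few lines. The sketch below (a minimal illustration under assumed function and variable names, not the dissertation's implementation) shows that the Gauss-Newton Hessian J^T J never needs to be formed explicitly: only products (J^T J)v are computed, block by block, which is exactly what a Krylov-subspace method consumes.

```python
# Minimal sketch: matrix-free Gauss-Newton Hessian-vector products,
# accumulated block by block (one block per mini-batch of examples).
import numpy as np

def gauss_newton_hessian_vector_product(jacobian_blocks, v):
    """Return (J^T J) v, with the residual Jacobian J given as a list of row blocks."""
    hv = np.zeros_like(v)
    for J_b in jacobian_blocks:      # each block has shape (batch_size, n_params)
        hv += J_b.T @ (J_b @ v)      # two skinny multiplies; no n x n matrix is formed
    return hv

# Toy usage: random blocks stand in for per-batch residual Jacobians.
rng = np.random.default_rng(0)
blocks = [rng.standard_normal((8, 5)) for _ in range(4)]   # 4 mini-batches, 5 parameters
v = rng.standard_normal(5)
print(gauss_newton_hessian_vector_product(blocks, v))
```

Because each block's contribution is accumulated as soon as it is computed, the scheme stays memory-efficient in either full-batch or mini-batch (block) mode.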
We next direct our special attention to a symmetric
``stagewise'' structure embedded in a so-called multi-layer
perceptron (MLP), a popular feed-forward NN model with multiple
layers (or stages); geometrically, an MLP forms a (symmetric)
cone in the parameter space. The theory of discrete-stage
optimal control dictates advanced learning strategies such as
the introduction of ``stage costs'' in addition to the
terminal cost, leading to what we call ``hidden-node teaching.''
A remarkable result obtained by this new learning scheme
is that it can develop insensitivity to initial parameters
in a classical two-class classification parity benchmark problem.
More significantly, the theory serves to exploit the
nice multi-stage symmetric structure for evaluating the Hessian
matrix just as the well-known (first-order) backpropagation
computes the gradient vector in a stagewise fashion.
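The following toy sketch conveys the stagewise flavor of such a backward pass for a one-hidden-layer MLP, with an optional quadratic ``stage cost'' attached to the hidden nodes; the layer sizes, tanh activation, and the specific penalty are illustrative assumptions rather than the dissertation's exact formulation of hidden-node teaching.

```python
# Hypothetical sketch: stagewise (layer-by-layer) backpropagation with an
# optional stage cost on the hidden layer, in the spirit of hidden-node teaching.
import numpy as np

def mlp_gradients(x, t, W1, W2, hidden_target=None, stage_weight=0.0):
    """One-hidden-layer MLP: forward pass, then a stagewise backward pass."""
    # Forward (stage 1, then the terminal stage).
    h = np.tanh(W1 @ x)                      # hidden-stage state
    y = W2 @ h                               # terminal (output) stage, linear
    # Terminal cost 0.5*||y - t||^2, propagated backward stage by stage.
    delta2 = y - t
    grad_W2 = np.outer(delta2, h)
    delta1 = (W2.T @ delta2) * (1.0 - h**2)  # chain rule through tanh
    # Optional stage cost 0.5*stage_weight*||h - hidden_target||^2.
    if hidden_target is not None:
        delta1 += stage_weight * (h - hidden_target) * (1.0 - h**2)
    grad_W1 = np.outer(delta1, x)
    return grad_W1, grad_W2

# Toy usage with random data and weights.
rng = np.random.default_rng(1)
x, t = rng.standard_normal(3), rng.standard_normal(2)
W1, W2 = rng.standard_normal((4, 3)), rng.standard_normal((2, 4))
gW1, gW2 = mlp_gradients(x, t, W1, W2, hidden_target=np.zeros(4), stage_weight=0.1)
```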
Our newly-developed ``stagewise'' second-order backpropagation
algorithm, derived from the second-order optimal control theory,
can evaluate the full Hessian matrix faster than ``standard''
methods that obtain only the Gauss-Newton Hessian matrix
(e.g., see the Matlab NN-toolbox for such a procedure); this is
a substantial advance in the nonlinear least squares setting.
In practice, the full Hessian matrix may not be
positive (semi-)definite during the learning phase, but
the widely-employed trust-region nonlinear optimization method
handles an indefinite Hessian well, since the underlying theory
has exploited ``negative curvature'' directions
over the last two decades.
The trust-region approach based on
the full Hessian matrix is of immense value in solving
real-world ``large-residual'' nonlinear least squares problems
because the matrix of second derivatives is important to
efficiency. In consequence, our stagewise second-order
backpropagation approach would prove practically useful
for general nonlinear optimization in a broader
sense, as long as the posed problem possesses a stagewise
structure.
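For illustration, here is a minimal sketch of the Steihaug-Toint truncated conjugate-gradient inner solver, one standard way a trust-region method copes with an indefinite Hessian by following a detected negative-curvature direction to the trust-region boundary; this is a textbook variant, not necessarily the exact procedure used in the dissertation, and it needs only Hessian-vector products such as those sketched above.

```python
# Minimal sketch: Steihaug-Toint truncated CG for the trust-region subproblem
#   minimize  g^T p + 0.5 p^T H p   subject to  ||p|| <= radius,
# using only Hessian-vector products hess_vec(v) = H v.
import numpy as np

def steihaug_cg(hess_vec, grad, radius, tol=1e-8, max_iter=50):
    p = np.zeros_like(grad)
    r, d = grad.copy(), -grad.copy()
    if np.linalg.norm(r) < tol:
        return p
    for _ in range(max_iter):
        Hd = hess_vec(d)
        dHd = d @ Hd
        if dHd <= 0.0:                          # negative (or zero) curvature detected:
            return _to_boundary(p, d, radius)   # exploit it, step to the boundary
        alpha = (r @ r) / dHd
        p_next = p + alpha * d
        if np.linalg.norm(p_next) >= radius:    # step would leave the trust region
            return _to_boundary(p, d, radius)
        r_next = r + alpha * Hd
        if np.linalg.norm(r_next) < tol:
            return p_next
        beta = (r_next @ r_next) / (r @ r)
        p, r, d = p_next, r_next, -r_next + beta * d
    return p

def _to_boundary(p, d, radius):
    """Return p + tau*d with ||p + tau*d|| = radius and tau >= 0."""
    a, b, c = d @ d, 2.0 * (p @ d), p @ p - radius**2
    tau = (-b + np.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
    return p + tau * d
```

Its key property here is that only the routine hess_vec is required, so an indefinite full Hessian can be used directly without any modification or factorization.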
Furthermore, a model of mixed linear and nonlinear parameters
may become of great concern in various contexts of machine learning.
In numerical linear algebra, the variable projection (VP) algorithm
has been the standard approach to ``separable'' nonlinear
(i.e., mixed linear & nonlinear) least squares problems since
the early 1970s. For second-order algorithms, we wish
to use as much Hessian information as possible while exploiting
certain structural properties associated with a given NN model.
In this spirit, pursuing further exploitation of parameter
separability, we have devised an extension of VP
algorithms that employs the full Hessian matrix.
The consequent method aims at solving large-residual machine
learning problems when both linear and nonlinear parameters
co-exist in a given learning model. Although this approach
still needs further investigation, it would probably help
in optimizing other machine learning models such as
generalized linear discriminant functions.
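The core of the VP idea can be conveyed in a few lines: for every trial value of the nonlinear parameters, the linear parameters are eliminated by an ordinary linear least-squares solve, and only the resulting projected residual is handed to the nonlinear optimizer. The exponential model and the SciPy solver below are illustrative assumptions, not the dissertation's method.

```python
# Minimal sketch of variable projection for a separable least squares problem
# y ~ c1*exp(-a1*t) + c2*exp(-a2*t): the linear c's are eliminated, and only
# the nonlinear a's are optimized.
import numpy as np
from scipy.optimize import least_squares

t = np.linspace(0.0, 4.0, 50)
y = 2.0 * np.exp(-1.3 * t) + 0.5 * np.exp(-0.2 * t)        # synthetic data

def reduced_residual(alpha):
    """Residual of y ~ Phi(alpha) @ c after eliminating the linear part c."""
    Phi = np.column_stack([np.exp(-alpha[0] * t), np.exp(-alpha[1] * t)])
    c, *_ = np.linalg.lstsq(Phi, y, rcond=None)             # optimal linear parameters
    return Phi @ c - y                                       # projected residual

fit = least_squares(reduced_residual, x0=[1.0, 0.1])         # optimize nonlinear part only
print("nonlinear parameters:", fit.x)
```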
Special structure should always be exploited when it arises.
Multi-stage NN-learning is an excellent challenge because
it exhibits a great deal of structure; its principal ingredients
prove to be sparse, symmetric, stagewise, and separable.
Guided by this principle of structure exploitation,
we emphasize the rigorous mathematical theory of optimal control
as well as the practical use of modern numerical linear algebra
and nonlinear numerical optimization for algorithmic design purposes.
Our proposed learning methods could apply broadly
to learning machines in yet unexplored domains and therefore have
enormous potential for diverse future extensions.