Using Reinforcement Learning in Chess Engines
Related Work
Temporal difference learning was first tested in the program SAL by Michael Gherrity [5].
The structure of SAL allows move generation to be implemented for different
games and the best next move to be determined using a search-tree-based algorithm.
SAL learns good and bad moves from the played games. The evaluation
of individual moves is performed using an artificial neuronal network. TD was
used for the optimization of the networks parameters by comparing the evaluation
values for the root nodes of the search tree. In a test against the prominent
chess program GNUChess [18], where SAL was using 1031 position evaluation
factors, 8 remis could be achieved in 4200 games (while the rest was lost).
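A minimal sketch of this kind of root-node TD update is given below, assuming a
linear evaluation over feature vectors in place of SAL's neural network; the
learning rate and the toy game data are illustrative assumptions, and only the
number of evaluation factors is taken from the description of SAL.

import numpy as np

ALPHA = 0.01        # learning rate (assumed value, not taken from the papers)
N_FEATURES = 1031   # number of position-evaluation factors reported for SAL

def evaluate(weights, x):
    # Linear stand-in for the neural evaluation network.
    return float(weights @ x)

def td_update(weights, root_features, result, alpha=ALPHA):
    # TD(0) pass over the root positions x_0 ... x_T of one finished game:
    # each root evaluation is pulled towards the evaluation of the next root,
    # and the last one towards the actual game result (+1, 0 or -1).
    for t in range(len(root_features) - 1):
        delta = evaluate(weights, root_features[t + 1]) - evaluate(weights, root_features[t])
        weights = weights + alpha * delta * root_features[t]
    delta = result - evaluate(weights, root_features[-1])
    weights = weights + alpha * delta * root_features[-1]
    return weights

# Toy usage: random feature vectors stand in for real root positions.
rng = np.random.default_rng(0)
w = np.zeros(N_FEATURES)
game = [rng.normal(size=N_FEATURES) for _ in range(40)]
w = td_update(w, game, result=0.0)   # e.g. the game ended in a draw

With a neural network, as in SAL and NeuroChess, the gradient of the network
output with respect to its parameters takes the place of the raw feature
vector in this update.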
The chess program NeuroChess, developed by Sebastian Thrun, also uses a
neural network for position evaluation and a TD method based on the root
nodes to modify the coefficients. In contrast to SAL, NeuroChess learns only
from its own play. Games from a grandmaster database were mostly used as
entry points of the learning process (90%), while only 10% of the training
games were played from the initial position. Later experiments with other
programs showed that a learning strategy based purely on playing against
oneself does not yield satisfying results. In an experiment against GNUChess,
in which both programs searched to a depth of 2 and used the same evaluation
function, NeuroChess with its learned coefficients won 316 out of 2400 games.
Thrun, the main developer of NeuroChess, acknowledged two fundamental
problems of his approach: the long training time and the incompleteness of
the evaluation coefficients. He concludes that it is unclear whether TD-based
approaches will ever find their way into chess programming (see
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.218.9867&rep=rep1&type=pdf).
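The entry-point scheme used by NeuroChess for its training games can be
sketched as follows; the database, the initial position and the random source
are hypothetical placeholders, and only the 90%/10% split is taken from the
description above.

import random

DB_FRACTION = 0.9   # share of training games started from database positions

def pick_entry_point(database_positions, initial_position, rng=random):
    # Choose the starting position for one training game, NeuroChess-style:
    # mostly a position taken from a grandmaster-game database, occasionally
    # the normal initial position.
    if database_positions and rng.random() < DB_FRACTION:
        return rng.choice(database_positions)
    return initial_position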