Value function approximation in zero-sum Markov games

 
This item is provided by the institution:

Repository: Institutional Repository, Technical University of Crete
2002 (EN)

Value function approximation in zero-sum Markov games (EN)

Λαγουδακης Μιχαηλ (EL)
Lagoudakis Michael (EN)
Parr, R. (EN)

Πολυτεχνείο Κρήτης (EL)
Technical University of Crete (EN)

This paper investigates value function approximation in the context of zero-sum Markov games, which can be viewed as a generalization of the Markov decision process (MDP) framework to the two-agent case. We generalize error bounds from MDPs to Markov games and describe generalizations of reinforcement learning algorithms to Markov games. We present a generalization of the optimal stopping problem to a two-player simultaneous-move Markov game. For this special problem, we provide stronger bounds and can guarantee convergence for LSTD and temporal difference learning with linear value function approximation. We demonstrate the viability of value function approximation for Markov games by using the least-squares policy iteration (LSPI) algorithm to learn good policies for a soccer domain and a flow control problem. (EN)
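For illustration only, and not drawn from the paper itself: the sketch below shows the minimax (matrix-game) Bellman backup that underlies the generalization from MDPs to zero-sum Markov games, computed exactly in tabular form, with each stage game solved by a small linear program. The function names, the use of scipy.optimize.linprog, and the toy array shapes are assumptions of this sketch; the paper's algorithms (LSTD, TD, LSPI) replace the exact value table with a linear approximation.

```python
# Minimal sketch (assumed names and shapes): tabular minimax value iteration
# for a zero-sum Markov game.  The paper's contribution concerns the linearly
# approximated case; this only illustrates the underlying backup operator.

import numpy as np
from scipy.optimize import linprog

def matrix_game_value(G):
    """Value and maximizer's mixed strategy of the matrix game G
    (rows: maximizer actions, cols: minimizer actions), via an LP."""
    m, n = G.shape
    # Variables: x_1..x_m (mixed strategy) and v (game value); minimize -v.
    c = np.concatenate([np.zeros(m), [-1.0]])
    # For every opponent column j:  v - sum_i x_i G[i, j] <= 0
    A_ub = np.hstack([-G.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    # Mixed-strategy probabilities sum to 1.
    A_eq = np.concatenate([np.ones(m), [0.0]]).reshape(1, -1)
    b_eq = np.array([1.0])
    bounds = [(0, 1)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[m], res.x[:m]

def minimax_value_iteration(P, R, gamma=0.9, iters=200):
    """P[s, a, o, s'] : transition probabilities, R[s, a, o] : maximizer's reward.
    Returns the state values of the zero-sum Markov game."""
    S, A, O, _ = P.shape
    V = np.zeros(S)
    for _ in range(iters):
        for s in range(S):
            Q = R[s] + gamma * P[s] @ V   # Q[a, o]: payoff matrix of the stage game
            V[s], _ = matrix_game_value(Q)
    return V
```

In the approximate setting studied in the paper, V would be represented as a weighted sum of basis functions rather than a table, with the weights fit by LSTD/LSPI from sampled transitions.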

Full paper
Conference item

Artificial Intelligence (EN)


18th Conference on Uncertainty in Artificial Intelligence (EN)

English

2002




