Rollout sampling approximate policy iteration

This item is provided by the institution:

Technical University of Crete

Repository:
Institutional Repository Technical University of Crete

Semantic enrichment by EKT

EKT item type

Journal part (EN)

Scientific article (EN)

EKT year

2008 (EN)

EKT historical period

Title

Rollout sampling approximate policy iteration (EN)

Creator

Λαγουδακης Μιχαηλ (EL)

Lagoudakis Michael (EN)

Dimitrakakis Christos (EN)

Contributor

Πολυτεχνείο Κρήτης (EL)

Technical University of Crete (EN)

Description

Several researchers have recently investigated the connection between reinforcement learning and classification. We are motivated by proposals of approximate policy iteration schemes without value functions, which focus on policy representation using classifiers and address policy learning as a supervised learning problem. This paper proposes variants of an improved policy iteration scheme that treats the core sampling problem in evaluating a policy through simulation as a multi-armed bandit machine. The resulting algorithm offers performance comparable to that of the previous algorithm, but with significantly less computational effort. An order-of-magnitude improvement is demonstrated experimentally in two standard reinforcement learning domains: inverted pendulum and mountain-car. (EN)

Type

journalArticle

Subject

Approximate policy iteration (EN)

Reinforcement learning (EN)

Rollouts (EN)

Classification (EN)

Sample complexity (EN)

Bandit problems (EN)

Provider

Technical University of Crete

Repository / collection

Institutional Repository Technical University of Crete

Subcollections

School of Production Engineering and Management - Journal Publications

Journal

Machine Learning (EN)

Language

English

Issued

2008

Identifier

http://purl.tuc.gr/dl/dias/157117EC-5401-47A1-B453-9D39AAFFC2E2

10.1007/s10994-008-5069-3

Publisher

Springer Verlag (EN)
