Learning to Plan from Raw Data in Grid-based Games

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review


Abstract

An agent that autonomously learns to act in its environment must acquire a model of the domain dynamics. This can be a challenging task, especially in real-world domains, where observations are high-dimensional and noisy. Although in automated planning the dynamics are typically given, there are action schema learning approaches that learn symbolic rules (e.g. STRIPS or PDDL) to be used by traditional planners. However, these algorithms rely on logical descriptions of environment observations. In contrast, recent methods in deep reinforcement learning for games learn from pixel observations. However, they typically do not acquire an environment model, but a policy for one-step action selection. Even when a model is learned, it cannot generalize to unseen instances of the training domain. Here we propose a neural network-based method that learns from visual observations an approximate, compact, implicit representation of the domain dynamics, which can be used for planning with standard search algorithms, and generalizes to novel domain instances. The learned model is composed of submodules, each implicitly representing an action schema in the traditional sense. We evaluate our approach on visual versions of the standard domain Sokoban, and show that, by training on one single instance, it learns a transition model that can be successfully used to solve new levels of the game.
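The abstract's central idea — plugging a learned, black-box transition model into a standard search algorithm — can be illustrated with a generic sketch. This is not the authors' implementation: the `model` below is a hand-written stand-in for a learned network, and the planner is plain breadth-first search; the point is only that any function mapping (state, action) to a next state can drive off-the-shelf forward search.

```python
from collections import deque

def plan(start, is_goal, actions, model):
    """Breadth-first search driven by a (possibly learned) transition
    model `model(state, action) -> next_state or None`.
    Returns a list of actions reaching a goal state, or None."""
    frontier = deque([(start, [])])
    visited = {start}
    while frontier:
        state, path = frontier.popleft()
        if is_goal(state):
            return path
        for a in actions:
            nxt = model(state, a)
            if nxt is not None and nxt not in visited:
                visited.add(nxt)
                frontier.append((nxt, path + [a]))
    return None

# Toy stand-in for a learned model: an agent at (x, y) on a 3x3 grid.
# Moves that would leave the grid are reported as inapplicable (None).
MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}

def toy_model(state, action):
    dx, dy = MOVES[action]
    x, y = state[0] + dx, state[1] + dy
    return (x, y) if 0 <= x < 3 and 0 <= y < 3 else None

plan_found = plan((0, 0), lambda s: s == (2, 2), list(MOVES), toy_model)
print(plan_found)  # a shortest 4-step plan from (0, 0) to (2, 2)
```

Because the planner only ever calls `model`, the same search code works unchanged whether the transitions come from a hand-coded simulator, symbolic action schemas, or a neural network trained on pixel observations — which is the separation of concerns the abstract describes.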
Original language: English
Title of host publication: Proceedings of 4th Global Conference on Artificial Intelligence
Publication date: 2019
Pages: 54–67
DOI: 10.29007/s8jk
Publication status: Published - 2019
Event: 4th Global Conference on Artificial Intelligence - Luxembourg, Luxembourg
Duration: 18 Sep 2018 – 21 Sep 2018

Conference

Conference: 4th Global Conference on Artificial Intelligence
Country: Luxembourg
City: Luxembourg
Period: 18/09/2018 – 21/09/2018
Series: EPiC Series in Computing
Volume: 55

Cite this

Dittadi, A., Bolander, T., & Winther, O. (2019). Learning to Plan from Raw Data in Grid-based Games. In Proceedings of 4th Global Conference on Artificial Intelligence (pp. 54–67). EPiC Series in Computing, Vol. 55. https://doi.org/10.29007/s8jk
Dittadi, Andrea ; Bolander, Thomas ; Winther, Ole. / Learning to Plan from Raw Data in Grid-based Games. Proceedings of 4th Global Conference on Artificial Intelligence. 2019. pp. 54–67 (EPiC Series in Computing, Vol. 55).
@inproceedings{7c88d72416824aeba6e8419aa5816f7f,
title = "Learning to Plan from Raw Data in Grid-based Games",
abstract = "An agent that autonomously learns to act in its environment must acquire a model of the domain dynamics. This can be a challenging task, especially in real-world domains, where observations are high-dimensional and noisy. Although in automated planning the dynamics are typically given, there are action schema learning approaches that learn symbolic rules (e.g. STRIPS or PDDL) to be used by traditional planners. However, these algorithms rely on logical descriptions of environment observations. In contrast, recent methods in deep reinforcement learning for games learn from pixel observations. However, they typically do not acquire an environment model, but a policy for one-step action selection. Even when a model is learned, it cannot generalize to unseen instances of the training domain. Here we propose a neural network-based method that learns from visual observations an approximate, compact, implicit representation of the domain dynamics, which can be used for planning with standard search algorithms, and generalizes to novel domain instances. The learned model is composed of submodules, each implicitly representing an action schema in the traditional sense. We evaluate our approach on visual versions of the standard domain Sokoban, and show that, by training on one single instance, it learns a transition model that can be successfully used to solve new levels of the game.",
author = "Andrea Dittadi and Thomas Bolander and Ole Winther",
year = "2019",
doi = "10.29007/s8jk",
language = "English",
pages = "54--67",
booktitle = "Proceedings of 4th Global Conference on Artificial Intelligence",
series = "EPiC Series in Computing",
volume = "55",
}

Dittadi, A, Bolander, T & Winther, O 2019, Learning to Plan from Raw Data in Grid-based Games. in Proceedings of 4th Global Conference on Artificial Intelligence. EPiC Series in Computing, vol. 55, pp. 54–67, 4th Global Conference on Artificial Intelligence, Luxembourg, Luxembourg, 18/09/2018. https://doi.org/10.29007/s8jk

Learning to Plan from Raw Data in Grid-based Games. / Dittadi, Andrea; Bolander, Thomas; Winther, Ole.

Proceedings of 4th Global Conference on Artificial Intelligence. 2019. p. 54–67 (EPiC Series in Computing, Vol. 55).

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

TY - GEN

T1 - Learning to Plan from Raw Data in Grid-based Games

AU - Dittadi, Andrea

AU - Bolander, Thomas

AU - Winther, Ole

PY - 2019

Y1 - 2019

N2 - An agent that autonomously learns to act in its environment must acquire a model of the domain dynamics. This can be a challenging task, especially in real-world domains, where observations are high-dimensional and noisy. Although in automated planning the dynamics are typically given, there are action schema learning approaches that learn symbolic rules (e.g. STRIPS or PDDL) to be used by traditional planners. However, these algorithms rely on logical descriptions of environment observations. In contrast, recent methods in deep reinforcement learning for games learn from pixel observations. However, they typically do not acquire an environment model, but a policy for one-step action selection. Even when a model is learned, it cannot generalize to unseen instances of the training domain. Here we propose a neural network-based method that learns from visual observations an approximate, compact, implicit representation of the domain dynamics, which can be used for planning with standard search algorithms, and generalizes to novel domain instances. The learned model is composed of submodules, each implicitly representing an action schema in the traditional sense. We evaluate our approach on visual versions of the standard domain Sokoban, and show that, by training on one single instance, it learns a transition model that can be successfully used to solve new levels of the game.

AB - An agent that autonomously learns to act in its environment must acquire a model of the domain dynamics. This can be a challenging task, especially in real-world domains, where observations are high-dimensional and noisy. Although in automated planning the dynamics are typically given, there are action schema learning approaches that learn symbolic rules (e.g. STRIPS or PDDL) to be used by traditional planners. However, these algorithms rely on logical descriptions of environment observations. In contrast, recent methods in deep reinforcement learning for games learn from pixel observations. However, they typically do not acquire an environment model, but a policy for one-step action selection. Even when a model is learned, it cannot generalize to unseen instances of the training domain. Here we propose a neural network-based method that learns from visual observations an approximate, compact, implicit representation of the domain dynamics, which can be used for planning with standard search algorithms, and generalizes to novel domain instances. The learned model is composed of submodules, each implicitly representing an action schema in the traditional sense. We evaluate our approach on visual versions of the standard domain Sokoban, and show that, by training on one single instance, it learns a transition model that can be successfully used to solve new levels of the game.

U2 - 10.29007/s8jk

DO - 10.29007/s8jk

M3 - Article in proceedings

SP - 54

EP - 67

BT - Proceedings of 4th Global Conference on Artificial Intelligence

ER -

Dittadi A, Bolander T, Winther O. Learning to Plan from Raw Data in Grid-based Games. In Proceedings of 4th Global Conference on Artificial Intelligence. 2019. p. 54–67. (EPiC Series in Computing, Vol. 55). https://doi.org/10.29007/s8jk