Abstract
Reinforcement learning policies are often represented by neural networks, but programmatic policies are preferred in some cases because they are more interpretable, more amenable to formal verification, or generalize better. While efficient algorithms for learning neural policies exist, learning programmatic policies is challenging. Combining imitation-projection and dataset aggregation with a local search heuristic, we present a simple and direct approach to extracting a programmatic policy from a pretrained neural policy. After examining our local search heuristic on a programming-by-example problem, we demonstrate our programmatic policy extraction method on a pendulum swing-up problem. When trained using either a hand-crafted expert policy or a learned neural policy, our method discovers simple and interpretable policies that perform almost as well as the original.
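
As a rough illustration of the loop the abstract describes (roll out the current program, label the visited states with the expert, aggregate the data, then refit by local search), the following is a minimal Python sketch. Everything in it is an assumption made for illustration and not the paper's implementation: the `expert` function stands in for a pretrained neural policy, the point-mass `step` dynamics replace the pendulum, and the two-coefficient template `clip(a*x + b*v)` replaces the paper's typed program space. All function names are hypothetical.

```python
import random

def expert(state):
    # Hypothetical stand-in for a pretrained neural policy.
    x, v = state
    return max(-1.0, min(1.0, -1.5 * x - 0.5 * v))

def step(state, action, dt=0.1):
    # Toy point-mass dynamics (assumed; not the pendulum swing-up task).
    x, v = state
    v = v + dt * action
    return (x + dt * v, v)

def run_program(program, state):
    # A "program" here is just (a, b) in the fixed template clip(a*x + b*v).
    a, b = program
    x, v = state
    return max(-1.0, min(1.0, a * x + b * v))

def imitation_loss(program, dataset):
    # Mean squared difference between program actions and expert labels.
    return sum((run_program(program, s) - a) ** 2 for s, a in dataset) / len(dataset)

def neighbors(program, delta=0.25):
    # Neighborhood: change one coefficient by +/- delta.
    a, b = program
    return [(a + delta, b), (a - delta, b), (a, b + delta), (a, b - delta)]

def local_search(program, dataset, max_steps=100):
    # Greedy hill climbing: move to the best neighbor while the loss strictly improves.
    best, best_loss = program, imitation_loss(program, dataset)
    for _ in range(max_steps):
        cand = min(neighbors(best), key=lambda p: imitation_loss(p, dataset))
        cand_loss = imitation_loss(cand, dataset)
        if cand_loss >= best_loss:
            break
        best, best_loss = cand, cand_loss
    return best

def rollout(policy, start, horizon=30):
    # Collect the states visited when following `policy` from `start`.
    states, s = [], start
    for _ in range(horizon):
        states.append(s)
        s = step(s, policy(s))
    return states

def extract(iterations=5, seed=0):
    # Dataset aggregation: states come from the current program,
    # action labels come from the expert.
    rng = random.Random(seed)
    program, dataset = (0.0, 0.0), []
    for _ in range(iterations):
        start = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        visited = rollout(lambda s: run_program(program, s), start)
        dataset += [(s, expert(s)) for s in visited]
        program = local_search(program, dataset)
    return program, imitation_loss(program, dataset)

if __name__ == "__main__":
    print(extract())
```

In this sketch the states are gathered under the program being learned while the labels come from the expert, which is the dataset-aggregation idea; the local search plays the role of the projection step onto the (here very small) space of programs.
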
Original language | English |
---|---|
Title of host publication | Inductive Logic Programming |
Publisher | Springer |
Publication date | 2022 |
Pages | 156–166 |
ISBN (Print) | 978-3-030-97453-4 |
Publication status | Published - 2022 |
Event | 30th International Conference on Inductive Logic Programming - Virtual Event, Athens, Greece. Duration: 25 Oct 2021 → 27 Oct 2021. http://lr2020.iit.demokritos.gr/ilp/ |
Conference
Conference | 30th International Conference on Inductive Logic Programming |
---|---|
Location | Virtual Event |
Country/Territory | Greece |
City | Athens |
Period | 25/10/2021 → 27/10/2021 |
Series | Lecture Notes in Computer Science |
---|---|
Volume | 13191 |
ISSN | 0302-9743 |
Keywords
- Program synthesis
- Reinforcement learning
- Hindley-Milner type system
- Neighborhood search