An efficient hardware implementation of reinforcement learning: The q-learning algorithm

Sergio Spanò*, Gian Carlo Cardarilli, Luca Di Nunzio, Rocco Fazzolari, Daniele Giardino, Marco Matta, Alberto Nannarelli, Marco Re

*Corresponding author for this work

Research output: Contribution to journal (Journal article, Research, peer-review)


Abstract

In this paper we propose an efficient hardware architecture that implements the Q-Learning algorithm and is suitable for real-time applications. Its main features are low power consumption, high throughput and limited hardware resource usage. We also propose a technique based on approximated multipliers to reduce the hardware complexity of the algorithm. We implemented the design on a Xilinx Zynq UltraScale+ MPSoC ZCU106 Evaluation Kit. The implementation results are evaluated in terms of hardware resources, throughput and power consumption. The architecture is compared with the state-of-the-art Q-Learning hardware accelerators presented in the literature, obtaining better results in speed, power and hardware resources. Experiments using different sizes for the Q-Matrix and different wordlengths for the fixed-point arithmetic are presented. With a Q-Matrix of size 8 × 4 (8-bit data) we achieved a throughput of 222 MSPS (Mega Samples Per Second) and a dynamic power consumption of 37 mW, while with a Q-Matrix of size 256 × 16 (32-bit data) we achieved a throughput of 93 MSPS and a power consumption of 611 mW. Due to the small amount of hardware resources required by the accelerator, our system is suitable for multi-agent IoT applications. Moreover, the architecture can be used to implement the SARSA (State-Action-Reward-State-Action) Reinforcement Learning algorithm with minor modifications.
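For readers unfamiliar with the algorithm, the abstract's two central ideas, the Q-Learning update on a Q-Matrix and its fixed-point evaluation, can be illustrated with a small software model. This is a minimal sketch under stated assumptions, not the authors' architecture: it uses the textbook update Q(s,a) ← Q(s,a) + α[r + γ·max_a' Q(s',a') − Q(s,a)], 16-bit signed words with 8 fractional bits (the paper itself explores several wordlengths, e.g. 8-bit and 32-bit data), saturating arithmetic, and exact multipliers; the approximated multipliers proposed in the paper are not reproduced. The helper names (`to_fixed`, `fx_mul`, `q_update`, etc.) are hypothetical. Passing `a_next` switches the bootstrap term to the SARSA form, which mirrors the abstract's remark that SARSA needs only minor changes.

```python
# Hypothetical fixed-point model of one Q-Learning (or SARSA) update step.
# Assumptions (not from the paper): 16-bit signed words, 8 fractional bits,
# saturation on overflow, truncation after every multiply, exact multipliers.
from typing import List, Optional

WORD_BITS = 16
FRAC_BITS = 8
MAX_INT = (1 << (WORD_BITS - 1)) - 1
MIN_INT = -(1 << (WORD_BITS - 1))


def saturate(v: int) -> int:
    """Clamp an integer to the representable signed range."""
    return max(MIN_INT, min(MAX_INT, v))


def to_fixed(x: float) -> int:
    """Quantize a real value to fixed point (round to nearest, then saturate)."""
    return saturate(round(x * (1 << FRAC_BITS)))


def to_float(v: int) -> float:
    """Convert a fixed-point integer back to a real value."""
    return v / (1 << FRAC_BITS)


def fx_mul(a: int, b: int) -> int:
    """Exact product rescaled to FRAC_BITS; Python's >> floors toward -inf."""
    return saturate((a * b) >> FRAC_BITS)


def q_update(q: List[List[int]], s: int, a: int, r: int, s_next: int,
             alpha: int, gamma: int, a_next: Optional[int] = None) -> None:
    """Update q[s][a] in place; r, alpha and gamma are fixed-point integers.

    Q-Learning (a_next is None) bootstraps from max_a' Q(s', a');
    SARSA passes the action actually chosen in s_next instead.
    """
    if a_next is None:
        target_q = max(q[s_next])        # Q-Learning: greedy next-state value
    else:
        target_q = q[s_next][a_next]     # SARSA: value of the action taken
    td_error = saturate(r + fx_mul(gamma, target_q) - q[s][a])
    q[s][a] = saturate(q[s][a] + fx_mul(alpha, td_error))


if __name__ == "__main__":
    # Hypothetical 8 x 4 Q-Matrix (8 states, 4 actions), initialised to zero.
    q = [[0] * 4 for _ in range(8)]
    alpha, gamma = to_fixed(0.5), to_fixed(0.875)
    # Reward 1.0 for taking action 1 in state 0 and landing in state 2.
    q_update(q, s=0, a=1, r=to_fixed(1.0), s_next=2, alpha=alpha, gamma=gamma)
    print(to_float(q[0][1]))  # 0.5
    # Zero reward, bootstrapping from the best action in state 0.
    q_update(q, s=1, a=0, r=0, s_next=0, alpha=alpha, gamma=gamma)
    print(to_float(q[1][0]))  # 0.21875
```

In this model the only difference between the two algorithms is which Q-Matrix entry feeds the γ multiplier: a row maximum for Q-Learning, a single indexed entry for SARSA. The chosen values (α = 0.5, γ = 0.875) are exactly representable with 8 fractional bits, so the printed results match floating-point arithmetic; with other values, truncation in `fx_mul` introduces small quantization errors.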

Original language: English
Article number: 8937555
Journal: IEEE Access
Volume: 7
Pages (from-to): 186340-186351
ISSN: 2169-3536
DOI: https://doi.org/10.1109/ACCESS.2019.2961174
Publication status: Published - 1 Jan 2019

Keywords

  • Artificial intelligence
  • ASIC
  • FPGA
  • Hardware accelerator
  • IoT
  • Machine learning
  • Multi-agent
  • Q-learning
  • Reinforcement learning
  • SARSA

Cite this

Spanò, S., Cardarilli, G. C., Di Nunzio, L., Fazzolari, R., Giardino, D., Matta, M., Nannarelli, A., & Re, M. (2019). An efficient hardware implementation of reinforcement learning: The q-learning algorithm. IEEE Access, 7, 186340-186351. [8937555]. https://doi.org/10.1109/ACCESS.2019.2961174