An efficient hardware implementation of reinforcement learning: The Q-learning algorithm

Sergio Spanò*, Gian Carlo Cardarilli, Luca Di Nunzio, Rocco Fazzolari, Daniele Giardino, Marco Matta, Alberto Nannarelli, Marco Re

*Corresponding author for this work

Research output: Contribution to journal › Journal article › Research › peer-review


In this paper we propose an efficient hardware architecture that implements the Q-Learning algorithm and is suitable for real-time applications. Its main features are low power consumption, high throughput and limited hardware resources. We also propose a technique based on approximated multipliers to reduce the hardware complexity of the algorithm. We implemented the design on a Xilinx Zynq UltraScale+ MPSoC ZCU106 Evaluation Kit. The implementation results are evaluated in terms of hardware resources, throughput and power consumption. The architecture is compared to the state-of-the-art Q-Learning hardware accelerators presented in the literature, obtaining better results in speed, power and hardware resources. Experiments using different sizes for the Q-Matrix and different wordlengths for the fixed-point arithmetic are presented. With a Q-Matrix of size 8 × 4 (8-bit data) we achieved a throughput of 222 MSPS (Mega Samples Per Second) and a dynamic power consumption of 37 mW, while with a Q-Matrix of size 256 × 16 (32-bit data) we achieved a throughput of 93 MSPS and a power consumption of 611 mW. Due to the small amount of hardware resources required by the accelerator, our system is suitable for multi-agent IoT applications. Moreover, the architecture can be used to implement the SARSA (State-Action-Reward-State-Action) Reinforcement Learning algorithm with minor modifications.
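
The abstract does not state the update rule itself. As a rough software illustration, the sketch below assumes the standard tabular Q-Learning update and shows how SARSA differs only in how the next-state value is chosen. The fixed-point variant is a loose analogy to the idea of approximated multipliers: it assumes a hypothetical format with 8 fractional bits and a learning rate restricted to a power of two, so that one multiplication becomes a right shift. All names (`q_update`, `q_update_fixed`, `FRAC_BITS`, `alpha_shift`) are illustrative assumptions and do not describe the authors' architecture.

```python
FRAC_BITS = 8  # assumed number of fractional bits (hypothetical format)


def to_fixed(x):
    """Convert a real value to the assumed fixed-point integer format."""
    return int(round(x * (1 << FRAC_BITS)))


def q_update(q, s, a, r, s_next, alpha, gamma, a_next=None):
    """Floating-point reference update on a Q-Matrix stored as a list of rows.

    a_next=None  -> Q-Learning: use the maximum over next-state actions.
    a_next given -> SARSA: use the value of the action actually taken next.
    """
    nxt = max(q[s_next]) if a_next is None else q[s_next][a_next]
    q[s][a] += alpha * (r + gamma * nxt - q[s][a])
    return q[s][a]


def q_update_fixed(q, s, a, r, s_next, alpha_shift, gamma_fx, a_next=None):
    """Integer-only version of the same update.

    All values are fixed-point integers. The learning rate is assumed to be
    2 ** -alpha_shift, so multiplying by it reduces to an arithmetic shift.
    """
    nxt = max(q[s_next]) if a_next is None else q[s_next][a_next]
    discounted = (gamma_fx * nxt) >> FRAC_BITS  # rescale after the product
    delta = r + discounted - q[s][a]
    q[s][a] += delta >> alpha_shift             # shift replaces a multiplier
    return q[s][a]


# Usage: an 8 x 4 Q-Matrix (8 states, 4 actions), matching the smallest
# configuration mentioned in the abstract.
q_float = [[0.0] * 4 for _ in range(8)]
q_fixed = [[0] * 4 for _ in range(8)]

v_float = q_update(q_float, s=0, a=1, r=1.0, s_next=2, alpha=0.5, gamma=0.5)
v_fixed = q_update_fixed(q_fixed, s=0, a=1, r=to_fixed(1.0), s_next=2,
                         alpha_shift=1, gamma_fx=to_fixed(0.5))
```

With these power-of-two parameters the integer result, divided by `2 ** FRAC_BITS`, matches the floating-point reference exactly. For other parameter values the shift-based version would only approximate it, which is the kind of accuracy-for-area trade-off that approximated multipliers imply.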

Original language: English
Article number: 8937555
Journal: IEEE Access
Pages (from-to): 186340-186351
Publication status: Published - 1 Jan 2019


  • Artificial intelligence
  • ASIC
  • FPGA
  • Hardware accelerator
  • IoT
  • Machine learning
  • Multi-agent
  • Q-learning
  • Reinforcement learning
