Addressing partial observability in reinforcement learning for energy management

Marco Biemann, Xiufeng Liu, Yifeng Zeng, Lizhen Huang

Research output: Chapter in Book/Report/Conference proceeding > Article in proceedings > Research > peer-review

Automatic control of energy systems is affected by uncertainty in multiple factors, including weather, prices and human activities. The literature relies on Markov-based control, which takes only the current state into account. This impairs control performance, because previous states provide additional context for decision-making. We present two ways to learn non-Markovian policies, based on recurrent neural networks and variational inference. We evaluate the methods on a simulated data centre HVAC control task. The results show that the off-policy stochastic latent actor-critic algorithm can, without prior knowledge, maintain the temperature in the predefined range within three months of training, while reducing energy consumption by more than 5% compared to Markovian policies.
Original language: English
Title of host publication: Proceedings of the 8th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation
Number of pages: 5
Publication date: 2021
Publication status: Published - 2021

