Abstract
Automatic control of energy systems is affected by the uncertainties of multiple factors, including weather, prices and human activities. The literature relies on Markov-based control, taking into account only the current state. This limits control performance, as previous states provide additional context for decision making. We present two ways to learn non-Markovian policies, based on recurrent neural networks and variational inference. We evaluate the methods on a simulated data centre HVAC control task. The results show that the off-policy stochastic latent actor-critic algorithm can learn to maintain the temperature within the predefined range within three months of training, without prior knowledge, while reducing energy consumption by more than 5% compared to Markovian policies.
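To illustrate the core idea of a recurrent non-Markovian policy, below is a minimal sketch (not the paper's code): the actor conditions on the observation history through a GRU hidden state rather than on the current state alone. All layer sizes, the GRU choice, and the Gaussian action head are illustrative assumptions.

```python
import torch
import torch.nn as nn


class RecurrentGaussianPolicy(nn.Module):
    """Policy whose action distribution depends on past observations,
    not only the current one (a non-Markovian policy)."""

    def __init__(self, obs_dim: int, act_dim: int, hidden_dim: int = 128):
        super().__init__()
        # The GRU summarises past observations into a hidden state,
        # giving the policy the extra context a Markovian MLP lacks.
        self.gru = nn.GRU(obs_dim, hidden_dim, batch_first=True)
        self.mean = nn.Linear(hidden_dim, act_dim)
        self.log_std = nn.Linear(hidden_dim, act_dim)

    def forward(self, obs_seq, h=None):
        # obs_seq: (batch, time, obs_dim); h carries history across calls.
        out, h = self.gru(obs_seq, h)
        last = out[:, -1]  # summary of the history observed so far
        std = self.log_std(last).clamp(-5, 2).exp()
        return torch.distributions.Normal(self.mean(last), std), h


# Usage: step through an episode one observation at a time, reusing the
# hidden state so each decision depends on all previous states.
policy = RecurrentGaussianPolicy(obs_dim=8, act_dim=2)
h = None
for _ in range(10):
    obs = torch.randn(1, 1, 8)  # one hypothetical observation step
    dist, h = policy(obs, h)
    action = dist.sample()
```

The stochastic latent actor-critic variant evaluated in the paper instead learns a latent state via variational inference and conditions the actor on that inferred belief; the sketch above shows only the simpler recurrent-network route.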
| Original language | English |
| --- | --- |
| Title of host publication | Proceedings of the 8th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation |
| Number of pages | 5 |
| Publication date | 2021 |
| Pages | 324-328 |
| DOIs | |
| Publication status | Published - 2021 |