Reinforcement learning to improve flexibility of building energy management

Research output: Book/Report › Ph.D. thesis



The reduction of energy consumption in buildings plays a central role in achieving carbon neutrality by 2050 and can be accomplished by increasing the proportion of renewable energy sources and electrifying the heating supply. However, increased electrification may create challenges for the power grid due to the intermittency and uncertainty of energy supply. To mitigate this, measures such as demand response are implemented to incentivise energy end-users to shift electricity demand and store energy during periods of low prices. However, traditional controllers are not suited to achieving flexibility objectives in this context. This thesis investigates the application of reinforcement learning (RL) controllers to solve present and future challenges in the energy sector, with a special emphasis on buildings.

RL algorithms gained prominence through their capacity to master complex board games such as Go and to learn to execute tasks simply by interacting with an environment. This makes RL a compelling approach, as the controller can in principle adapt its behaviour over time without the need to design thermodynamic models of buildings, which requires domain knowledge and does not scale well to a large number of buildings. Multiple simulated case studies show that RL algorithms can learn the desired behaviour in the energy sector. Nevertheless, to apply such controllers in the real world, a number of challenges must be overcome, including data efficiency, safety guarantees and a stable learning process.

Firstly, we seek to determine which model-free RL algorithms are best suited to address the domain-specific challenges encountered in the energy sector, which include perturbations from a variety of stochastic, non-stationary exogenous variables. To this end, we conduct experiments in a data centre cooling case study, where the controller cools the server rooms to maintain temperatures within a suitable operating range while also reducing electricity consumption. Our findings show that the soft actor-critic algorithm performs particularly well, demonstrating robustness to unforeseen disturbances and maintaining stable temperature control. The experiments indicate that it may be desirable not only to optimise the conventional discounted infinite-horizon objective, but also to introduce entropy regularisation. While this may result in inferior final performance, it can diminish the risk of temperature violations, which plays an important role in the real world.
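The entropy-regularised objective mentioned above can be illustrated with a minimal sketch. The function below computes a discounted return augmented with a per-step entropy bonus, as in soft actor-critic; the reward shaping (negative energy use plus a violation penalty), the temperature coefficient `alpha` and all numbers are illustrative assumptions, not the tuned settings of the thesis.

```python
import numpy as np

def soft_return(rewards, entropies, gamma=0.99, alpha=0.2):
    """Discounted return with an entropy bonus, as in soft actor-critic:
    sum_t gamma^t * (r_t + alpha * H_t).

    alpha trades off reward against exploration; its value here is
    illustrative, not a tuned setting.
    """
    rewards = np.asarray(rewards, dtype=float)
    entropies = np.asarray(entropies, dtype=float)
    discounts = gamma ** np.arange(len(rewards))
    return float(np.sum(discounts * (rewards + alpha * entropies)))

# A penalty-shaped reward (e.g. -energy_kwh - temperature_violation_penalty)
# combined with the policy entropy observed at each step:
rewards = [-1.0, -0.5, -2.0]
entropies = [1.2, 1.0, 0.8]
objective = soft_return(rewards, entropies)
```

With `alpha = 0` this reduces to the conventional discounted objective; a positive `alpha` rewards stochastic policies, which is one way to read the observed reduction in temperature-violation risk.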

Secondly, we extend our previous experiments to optimise objectives more aligned with the expectations of economic actors, which could provide benefits to the electrical grid. Instead of focusing on reducing electricity consumption, we strive to minimise costs under real-time electricity prices. However, applying the previous methods, which presume Markovian dynamics, proved ineffective. The challenge stems from a time-dependent reward function: knowledge of previous prices is indispensable for successfully learning to shift demand over time. To surmount this hurdle, we account for imperfect information by using a recurrent neural network architecture in the policy network, which provides the agent with knowledge of past observations for decision-making. This method can capture the temporal dependencies in price and weather data. While this can enhance performance in the conventional setting of minimising consumption, we observed it to be vital when optimising costs. By using the soft actor-critic algorithm to minimise costs, we can decrease electricity bills compared to RL agents that aim to minimise consumption, while maintaining temperatures within the appropriate range.
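The role of the recurrent policy can be sketched as follows: a hidden state summarises the history of observations (for instance past prices and temperatures), so the action depends on more than the current, non-Markovian observation. This toy Elman-style cell uses random, untrained weights purely for illustration; in practice the weights would be trained, e.g. with soft actor-critic, and the observation layout is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

class RecurrentPolicy:
    """Toy recurrent policy: the hidden state h accumulates past
    observations, giving the agent memory of earlier prices."""

    def __init__(self, obs_dim, hidden_dim, action_dim):
        # Random, untrained weights - illustrative only.
        self.W_in = rng.normal(scale=0.1, size=(hidden_dim, obs_dim))
        self.W_h = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
        self.W_out = rng.normal(scale=0.1, size=(action_dim, hidden_dim))
        self.h = np.zeros(hidden_dim)

    def step(self, obs):
        # Elman-style update: mix the current observation with the
        # previous hidden state, then emit a bounded action
        # (e.g. a cooling setpoint adjustment in [-1, 1]).
        self.h = np.tanh(self.W_in @ obs + self.W_h @ self.h)
        return np.tanh(self.W_out @ self.h)

policy = RecurrentPolicy(obs_dim=2, hidden_dim=8, action_dim=1)
# Hypothetical (price, temperature) observations over three steps:
for price, temp in [(0.30, 22.0), (0.10, 21.5), (0.45, 23.0)]:
    action = policy.step(np.array([price, temp]))
```

Because `h` carries over between calls, the same current observation can map to different actions depending on the price history, which is exactly what a memoryless (Markovian) policy cannot express.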

Lastly, while model-free algorithms can learn to shift demand, achieving this may require significant tuning effort. A better way to make use of available predictions, such as prices from the day-ahead market, while still leveraging the adaptation potential of RL algorithms, is therefore desirable. Model-based RL has the potential to achieve this: a model of the environment is learned while interacting with the environment and used to plan actions, similar to model predictive control (MPC). This approach is generally more data-efficient, can address operational constraints and can use predictions to make better decisions. We conduct a review of the applications of model-based RL and related ideas in the energy sector, with a particular focus on uncertainty quantification, as it is critical to successfully learning models online. This review aims to bring together the MPC and RL communities to combine the strengths of both fields and enhance the flexibility of the power grid.

This thesis contributes to the existing literature by identifying and addressing key challenges of RL algorithms specific to the energy sector. The importance of achieving data efficiency, robustness and the handling of partial observability, especially in demand response scenarios, is emphasised. Our findings demonstrate the promising potential of RL algorithms in building energy management. However, several obstacles must be overcome before these controllers can be implemented effectively in the real world. We advocate for further research into model-based RL approaches, which can partially address these challenges.
Original language: English
Publisher: Technical University of Denmark
Number of pages: 225
Publication status: Published - 2023


