Clustering and classification of energy meter data: A comparison analysis of data from individual homes and the aggregated data from multiple homes

Juan Sala, Rongling Li*, Morten H. Christensen

*Corresponding author for this work

Research output: Contribution to journal › Journal article › Research › peer-review

Abstract

The transition towards a more sustainable environment requires the development of new demand-side control systems to integrate renewable energy sources into energy systems. For this purpose, energy meter data from homes have been widely used in modelling, forecasting and optimal control of energy use. However, the usability and reliability of household energy meter data have not been specifically addressed. In this study, we apply commonly used machine learning methods to the heating consumption data of (1) two individual homes in an apartment building and (2) the district heating substation of the apartment building, which comprises 72 homes, to identify how the characteristics of the data affect the results of data analysis. Two clustering approaches based on the K-means algorithm were applied to group similar daily heating profiles. Using the clustering results, classification algorithms such as logistic regression and random forest were then applied to predict the heating consumption level with regard to weather conditions. The analysis showed that the substation data, i.e. the aggregated heating consumption of the 72 homes, are more reliable and valid for energy prediction than the data from the two individual homes. This is due to the large variation and uncertainty in the daily energy use of individual homes.
Original language: English
Journal: Building Simulation
Number of pages: 15
ISSN: 1996-3599
DOIs: 10.1007/s12273-019-0587-4
Publication status: Accepted/In press - 2019

Keywords

  • Smart meter data
  • Daily profile
  • Clustering
  • Classification
  • Data usability
  • Heat substation

Cite this

@article{3e52e73e9d854500b34f2a36796e2a7b,
title = "Clustering and classification of energy meter data: A comparison analysis of data from individual homes and the aggregated data from multiple homes",
abstract = "The transition towards a more sustainable environment requires the development of new control systems on the demand side to integrate renewable energy sources into the energy systems. For this purpose, energy meter data of homes have been broadly used in modelling, forecast and optimal control of energy use. However, usability and reliability of household energy meter data have not been specifically addressed. In this study, we apply commonly used machine learning methods on the heating consumption data of (1) two individual homes in an apartment building and (2) the district heating substation of the apartment building which includes 72 homes, to identify how the characteristics of data affect the result of data analysis. Two clustering approaches were applied using the K-means algorithm to group similar heating daily profiles. Using the clustering results, different classification algorithms such as logistic regression and random forest were applied to predict the heating consumption level with regards to the weather conditions. The data analysis process showed that the substation data which is the aggregated heating consumption of the 72 homes is more reliable and valid for energy prediction than the data from two individual homes. This is due to the large variation and uncertainty in the daily energy use of individual homes.",
keywords = "Smart meter data, Daily profile, Clustering, Classification, Data usability, Heat substation",
author = "Sala, Juan and Li, Rongling and Christensen, {Morten H.}",
year = "2019",
doi = "10.1007/s12273-019-0587-4",
language = "English",
journal = "Building Simulation",
issn = "1996-3599",
publisher = "Tsinghua University Press",
}

TY - JOUR
T1 - Clustering and classification of energy meter data: A comparison analysis of data from individual homes and the aggregated data from multiple homes
AU - Sala, Juan
AU - Li, Rongling
AU - Christensen, Morten H.
PY - 2019
Y1 - 2019
N2 - The transition towards a more sustainable environment requires the development of new control systems on the demand side to integrate renewable energy sources into the energy systems. For this purpose, energy meter data of homes have been broadly used in modelling, forecast and optimal control of energy use. However, usability and reliability of household energy meter data have not been specifically addressed. In this study, we apply commonly used machine learning methods on the heating consumption data of (1) two individual homes in an apartment building and (2) the district heating substation of the apartment building which includes 72 homes, to identify how the characteristics of data affect the result of data analysis. Two clustering approaches were applied using the K-means algorithm to group similar heating daily profiles. Using the clustering results, different classification algorithms such as logistic regression and random forest were applied to predict the heating consumption level with regards to the weather conditions. The data analysis process showed that the substation data which is the aggregated heating consumption of the 72 homes is more reliable and valid for energy prediction than the data from two individual homes. This is due to the large variation and uncertainty in the daily energy use of individual homes.

KW - Smart meter data
KW - Daily profile
KW - Clustering
KW - Classification
KW - Data usability
KW - Heat substation
U2 - 10.1007/s12273-019-0587-4
DO - 10.1007/s12273-019-0587-4
M3 - Journal article
JO - Building Simulation
JF - Building Simulation
SN - 1996-3599
ER -