TY - JOUR
T1 - Smart Meter Data Analytics: Systems, Algorithms and Benchmarking
AU - Liu, Xiufeng
AU - Golab, Lukasz
AU - Golab, Wojciech
AU - Ilyas, Ihab F.
AU - Jin, Shichao
N1 - Accepted by Journal of IEEE Transaction of Database Systems<br/>This journal http://tods.acm.org/
PY - 2016
Y1 - 2016
N2 - Smart electricity meters have been replacing conventional meters worldwide, enabling automated collection of fine-grained (e.g., every 15 minutes or hourly) consumption data. A variety of smart meter analytics algorithms and applications have been proposed, mainly in the smart grid literature. However, the focus has been on what can be done with the data rather than how to do it efficiently. In this paper, we examine smart meter analytics from a software performance perspective. First, we design a performance benchmark that includes common smart meter analytics tasks. These include off-line feature extraction and model building as well a framework for on-line anomaly detection that we propose. Second, since obtaining real smart meter data is difficult due to privacy issues, we present an algorithm for generating large realistic data sets from a small seed of real data. Third, we implement the proposed benchmark using five representative platforms: a traditional numeric computing platform (Matlab), a relational DBMS with a built-in machine learning toolkit (PostgreSQL/MADlib), a main-memory column store (“System C”), and two distributed data processing platforms (Hive and Spark/Spark Streaming). We compare the five platforms in terms of application development effort and performance on a multicore machine as well as a cluster of 16 commodity servers.
AB - Smart electricity meters have been replacing conventional meters worldwide, enabling automated collection of fine-grained (e.g., every 15 minutes or hourly) consumption data. A variety of smart meter analytics algorithms and applications have been proposed, mainly in the smart grid literature. However, the focus has been on what can be done with the data rather than how to do it efficiently. In this paper, we examine smart meter analytics from a software performance perspective. First, we design a performance benchmark that includes common smart meter analytics tasks. These include off-line feature extraction and model building as well a framework for on-line anomaly detection that we propose. Second, since obtaining real smart meter data is difficult due to privacy issues, we present an algorithm for generating large realistic data sets from a small seed of real data. Third, we implement the proposed benchmark using five representative platforms: a traditional numeric computing platform (Matlab), a relational DBMS with a built-in machine learning toolkit (PostgreSQL/MADlib), a main-memory column store (“System C”), and two distributed data processing platforms (Hive and Spark/Spark Streaming). We compare the five platforms in terms of application development effort and performance on a multicore machine as well as a cluster of 16 commodity servers.
UR - http://tods.acm.org/
U2 - 10.1145/3004295
DO - 10.1145/3004295
M3 - Journal article
SN - 0362-5915
VL - 42
JO - A C M Transactions on Database Systems
JF - A C M Transactions on Database Systems
IS - 1
ER -