Smart Meter Data Analytics: Systems, Algorithms and Benchmarking

Xiufeng Liu, Lukasz Golab, Wojciech Golab, Ihab F. Ilyas, Shichao Jin

    Research output: Contribution to journal › Journal article › Research › peer-review

    Abstract

    Smart electricity meters have been replacing conventional meters worldwide, enabling automated collection of fine-grained (e.g., every 15 minutes or hourly) consumption data. A variety of smart meter analytics algorithms and applications have been proposed, mainly in the smart grid literature. However, the focus has been on what can be done with the data rather than how to do it efficiently. In this paper, we examine smart meter analytics from a software performance perspective. First, we design a performance benchmark that includes common smart meter analytics tasks. These include off-line feature extraction and model building as well as a framework for on-line anomaly detection that we propose. Second, since obtaining real smart meter data is difficult due to privacy issues, we present an algorithm for generating large realistic data sets from a small seed of real data. Third, we implement the proposed benchmark using five representative platforms: a traditional numeric computing platform (Matlab), a relational DBMS with a built-in machine learning toolkit (PostgreSQL/MADlib), a main-memory column store (“System C”), and two distributed data processing platforms (Hive and Spark/Spark Streaming). We compare the five platforms in terms of application development effort and performance on a multicore machine as well as a cluster of 16 commodity servers.
    Original language: English
    Journal: ACM Transactions on Database Systems
    Volume: 42
    Issue number: 1
    Number of pages: 38
    ISSN: 0362-5915
    DOI: 10.1145/3004295
    Publication status: Published - 2016

    Bibliographical note

    Accepted by ACM Transactions on Database Systems. Journal website: http://tods.acm.org/

    Cite this

    Liu, Xiufeng; Golab, Lukasz; Golab, Wojciech; Ilyas, Ihab F.; Jin, Shichao. / Smart Meter Data Analytics: Systems, Algorithms and Benchmarking. In: ACM Transactions on Database Systems. 2016; Vol. 42, No. 1.
    @article{13fe87ffe2ba46768d38758065df3871,
    title = "Smart Meter Data Analytics: Systems, Algorithms and Benchmarking",
    abstract = "Smart electricity meters have been replacing conventional meters worldwide, enabling automated collection of fine-grained (e.g., every 15 minutes or hourly) consumption data. A variety of smart meter analytics algorithms and applications have been proposed, mainly in the smart grid literature. However, the focus has been on what can be done with the data rather than how to do it efficiently. In this paper, we examine smart meter analytics from a software performance perspective. First, we design a performance benchmark that includes common smart meter analytics tasks. These include off-line feature extraction and model building as well as a framework for on-line anomaly detection that we propose. Second, since obtaining real smart meter data is difficult due to privacy issues, we present an algorithm for generating large realistic data sets from a small seed of real data. Third, we implement the proposed benchmark using five representative platforms: a traditional numeric computing platform (Matlab), a relational DBMS with a built-in machine learning toolkit (PostgreSQL/MADlib), a main-memory column store (“System C”), and two distributed data processing platforms (Hive and Spark/Spark Streaming). We compare the five platforms in terms of application development effort and performance on a multicore machine as well as a cluster of 16 commodity servers.",
    author = "Xiufeng Liu and Lukasz Golab and Wojciech Golab and Ilyas, {Ihab F.} and Shichao Jin",
    note = "Accepted by ACM Transactions on Database Systems. Journal website: http://tods.acm.org/",
    year = "2016",
    doi = "10.1145/3004295",
    language = "English",
    volume = "42",
    journal = "ACM Transactions on Database Systems",
    issn = "0362-5915",
    publisher = "Association for Computing Machinery",
    number = "1",

    }

    TY - JOUR

    T1 - Smart Meter Data Analytics: Systems, Algorithms and Benchmarking

    AU - Liu, Xiufeng

    AU - Golab, Lukasz

    AU - Golab, Wojciech

    AU - Ilyas, Ihab F.

    AU - Jin, Shichao

    N1 - Accepted by ACM Transactions on Database Systems. Journal website: http://tods.acm.org/

    PY - 2016

    Y1 - 2016

    N2 - Smart electricity meters have been replacing conventional meters worldwide, enabling automated collection of fine-grained (e.g., every 15 minutes or hourly) consumption data. A variety of smart meter analytics algorithms and applications have been proposed, mainly in the smart grid literature. However, the focus has been on what can be done with the data rather than how to do it efficiently. In this paper, we examine smart meter analytics from a software performance perspective. First, we design a performance benchmark that includes common smart meter analytics tasks. These include off-line feature extraction and model building as well as a framework for on-line anomaly detection that we propose. Second, since obtaining real smart meter data is difficult due to privacy issues, we present an algorithm for generating large realistic data sets from a small seed of real data. Third, we implement the proposed benchmark using five representative platforms: a traditional numeric computing platform (Matlab), a relational DBMS with a built-in machine learning toolkit (PostgreSQL/MADlib), a main-memory column store (“System C”), and two distributed data processing platforms (Hive and Spark/Spark Streaming). We compare the five platforms in terms of application development effort and performance on a multicore machine as well as a cluster of 16 commodity servers.

    AB - Smart electricity meters have been replacing conventional meters worldwide, enabling automated collection of fine-grained (e.g., every 15 minutes or hourly) consumption data. A variety of smart meter analytics algorithms and applications have been proposed, mainly in the smart grid literature. However, the focus has been on what can be done with the data rather than how to do it efficiently. In this paper, we examine smart meter analytics from a software performance perspective. First, we design a performance benchmark that includes common smart meter analytics tasks. These include off-line feature extraction and model building as well as a framework for on-line anomaly detection that we propose. Second, since obtaining real smart meter data is difficult due to privacy issues, we present an algorithm for generating large realistic data sets from a small seed of real data. Third, we implement the proposed benchmark using five representative platforms: a traditional numeric computing platform (Matlab), a relational DBMS with a built-in machine learning toolkit (PostgreSQL/MADlib), a main-memory column store (“System C”), and two distributed data processing platforms (Hive and Spark/Spark Streaming). We compare the five platforms in terms of application development effort and performance on a multicore machine as well as a cluster of 16 commodity servers.

    UR - http://tods.acm.org/

    U2 - 10.1145/3004295

    DO - 10.1145/3004295

    M3 - Journal article

    VL - 42

    JO - ACM Transactions on Database Systems

    JF - ACM Transactions on Database Systems

    SN - 0362-5915

    IS - 1

    ER -