A semi-automated approach to validation and error diagnostics of water network data

Jonas Kjeld Kirstein*, Klavs Høgh, Martin Rygaard, Morten Borup

*Corresponding author for this work

Research output: Contribution to journal › Journal article › Research › peer-review

Abstract

We propose a method for quality assurance of raw data from water distribution networks in near real-time. Well-known and novel data analysis methods, including a timestamp drift test, are combined to produce a malfunction indicator database for diagnosing anomalies within data acquisition practices. The method was applied to 112 flow and 111 pressure data sets, covering on average 32 months, located throughout the distribution networks of three Danish utilities. Around 10% of measurements in the utilities’ meter data sets were absent and 3–35% were categorized as dubious or erroneous. The most common types of anomalies for flow and pressure data were flatlines and timestamp inconsistencies. Time drifts were identified in all three utilities, and a similarity analysis revealed that many anomalies occurred simultaneously. These high rates of missing and anomalous data could have been avoided if the proposed method had been implemented to automatically highlight meter errors and system-wide problems in data collection.
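The abstract names the kinds of anomalies the method screens for but not how each test is computed. As a minimal sketch, assuming evenly sampled meter series with timestamps in seconds, the Python below shows one way to compute indicators for three of them (missing data, flatlines, timestamp drift) plus a simple co-occurrence score between two meters. The function names, the fixed sampling grid, the tolerance parameters, and the use of a least-squares slope and Jaccard similarity are all illustrative assumptions, not the authors' implementation.

```python
from typing import List, Sequence, Set, Tuple


def missing_fraction(timestamps: Sequence[float], start: float, end: float,
                     interval_s: float) -> float:
    """Fraction of expected grid slots in [start, end) with no sample.

    Hypothetical helper: assumes one sample per `interval_s` seconds and
    counts a slot as present if any timestamp rounds to it.
    """
    expected = int(round((end - start) / interval_s))
    if expected <= 0:
        return 0.0
    slots = {int(round((t - start) / interval_s)) for t in timestamps if start <= t < end}
    slots = {s for s in slots if 0 <= s < expected}  # drop slots rounded past the window
    return 1.0 - len(slots) / expected


def flatline_runs(values: Sequence[float], min_len: int,
                  tol: float = 0.0) -> List[Tuple[int, int]]:
    """Return (start, end_exclusive) index pairs of runs of at least `min_len`
    samples in which consecutive values differ by at most `tol` (hypothetical)."""
    runs: List[Tuple[int, int]] = []
    i, n = 0, len(values)
    while i < n:
        j = i + 1
        while j < n and abs(values[j] - values[j - 1]) <= tol:
            j += 1
        if j - i >= min_len:
            runs.append((i, j))
        i = j
    return runs


def timestamp_drift_s_per_day(timestamps: Sequence[float], interval_s: float) -> float:
    """Least-squares slope (seconds per day) of each timestamp's offset from the
    nominal sampling grid. Assumes the offset stays within +/- interval_s / 2,
    so no phase unwrapping is needed (hypothetical drift indicator)."""
    if len(timestamps) < 2:
        return 0.0
    t0 = timestamps[0]
    xs = [t - t0 for t in timestamps]
    # Offset of each sample from the nearest grid point.
    ys = [((x + interval_s / 2) % interval_s) - interval_s / 2 for x in xs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return 0.0
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx * 86400.0


def jaccard(flags_a: Set[int], flags_b: Set[int]) -> float:
    """Jaccard similarity of two meters' sets of flagged time slots."""
    union = flags_a | flags_b
    return len(flags_a & flags_b) / len(union) if union else 0.0
```

For example, a 5-minute logger whose clock gains 0.01 s per sample would give `timestamp_drift_s_per_day` of roughly 2.9 s per day, and a high `jaccard` score between many meter pairs would point toward a system-wide rather than a single-meter problem.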
Original language: English
Journal: Urban Water Journal
Volume: 16
Issue number: 1
Pages (from-to): 1-10
ISSN: 1573-062X
DOI: 10.1080/1573062X.2019.1611884
Publication status: Published - 2019

Keywords

  • Data validation
  • error diagnostics
  • water supply

Cite this

@article{a8970aba84e643b183dd1098d5d348e0,
title = "A semi-automated approach to validation and error diagnostics of water network data",
abstract = "We propose a method for quality assurance of raw data from water distribution networks in near real-time. Well-known and novel data analysis methods, including a timestamp drift test, are combined to produce a malfunction indicator database for diagnosing anomalies within data acquisition practices. The method was applied to 112 flow and 111 pressure data sets, covering on average 32 months, located throughout the distribution networks of three Danish utilities. Around 10{\%} of measurements in the utilities’ meter data sets were absent and 3–35{\%} were categorized as dubious or erroneous. The most common types of anomalies for flow and pressure data were flatline and time stamp inconsistencies. Time drifts were identified in all three utilities and a similarity analysis revealed a simultaneous occurrence of many anomalies. These high rates could have been avoided if the proposed method had been implemented to automatically highlight meter errors and system-wide problems in data collection.",
keywords = "Data validation, error diagnostics, water supply",
author = "Kirstein, {Jonas Kjeld} and Klavs H{\o}gh and Martin Rygaard and Morten Borup",
year = "2019",
doi = "10.1080/1573062X.2019.1611884",
language = "English",
volume = "16",
pages = "1--10",
journal = "Urban Water Journal",
issn = "1573-062X",
publisher = "CRC Press/Balkema",
number = "1",

}

A semi-automated approach to validation and error diagnostics of water network data. / Kirstein, Jonas Kjeld; Høgh, Klavs; Rygaard, Martin; Borup, Morten.

In: Urban Water Journal, Vol. 16, No. 1, 2019, p. 1-10.


TY  - JOUR
T1  - A semi-automated approach to validation and error diagnostics of water network data
AU  - Kirstein, Jonas Kjeld
AU  - Høgh, Klavs
AU  - Rygaard, Martin
AU  - Borup, Morten
PY  - 2019
Y1  - 2019
N2  - We propose a method for quality assurance of raw data from water distribution networks in near real-time. Well-known and novel data analysis methods, including a timestamp drift test, are combined to produce a malfunction indicator database for diagnosing anomalies within data acquisition practices. The method was applied to 112 flow and 111 pressure data sets, covering on average 32 months, located throughout the distribution networks of three Danish utilities. Around 10% of measurements in the utilities’ meter data sets were absent and 3–35% were categorized as dubious or erroneous. The most common types of anomalies for flow and pressure data were flatline and time stamp inconsistencies. Time drifts were identified in all three utilities and a similarity analysis revealed a simultaneous occurrence of many anomalies. These high rates could have been avoided if the proposed method had been implemented to automatically highlight meter errors and system-wide problems in data collection.
AB  - We propose a method for quality assurance of raw data from water distribution networks in near real-time. Well-known and novel data analysis methods, including a timestamp drift test, are combined to produce a malfunction indicator database for diagnosing anomalies within data acquisition practices. The method was applied to 112 flow and 111 pressure data sets, covering on average 32 months, located throughout the distribution networks of three Danish utilities. Around 10% of measurements in the utilities’ meter data sets were absent and 3–35% were categorized as dubious or erroneous. The most common types of anomalies for flow and pressure data were flatline and time stamp inconsistencies. Time drifts were identified in all three utilities and a similarity analysis revealed a simultaneous occurrence of many anomalies. These high rates could have been avoided if the proposed method had been implemented to automatically highlight meter errors and system-wide problems in data collection.
KW  - Data validation
KW  - error diagnostics
KW  - water supply
U2  - 10.1080/1573062X.2019.1611884
DO  - 10.1080/1573062X.2019.1611884
M3  - Journal article
VL  - 16
SP  - 1
EP  - 10
JO  - Urban Water Journal
JF  - Urban Water Journal
SN  - 1573-062X
IS  - 1
ER  - 