Greedy Gaussian Segmentation of Multivariate Time Series

David Hallac*, Peter Nystrup, Stephen Boyd

*Corresponding author for this work

Research output: Contribution to journalJournal articleResearchpeer-review

307 Downloads (Pure)

Abstract

We consider the problem of breaking a multivariate (vector) time series into segments over which the data is well explained as independent samples from a Gaussian distribution. We formulate this as a covariance-regularized maximum likelihood problem, which can be reduced to a combinatorial optimization problem of searching over the possible breakpoints, or segment boundaries. This problem can be solved using dynamic programming, with complexity that grows with the square of the time series length. We propose a heuristic method that approximately solves the problem in linear time with respect to this length, and always yields a locally optimal choice, in the sense that no change of anyone breakpoint improves the objective. Our method, which we call greedy Gaussian segmentation (GGS), easily scales to problems with vectors of dimension over 1000 and time series of arbitrary length. We discuss methods that can be used to validate such a model using data, and also to automatically choose appropriate values of the two hyperparameters in the method. Finally, we illustrate our GGS approach on financial time series and Wikipedia text data.
Original languageEnglish
JournalAdvances in Data Analysis and Classification
Volume13
Issue number3
Pages (from-to)727-751
Number of pages25
ISSN1862-5347
DOIs
Publication statusPublished - 2019

Keywords

  • Time series analysis
  • Change-point detection
  • Financial regimes
  • Text segmentation
  • Covariance regularization
  • Greedy algorithms

Fingerprint

Dive into the research topics of 'Greedy Gaussian Segmentation of Multivariate Time Series'. Together they form a unique fingerprint.

Cite this