Greedy Gaussian Segmentation of Multivariate Time Series

D. Hallac, P. Nystrup, and S. Boyd

Manuscript, October 2016, updated June 2017.

We consider the problem of breaking a multivariate (vector) time series into segments over which the data is well explained as independent samples from a Gaussian distribution. We formulate this as a covariance-regularized maximum likelihood problem, which can be reduced to a combinatorial optimization problem of searching over the possible breakpoints, or segment boundaries. This problem is in general difficult to solve globally, so we propose an efficient heuristic method that approximately solves it, and always yields a locally optimal choice, in the sense that no change of any one breakpoint improves the objective. Our method, which we call greedy Gaussian segmentation (GGS), is quite efficient and easily scales to problems with vectors of dimension over 1000 and time series of arbitrary length. We discuss methods that can be used to validate such a model using data, and also to automatically choose good values of the two hyperparameters in the method. Finally, we illustrate the approach on various financial time series.