Computational Statistics, Volume 33, Issue 2, pp 1017–1045

A pruned recursive solution to the multiple change point problem

Original Paper

Abstract

Long time series are often heterogeneous in nature. As such, the most appropriate model is one whose parameters are allowed to change through time. Because the number of possible solutions to the multiple change point problem grows exponentially, an efficient algorithm is needed for inference to be computationally feasible. Exact Bayesian solutions have at best quadratic complexity in the number of observations, which can still be too slow for very large data sets. Here, a pruned dynamic programming algorithm is proposed to fit a piecewise regression model with unknown break points to a data set. The algorithm removes unnecessary calculations, reducing the complexity of its most time-consuming step from quadratic in the number of observations to quadratic in the average distance between change points. A distance measure is introduced that can be used to determine the divergence of the approximate joint posterior distribution from the exact posterior distribution. Analysis of two real data sets shows that this approximate algorithm produces a nearly identical representation of the joint posterior distribution on the locations of the change points, but with a significantly faster run time than its exact counterpart.
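The abstract does not spell out the recursion or the pruning rule, so the following Python sketch is only an illustration of the general idea under stated assumptions, not the paper's algorithm. It assumes a simpler model than piecewise regression (a piecewise-constant Gaussian mean with known variance `sigma2` and a normal prior with variance `tau2`), places no prior on change point locations, and uses a hypothetical pruning rule: a candidate for the last change point is dropped permanently once its term falls below `log_eps` relative to the largest term. All function and parameter names (`log_marginal`, `forward`, `kl_divergence`, `log_eps`) are invented for this example, and the standard discrete Kullback–Leibler divergence stands in for the distance measure mentioned above.

```python
import math

def log_marginal(s, q, n, sigma2=1.0, tau2=100.0):
    """Log marginal likelihood of one segment with n points, sum s and sum of
    squares q, assuming y_i ~ N(mu, sigma2) and mu ~ N(0, tau2)."""
    return (-0.5 * n * math.log(2 * math.pi * sigma2)
            - 0.5 * math.log1p(n * tau2 / sigma2)
            - q / (2 * sigma2)
            + tau2 * s * s / (2 * sigma2 * (sigma2 + n * tau2)))

def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def forward(y, max_cp, log_eps=-math.inf):
    """Forward recursion over the number of change points.

    P[k][j] is the log of the summed products of segment marginals over all
    placements of k change points in y[0:j]. It is built from
    P[k-1][v] + log f(y[v:j]), summed over the last change point v.
    Assumed pruning rule: v is dropped for good once its term is more than
    |log_eps| below the largest term; log_eps = -inf disables pruning.
    Returns (P, number of segment terms evaluated).
    """
    n = len(y)
    cs, cq = [0.0], [0.0]          # cumulative sums and sums of squares
    for x in y:
        cs.append(cs[-1] + x)
        cq.append(cq[-1] + x * x)

    def seg(v, j):
        return log_marginal(cs[j] - cs[v], cq[j] - cq[v], j - v)

    P = [[-math.inf] * (n + 1) for _ in range(max_cp + 1)]
    for j in range(1, n + 1):
        P[0][j] = seg(0, j)
    evaluated = 0
    for k in range(1, max_cp + 1):
        active = []                # surviving candidates for the last change point
        for j in range(k + 1, n + 1):
            active.append(j - 1)   # newest candidate
            terms = [P[k - 1][v] + seg(v, j) for v in active]
            evaluated += len(terms)
            P[k][j] = logsumexp(terms)
            cutoff = max(terms) + log_eps
            active = [v for v, t in zip(active, terms) if t >= cutoff]
    return P, evaluated

def kl_divergence(p, q):
    """Discrete Kullback-Leibler divergence KL(p || q), in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def posterior_over_k(P, n):
    """Normalise P[k][n] over k (implicitly a flat prior over configurations)."""
    total = logsumexp([row[n] for row in P])
    return [math.exp(row[n] - total) for row in P]

# Toy series with one mean shift half way through.
y = [0.3, -0.1, 0.2, -0.4, 0.0] * 6 + [5.3, 4.9, 5.2, 4.6, 5.0] * 6
exact, n_exact = forward(y, max_cp=2)
pruned, n_pruned = forward(y, max_cp=2, log_eps=math.log(1e-10))
print("terms evaluated:", n_exact, "exact vs", n_pruned, "pruned")
print("KL(exact || pruned):",
      kl_divergence(posterior_over_k(exact, len(y)),
                    posterior_over_k(pruned, len(y))))
```

In this sketch the exact recursion evaluates every candidate position at every step, which is where the quadratic cost comes from, while the pruned run only carries forward candidates that still contribute noticeably to the sum. The threshold `1e-10` is an arbitrary illustrative choice.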

Keywords

Bayesian change point; Dynamic programming; Joint posterior; Kullback–Leibler divergence; Piecewise regression

Notes

Acknowledgements

The author would like to thank the two anonymous reviewers for their thoughtful feedback, which greatly improved this manuscript. This work was supported by a grant from the National Science Foundation, DMS-1407670 (E. Ruggieri, PI).

Supplementary material

Supplementary material 1: 180_2017_756_MOESM1_ESM.txt (txt, 11 KB)
Supplementary material 2: 180_2017_756_MOESM2_ESM.m (m, 2 KB)
Supplementary material 3: 180_2017_756_MOESM3_ESM.m (m, 6 KB)
Supplementary material 4: 180_2017_756_MOESM4_ESM.m (m, 1 KB)
Supplementary material 5: 180_2017_756_MOESM5_ESM.txt (txt, 34 KB)
Supplementary material 6: 180_2017_756_MOESM6_ESM.m (m, 1 KB)
Supplementary material 7: 180_2017_756_MOESM7_ESM.m (m, 1 KB)
Supplementary material 8: 180_2017_756_MOESM8_ESM.m (m, 5 KB)
Supplementary material 9: 180_2017_756_MOESM9_ESM.txt (txt, 71 KB)
Supplementary material 10: 180_2017_756_MOESM10_ESM.txt (txt, 109 KB)
Supplementary material 11: 180_2017_756_MOESM11_ESM.m (m, 3 KB)
Supplementary material 12: 180_2017_756_MOESM12_ESM.m (m, 1 KB)
Supplementary material 13: 180_2017_756_MOESM13_ESM.m (m, 10 KB)
Supplementary material 14: 180_2017_756_MOESM14_ESM.m (m, 6 KB)
Supplementary material 15: 180_2017_756_MOESM15_ESM.txt (txt, 36 KB)
Supplementary material 16: 180_2017_756_MOESM16_ESM.m (m, 9 KB)

Copyright information

© Springer-Verlag GmbH Germany 2017

Authors and Affiliations

  1. Department of Mathematics and Computer Science, College of the Holy Cross, Worcester, USA
