# A pruned recursive solution to the multiple change point problem


## Abstract

Long time series are often heterogeneous in nature. As such, the most appropriate model is one whose parameters are allowed to change through time. The exponential number of candidate solutions to the multiple change point problem makes an efficient algorithm necessary for the problem to be computationally feasible. Exact Bayesian solutions have at best quadratic complexity in the number of observations, which can still be too slow for very large data sets. Here, a pruned dynamic programming algorithm is proposed to fit a piecewise regression model with unknown break points to a data set. The algorithm removes nonessential calculations, reducing the complexity of its most time-consuming step from quadratic in the number of observations to quadratic in the average distance between change points. A distance measure is introduced that can be used to determine the divergence of the approximate joint posterior distribution from the exact posterior distribution. Analysis of two real data sets shows that this approximate algorithm produces a nearly identical representation of the joint posterior distribution on the locations of the change points, but with a significantly faster run time than its exact counterpart.
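To make the idea of measuring how far an approximate posterior drifts from the exact one concrete, the sketch below computes a Kullback–Leibler divergence between two discrete distributions. It is an illustration only: the function name `kl_divergence`, the direction KL(exact || approx), and the example probabilities are assumptions made here, and the distance measure defined in the paper may differ in its details.

```python
import math


def kl_divergence(p, q):
    """Return KL(p || q) in nats for two discrete distributions on the same support.

    Assumption: p and q are sequences of probabilities that each sum to 1.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            continue  # the term 0 * log(0 / q_i) is taken to be 0
        if q_i == 0.0:
            return math.inf  # p puts mass where q has none
        total += p_i * math.log(p_i / q_i)
    return total


# Hypothetical posterior probabilities over four candidate change point
# configurations (illustrative numbers, not taken from the paper).
exact = [0.70, 0.20, 0.08, 0.02]
approx = [0.705, 0.20, 0.08, 0.015]

print(kl_divergence(exact, approx))
```

A value close to zero indicates that the two distributions nearly coincide; the divergence is exactly zero only when they are identical.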

## Keywords

Bayesian change point · Dynamic programming · Joint posterior · Kullback–Leibler divergence · Piecewise regression

## Notes

### Acknowledgements

The author would like to thank the two anonymous reviewers for their thoughtful feedback which helped to greatly improve this manuscript. This work was supported by a grant from the National Science Foundation, DMS-1407670 (E. Ruggieri, PI).

