Abstract
Scaled Bregman distances (SBD) have turned out to be useful tools for simultaneous estimation and goodness-of-fit testing in parametric models of random data (streams, clouds). We show how SBD can additionally be used for model preselection (structure detection), i.e. for finding appropriate candidates of model (sub)classes in order to support a desired decision under uncertainty. As an example, we concentrate on the context of nonlinear recursive models with additional exogenous inputs; as special cases we include nonlinear regressions, linear autoregressive models (e.g. AR, ARIMA, SARIMA time series), and nonlinear autoregressive models with exogenous inputs (NARX). In particular, we outline a corresponding information-geometric 3D computer-graphical selection procedure. Some sample-size asymptotics are given as well.
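To make the notion of an SBD-based score concrete, the following is a minimal sketch for the discrete case. It assumes the usual discrete form of a scaled Bregman distance, \(\sum_x m(x)\,[\phi(p(x)/m(x)) - \phi(q(x)/m(x)) - \phi'(q(x)/m(x))\,(p(x)/m(x) - q(x)/m(x))]\), with the scaling chosen as \(m(x) = w(p(x), q(x))\) for a scale-connector w, and the power generators \(\phi_\alpha\). The function names `phi_power` and `scaled_bregman` and the example vectors are hypothetical and not taken from the chapter; all masses are assumed strictly positive.

```python
import math


def phi_power(alpha):
    """Return (phi, phi') for the power generator phi_alpha, alpha != 0.

    alpha = 1 is the limit case t*log(t) + 1 - t;
    alpha = 2 gives (t - 1)**2 / 2.
    """
    if alpha == 1:
        return (lambda t: t * math.log(t) + 1.0 - t), math.log
    a = float(alpha)
    phi = lambda t: (t**a - 1.0) / (a * (a - 1.0)) - (t - 1.0) / (a - 1.0)
    dphi = lambda t: (t**(a - 1.0) - 1.0) / (a - 1.0)
    return phi, dphi


def scaled_bregman(p, q, w, alpha=2.0):
    """Discrete scaled Bregman distance with scaling m(x) = w(p(x), q(x)).

    Assumption: p, q and the resulting scaling are strictly positive.
    """
    phi, dphi = phi_power(alpha)
    total = 0.0
    for pi, qi in zip(p, q):
        m = w(pi, qi)              # scale-connector applied pointwise
        s, t = pi / m, qi / m      # scaled masses
        total += m * (phi(s) - phi(t) - dphi(t) * (s - t))
    return total


# Hypothetical data-derived masses p and model masses q on three outcomes.
p = [0.2, 0.5, 0.3]
q = [0.25, 0.25, 0.5]

score_q = scaled_bregman(p, q, w=lambda u, v: v, alpha=2.0)
score_mid = scaled_bregman(p, q, w=lambda u, v: 0.5 * (u + v), alpha=2.0)
score_kl_a = scaled_bregman(p, q, w=lambda u, v: v, alpha=1)
score_kl_b = scaled_bregman(p, q, w=lambda u, v: 1.0, alpha=1)
```

In this sketch, the connector w(u, v) = v with alpha = 2 yields half of Pearson's chi-square, while for alpha = 1 the value does not depend on the connector and, for probability vectors, coincides with the Kullback-Leibler divergence.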
Notes
- 1.
Equipped with some \(\sigma \)-algebra \(\mathcal {A}\).
- 2.
Equipped with some \(\sigma \)-algebra \(\mathcal {B}\).
- 3.
Notice that, here, for the definition of AR models we do not assume the stationarity of \((X_m)_{m \ge k}\).
- 4.
The use of the indefinite article reflects the possible non-uniqueness.
- 5.
One can also take variants which (according to some principle) “synthetically un-zero-ize” the empirical probability mass of non-appearing outcomes.
- 6.
In the sense of putting probability mass \(\binom{c}{j} \, \theta ^j \, (1-\theta )^{c-j}\) (\(\theta \in ]0,1[\)) on the j-th point \(y_j\) of a finite set \(\mathcal {Y} =\{y_{1}, \ldots , y_{c}\}\).
- 7.
Which may vary in \(\alpha \), \(\beta \).
- 8.
Notice that the empirical distribution is a discrete distribution and hence the reference distribution \(\lambda \) is typically the counting distribution (attributing the value 1 to each possible outcome); if \(Q_{\theta }\) has a different reference distribution \(\overline{\lambda }\) of completely different type (e.g. \(Q_{\theta }\) is a classical (absolutely) continuous distribution, say Gaussian, and accordingly \(\overline{\lambda }\) is the Lebesgue measure), then one can e.g. smooth the histogram and hence the empirical distribution, or “discretize” \(Q_{\theta }\) by appropriately partitioning its support.
- 9.
Choosing the counting distribution for \(\lambda \); one can use (15) also for non-probability contexts (e.g. general nonnegative vectors) with \(\sum _{{x\in \mathcal {X}}} p(x) \ne 1\), \(\sum _{{x\in \mathcal {X}}} q(x) \ne 1\).
- 10.
In case of \(w(u,v)=w(v,u)\) for all (u, v), one can easily produce symmetric preselection-score versions by means of either \(B_{\phi ,W }^{new}\left( P,Q\right) +B_{\phi ,W }^{new}\left( Q,P\right) \), \(\max \{B_{\phi ,W }^{new}\left( P,Q\right) , B_{\phi ,W }^{new}\left( Q,P\right) \}\), \(\min \{B_{\phi ,W }^{new}\left( P,Q\right) , B_{\phi ,W }^{new}\left( Q,P\right) \}\); this also works for \(\phi (t)= \phi _1(t)\) together with arbitrary scale-connectors w.
- 11.
Goodness-of-approximation score surface.
- 12.
Quantities which modelwise should be rare but actually appear much more often.
- 13.
Quantities which modelwise should be very frequent but actually appear much less often.
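Notes 5, 6 and 10 can be illustrated with a small sketch. It is only one possible reading under stated assumptions: the binomial masses are indexed by j = 0, ..., c so that they sum to one; additive smoothing with a constant `eps` stands in for one of many conceivable “un-zero-izing” principles; and a Kullback-Leibler-type score (the \(\phi _1\) case) stands in for a generic preselection score. All function names and the sample data are hypothetical.

```python
from collections import Counter
from math import comb, log


def binomial_masses(c, theta):
    """Masses C(c, j) * theta**j * (1 - theta)**(c - j) for j = 0..c (Note 6)."""
    return [comb(c, j) * theta**j * (1.0 - theta)**(c - j) for j in range(c + 1)]


def smoothed_empirical_masses(data, outcomes, eps=0.5):
    """Empirical masses with additive smoothing (one assumed reading of Note 5).

    Every outcome, including those never observed, receives positive mass.
    """
    counts = Counter(data)
    total = len(data) + eps * len(outcomes)
    return [(counts[y] + eps) / total for y in outcomes]


def kl_type_score(p, q):
    """Stand-in score: sum of p*log(p/q) - p + q over all outcomes."""
    return sum(pi * log(pi / qi) - pi + qi for pi, qi in zip(p, q))


def symmetrized_scores(score, p, q):
    """Sum, max and min symmetrizations of a score, as listed in Note 10."""
    forward, backward = score(p, q), score(q, p)
    return {
        "sum": forward + backward,
        "max": max(forward, backward),
        "min": min(forward, backward),
    }


# Hypothetical observations on the outcomes 0..3; outcome 3 never appears.
data = [0, 1, 1, 2, 1, 0, 1, 2, 1, 1]
outcomes = [0, 1, 2, 3]

p_emp = smoothed_empirical_masses(data, outcomes)
q_model = binomial_masses(3, 0.4)
scores = symmetrized_scores(kl_type_score, p_emp, q_model)
```

Smoothing is what keeps the backward score `score(q, p)` finite here, since the unobserved outcome would otherwise carry zero empirical mass.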
© 2015 Springer International Publishing Switzerland
About this paper
Kißlinger, AL., Stummer, W. (2015). New Model Search for Nonlinear Recursive Models, Regressions and Autoregressions. In: Nielsen, F., Barbaresco, F. (eds) Geometric Science of Information. GSI 2015. Lecture Notes in Computer Science, vol 9389. Springer, Cham. https://doi.org/10.1007/978-3-319-25040-3_74
DOI: https://doi.org/10.1007/978-3-319-25040-3_74
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25039-7
Online ISBN: 978-3-319-25040-3