Abstract
This paper presents a new approach to the detection of discontinuities in the n-th derivative of observational data. This is achieved by performing two polynomial approximations at each interstitial point. The polynomials are coupled by constraining their coefficients to ensure continuity of the model up to the (n − 1)-th derivative, while yielding an estimate for the discontinuity of the n-th derivative. Through the prudent selection of a common coordinate system, the coefficients of the polynomials correspond directly to the derivatives of the approximations at the interstitial points. The approximation residual and extrapolation errors are investigated as measures for detecting discontinuity. This is necessary since discrete observations of continuous systems are discontinuous at every point. It is proven, using matrix algebra, that positive extrema in the combined approximation-extrapolation error correspond exactly to extrema in the difference of the Taylor coefficients. This provides a relative measure for the severity of the discontinuity in the observational data. The matrix algebraic derivations are provided for all aspects of the method presented here, including a solution for the covariance propagation through the computation. The performance of the method is verified with a Monte Carlo simulation using synthetic piecewise polynomial data with known discontinuities. It is also demonstrated that the discontinuities are suitable as knots for B-spline modelling of data. For completeness, the results of applying the method to sensor data acquired during the monitoring of heavy machinery are presented.
1 Introduction
In the recent past, physics-informed data science has become a focus of research activities, e.g., [9]. It appears under different names, e.g., physics-informed [12], hybrid learning [13], physics-based [17], etc., but with the same basic idea of embedding physical principles into the data science algorithms. The goal is to ensure that the results obtained obey the laws of physics and/or are based on physically relevant features. Discontinuities in the observations of continuous systems violate some very basic physics and for this reason their detection is of fundamental importance. Consider Newton’s second law of motion,
Any discontinuities in the observations of m(t), \(\dot{m}(t)\), y(t), \(\dot{y}(t)\) or \(\ddot{y}(t)\) indicate a violation of some basic principle: be it that the observation is incorrect or that something unexpected is happening in the system. Consequently, detecting discontinuities is of fundamental importance in physics-based data science. A function s(x) is said to be \(C^n\) discontinuous, if \(s \in C^{n-1}{\setminus } C^n\), that is, if s(x) has continuous derivatives up to and including order \(n-1\), but the n-th derivative is discontinuous. Due to the discrete and finite nature of the observational data, only jump discontinuities in the n-th derivative are considered; asymptotic discontinuities are not considered. Furthermore, in more classical data modelling, \(C^n\) jump discontinuities form the basis for the locations of knots in B-spline models of observational data [15].
1.1 State of the Art
There are numerous approaches in the literature dealing with estimating regression functions that are smooth, except at a finite number of points. Based on the methods, these approaches can be classified into four groups: local polynomial methods, spline-based methods, kernel-based methods and wavelet methods. The approaches vary also with respect to the available a priori knowledge about the number of points of discontinuity or the derivative in which these discontinuities appear. For a good literature review of these methods, see [3]. The method used in this paper is relevant both in terms of local polynomials as well as spline-based methods; however, the new approach requires no a priori knowledge about the data.
In the local polynomial literature, namely in [8] and [14], ideas similar to the ones presented here are investigated. In these papers, local polynomial approximations from the left and the right side of the point in question are used. The major difference is that neither of these methods uses constraints to ensure that the local polynomial approximations enforce continuity of the lower derivatives, which is done in this paper. As such, they use different residuals to determine the existence of a change point. Using constrained approximation ensures that the underlying physical properties of the system are taken into consideration, which is one of the main advantages of the approach presented here. Additionally, in the aforementioned papers, it is not clear whether only co-locative points are considered as possible change points, or whether interstitial points are also considered. This distinction between co-locative and interstitial points is of great importance. Fundamentally, the method presented here can be applied to discontinuities at either location. However, it has been assumed that discontinuities only make sense between the sampled (co-locative) points, i.e., the discontinuities are interstitial.
In [11], on the other hand, one polynomial instead of two is used, and the focus is mainly on detecting \(C^0\) and \(C^1\) discontinuities. Additionally, the number of change points must be known a priori, so only their location is approximated; the required a priori knowledge makes the method unsuitable for real sensor-based system observation.
In the spline-based literature there are heuristic methods (top-down and bottom-up) as well as optimization methods. For a more detailed state of the art on splines, see [2]. Most heuristic methods use a discrete geometric measure to decide whether a point is a knot, such as discrete curvature, kink angle, etc., and then use some (mostly arbitrary) threshold to improve the initial knot set. In the method presented here, which falls under the category of bottom-up approaches, the selection criterion is based on calculus and statistics. This allows the fundamental physical laws governing the system to be incorporated in the model, while also ensuring mathematical relevance and rigour.
1.2 The New Approach
This paper presents a new approach to detecting \(C^n\) discontinuities in observational data. It uses constrained coupled polynomial approximation to obtain two estimates for the \(n^\text {th}\) Taylor coefficients and their uncertainties, at every interstitial point. This corresponds to approximating the local function by polynomials, once from the left \(\mathsf {f}(x,\varvec{\alpha })\) and once from the right \(\mathsf {g}(x,\varvec{\beta })\). The constraints couple the polynomials to ensure that \(\alpha _i = \beta _i \,\,\, \text {for every}\, i \in [0 \ldots n-1]\). In this manner the approximations are \(C^{n-1}\) continuous at the interstitial points, while delivering an estimate for the difference in the \(n^\text {th}\) Taylor coefficients. All the derivations for the coupled constrained approximations and the numerical implementations are presented. Both the approximation and extrapolation residuals are derived. It is proven that the discontinuities must lie at local positive peaks in the extrapolation error. The new approach is verified both with known synthetic data and on real sensor data obtained from observing the operation of heavy machinery.
2 Detecting \(C^n\) Discontinuities
Discrete observations \(s(x_i)\) of a continuous system s(x) are, by their very nature, discontinuous at every sample. Consequently, some measure for discontinuity will be required, with uncertainty, which provides the basis for further analysis.
The observations are considered to be the co-locative points, denoted by \(x_i\) and collectively by the vector \(\varvec{x}\); however, we wish to estimate the discontinuity at the interstitial points, denoted by \(\zeta _i\) and collectively as \(\varvec{\zeta }\). Using interstitial points ensures that each data point is used for only one polynomial approximation at a time. Furthermore, in the case of sensor data, one expects the discontinuities to occur between samples. Consequently, the data is segmented at the interstitial points, i.e., between the samples. This requires the use of interpolating functions, and in this work polynomials have been chosen.
Polynomials have been chosen because of their approximating, interpolating and extrapolating properties when modelling continuous systems: the Weierstrass approximation theorem [16] states that if f(x) is a continuous real-valued function defined on the real interval \(x \in [a, b]\), then for every \(\varepsilon > 0\), there exists a polynomial p(x) such that for all \(x \in [a, b]\), the supremum norm \(\Vert f(x) - p(x)\Vert _{\infty } < \varepsilon \). That is, any continuous function f(x) can be approximated by a polynomial to an arbitrary accuracy \(\varepsilon \), given a sufficiently high degree.
The basic concept (see Fig. 1) to detect a \(C^n\) discontinuity is: to approximate the data to the left of an interstitial point by the polynomial \(\mathsf {f}(x,\varvec{\alpha })\) of degree \(d_L\) and to the right by \(\mathsf {g}(x,\varvec{\beta })\) of degree \(d_R\), while constraining these approximations to be \(C^{n-1}\) continuous at the interstitial point. This approximation ensures that,
while yielding estimates for \(\mathsf {f}^{(n)}(\zeta _i)\) and \(\mathsf {g}^{(n)}(\zeta _i)\) together with estimates for their variances \(\lambda _{f(\zeta _i)}\) and \(\lambda _{g(\zeta _i)}\). This corresponds exactly to estimating the Taylor coefficients of the function twice for each interstitial point, i.e., once from the left and once from the right. If they differ significantly, then the function’s \(n^\text {th}\) derivative is discontinuous at this point. The Taylor series of a function f(x) around the point a is defined as,
for each x for which the infinite series on the right hand side converges. Furthermore, any function which is \(n+1\) times differentiable can be written as
where \(\tilde{\mathsf {f}}(x)\) is an \(n^\text {th}\) degree polynomial approximation of the function f(x),
and R(x) is the remainder term. The Lagrange form of the remainder R(x) is given by
where \(\xi \) is a real number between a and x.
A Taylor expansion around the origin (i.e. \(a = 0\) in Eq. 3) is called a Maclaurin expansion; for more details, see [1]. In the rest of this work, the \(n^\text {th}\) Maclaurin coefficient for the function f(x) will be denoted by
The coefficients of a polynomial \(\mathsf {f}(x,\varvec{\alpha }) = \alpha _n x^n\,+\,\ldots \,+\,\alpha _1 x\,+ \alpha _0\) are closely related to the coefficients of the Maclaurin expansion of this polynomial. Namely, it is easy to prove that
A prudent selection of a common local coordinate system, setting the interstitial point as the origin, ensures that the coefficients of the left and right approximating polynomials correspond to the derivative values at this interstitial point. Namely, one gets a very clear relationship between the coefficients of the left and right polynomial approximations, \(\varvec{\alpha }\) and \(\varvec{\beta }\), their Maclaurin coefficients, \(t_{\mathsf {f}}^{(n)}\) and \(t_{\mathsf {g}}^{(n)}\), and the values of the derivatives at the interstitial point
From Eq. 9 it is clear that performing a left and right polynomial approximation at an interstitial point is sufficient to get the derivative values at that point, as well as their uncertainties.
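The correspondence in Eq. 9 can be illustrated numerically. The following sketch (the test function, interval and degree are illustrative assumptions, not taken from the paper) fits a cubic in local coordinates centred on an interstitial point and compares the fitted coefficients with the true Maclaurin coefficients:

```python
import numpy as np

# Fit a cubic to samples of sin(x) in a local coordinate system whose
# origin is the interstitial point zeta; the fitted coefficient alpha_k
# then estimates f^(k)(zeta) / k!, the k-th Taylor coefficient.
zeta = 0.7                                   # illustrative interstitial point
x = np.linspace(zeta - 0.2, zeta + 0.2, 41)  # local support
y = np.sin(x)

# Shift so that zeta becomes the origin, then fit (coefficients low to high).
alpha = np.polynomial.polynomial.polyfit(x - zeta, y, 3)

# True Maclaurin coefficients of sin about zeta: f^(k)(zeta) / k!
t_true = np.array([np.sin(zeta), np.cos(zeta),
                   -np.sin(zeta) / 2, -np.cos(zeta) / 6])
print(np.max(np.abs(alpha - t_true)))        # small approximation error
```

The shift of the abscissae is the "prudent selection of a common coordinate system" referred to above: no separate differentiation step is needed.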
3 Constrained and Coupled Polynomial Approximation
The goal here is to obtain \(\varDelta t_{\mathsf {fg}}^{\left( n\right) } \triangleq t_{\mathsf {f}}^{\left( n\right) } - t_{\mathsf {g}}^{\left( n\right) }\) via polynomial approximation. To this end, two polynomial approximations are required, whereby the interstitial point is used as the origin of the common coordinate system, see Fig. 1. The approximations are coupled [6] at the interstitial point by constraining the coefficients such that \(\alpha _i = \beta _i, \, \text {for every} \, i \in [0\ldots n-1]\). This ensures that the two polynomials are \(C^{n-1}\) continuous at the interstitial points. This also reduces the degrees of freedom during the approximation, and with this the variance of the solution is reduced. For more details on constrained polynomial approximation see [4, 7].
To remain fully general, a local polynomial approximation of degree \(d_L\) is performed to the left of the interstitial point with the support length \(l_L\) creating \(\mathsf {f}(x,\varvec{\alpha })\); similarly to the right \(d_R\), \(l_R\), \(\mathsf {g}(x,\varvec{\beta })\). The x coordinates to the left, denoted as \(\varvec{x}_L\) are used to form the left Vandermonde matrix \(\varvec{V}_L\), similarly \(\varvec{x}_R\) form \(\varvec{V}_R\) to the right. This leads to the following formulation of the approximation process,
A \(C^{n-1}\) continuity implies \(\alpha _i = \beta _i, \,\text {for every}\, i \in [0\ldots n-1]\) which can be written in matrix form as
Defining
we obtain the task of least squares minimization with homogeneous linear constraints,
Clearly \(\varvec{\gamma }\) must lie in the null-space of \(\varvec{C}\); now, given \(\varvec{N}\), an orthonormal basis for \(\mathop {\mathrm {null}}\left\{ \varvec{C}\right\} \), we obtain,
Back-substituting into Eq. 13 yields,
The least squares solution to this problem is,
and consequently,
Formulating the approximation in the above manner ensures that the difference in the Taylor coefficients can be simply computed as
Now defining \(\varvec{d} = [1, \, \varvec{0}_{d_L -1}, \, -1, \, \varvec{0}_{d_R -1}]^\mathrm {T}\), \(\varDelta t_{\mathsf {fg}}^{\left( n\right) }\) is obtained from \(\varvec{\gamma }\) as
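The whole construction can be sketched numerically. In the following, the coefficient ordering (low to high, left block then right block) and the synthetic branches are assumptions made for illustration, so the index positions of the selector differ from the paper's \(\varvec{d}\):

```python
import numpy as np
from scipy.linalg import null_space

# Coupled constrained fit: n = 2, cubic approximations on both sides,
# interstitial point at the origin of the common coordinate system.
n, dL, dR = 2, 3, 3
xL = np.linspace(-1.0, -0.05, 20)            # left support
xR = np.linspace(0.05, 1.0, 20)              # right support
yL = 1 + xL + 1.0 * xL**2                    # left branch
yR = 1 + xR + 2.0 * xR**2                    # right branch: C^1 at 0, but the
                                             # second Taylor coeff jumps 1 -> 2
VL = np.vander(xL, dL + 1, increasing=True)  # left Vandermonde matrix
VR = np.vander(xR, dR + 1, increasing=True)  # right Vandermonde matrix
V = np.block([[VL, np.zeros((len(xL), dR + 1))],
              [np.zeros((len(xR), dL + 1)), VR]])
y = np.concatenate([yL, yR])

# Constraints alpha_i = beta_i, i = 0..n-1, written as C gamma = 0.
C = np.zeros((n, dL + dR + 2))
for i in range(n):
    C[i, i], C[i, dL + 1 + i] = 1.0, -1.0

N = null_space(C)                            # orthonormal basis of null(C)
delta = np.linalg.lstsq(V @ N, y, rcond=None)[0]
gamma = N @ delta                            # [alpha_0..alpha_3, beta_0..beta_3]

dt = gamma[n] - gamma[dL + 1 + n]            # Delta t^(n) = alpha_n - beta_n
print(dt)                                    # close to -1.0 for this example
```

Because the noise-free branches already satisfy the \(C^1\) constraints, the fit is exact and the known jump in the second Taylor coefficient is recovered.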
3.1 Covariance Propagation
Defining \(\varvec{K} = \varvec{N} \, \left( \varvec{V} \, \varvec{N} \right) ^+\) yields \(\varvec{\gamma } = \varvec{K} \, \varvec{y}\). Then, given the covariance of \(\varvec{y}\), i.e., \(\varvec{\varLambda }_{\varvec{y}}\), one gets that,
Additionally, from Eq. 19 one could derive the covariance of the difference in the Taylor coefficients
Keep in mind that if one uses approximating polynomials of degree n to determine a discontinuity in the \(n^{\text {th}}\) derivative, as done so far, then \(\varvec{\varLambda }_{\varvec{\varDelta }}\) is a scalar and corresponds to the variance of \(\varDelta t_{\mathsf {fg}}^{\left( n\right) }\).
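The propagation can be sketched under the same illustrative setup and coefficient ordering as before; the noise level \(\sigma\) and the index positions of the selector vector (which plays the role of \(\varvec{d}\)) are assumptions:

```python
import numpy as np
from scipy.linalg import null_space

# Covariance propagation: K = N (V N)^+, Lambda_gamma = K Lambda_y K^T,
# and the variance of Delta t^(n) via the selector vector d.
n, dL, dR = 2, 3, 3
xL = np.linspace(-1.0, -0.05, 20)
xR = np.linspace(0.05, 1.0, 20)
VL = np.vander(xL, dL + 1, increasing=True)
VR = np.vander(xR, dR + 1, increasing=True)
V = np.block([[VL, np.zeros((len(xL), dR + 1))],
              [np.zeros((len(xR), dL + 1)), VR]])
C = np.zeros((n, dL + dR + 2))
for i in range(n):
    C[i, i], C[i, dL + 1 + i] = 1.0, -1.0
N = null_space(C)

sigma = 0.05                                 # assumed i.i.d. noise level
Lam_y = sigma**2 * np.eye(V.shape[0])
K = N @ np.linalg.pinv(V @ N)
Lam_gamma = K @ Lam_y @ K.T                  # covariance of gamma

d = np.zeros(dL + dR + 2)
d[n], d[dL + 1 + n] = 1.0, -1.0              # selects alpha_n - beta_n
var_dt = d @ Lam_gamma @ d                   # scalar variance of Delta t^(n)
print(np.sqrt(var_dt))                       # standard deviation of the jump
```

The scalar `var_dt` is what permits a significance test on the estimated jump at each interstitial point.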
4 Error Analysis
In this paper we consider three measures for error:
1. the norm of the approximation residual;
2. the combined approximation and extrapolation error;
3. the extrapolation error.
4.1 Approximation Error
The residual vector has the form
The approximation error is calculated as
4.2 Combined Error
The basic concept, which can be seen in Fig. 2, is as follows: the left polynomial \(\mathsf {f}\left( x,\varvec{\alpha }\right) \), which approximates over the values \(\varvec{x}_L\), is extended to the right and evaluated at the points \(\varvec{x}_R\). Analogously, the right polynomial \(\mathsf {g}\left( x,\varvec{\beta }\right) \) is evaluated at the points \(\varvec{x}_L\). If there is no \(C^n\) discontinuity in the system, the polynomials \(\mathsf {f}\) and \(\mathsf {g}\) must be equal, and consequently the extrapolated values will not differ significantly from the approximated values.
Analytical Combined Error. The extrapolation error in a continuous case, i.e. between the two polynomial models, can be computed with the following 2-norm,
Given the constraints, which ensure that \(\alpha _i = \beta _i, \, i \in [0,\ldots ,n-1]\), we obtain,
Expanding and performing the integral yields,
Fixing the values of \(x_{min}\) and \(x_{max}\) across a single computation implies that the factor,
is a constant. Consequently, the extrapolation error is directly proportional to the square of the difference in the Taylor coefficients,
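In the case \(d_L = d_R = n\) (the case referred to in Sect. 3.1), the constrained difference of the two polynomials reduces to the single term \(\varDelta t_{\mathsf {fg}}^{(n)} x^n\), and the integral can be written out explicitly. The following is a reconstruction under that assumption, not the paper's original equation:

```latex
\left\| \mathsf{f}(x,\boldsymbol{\alpha}) - \mathsf{g}(x,\boldsymbol{\beta}) \right\|_2^2
  = \int_{x_{min}}^{x_{max}} \left( \Delta t_{\mathsf{fg}}^{(n)}\, x^n \right)^2 \mathrm{d}x
  = \frac{x_{max}^{2n+1} - x_{min}^{2n+1}}{2n+1} \left( \Delta t_{\mathsf{fg}}^{(n)} \right)^2 ,
```

so the constant factor in question would be \(\left( x_{max}^{2n+1} - x_{min}^{2n+1}\right) /(2n+1)\).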
Numerical Combined Error. In the discrete case, one can write the errors of \(\mathsf {f}(x,\varvec{\alpha })\) and \(\mathsf {g}(x,\varvec{\beta })\) as
respectively. Consequently, one could define an error function as
From these calculations it is clear that, in the discrete case, the error is also directly proportional to the square of the difference in the Taylor coefficients and that \( E_{\mathsf {f}\mathsf {g}} \propto \varepsilon _x\). This proves that the numerical computation is consistent with the analytical continuous error.
4.3 Extrapolation Error
One could also define a different kind of error, based just on the extrapolative properties of the polynomials. Namely, using the notation from the beginning of Sect. 3, one defines
and then calculates the error as
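This cross-evaluation can be sketched numerically (the synthetic branches are illustrative assumptions, and uncoupled fits are used here purely to demonstrate the error measure):

```python
import numpy as np

# Extrapolation error: the left polynomial f is evaluated on the right
# support x_R and vice versa; a C^n discontinuity makes the
# cross-evaluation residuals large.
xL = np.linspace(-1.0, -0.05, 20)
xR = np.linspace(0.05, 1.0, 20)
yL = 1 + xL + 1.0 * xL**2                    # C^1 at 0, but the second
yR = 1 + xR + 2.0 * xR**2                    # Taylor coefficient jumps

pfit = np.polynomial.polynomial.polyfit
pval = np.polynomial.polynomial.polyval
alpha = pfit(xL, yL, 3)                      # left fit  f(x, alpha)
beta = pfit(xR, yR, 3)                       # right fit g(x, beta)

r_f = yR - pval(xR, alpha)                   # f extrapolated to the right
r_g = yL - pval(xL, beta)                    # g extrapolated to the left
E_x = r_f @ r_f + r_g @ r_g                  # extrapolation error
print(E_x)                                   # clearly nonzero here
```

Were the two branches samples of one \(C^2\) function, both residual vectors would be near zero; the discontinuity is what drives `E_x` up.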
In the example in Sect. 5, it will be seen that there is no significant numerical difference between these two errors.
5 Numerical Testing
The numerical testing is performed with: synthetic data from a piecewise polynomial, where the locations of the \(C^n\) discontinuities are known; and with real sensor data emanating from the monitoring of heavy machinery.
5.1 Synthetic Data
In the literature on splines, functions of the type \(y\left( x\right) = e^{-x^2}\) are commonly used. However, this function is analytic and \(C^{\infty }\) continuous; consequently, it is not a suitable test function. In Fig. 3 a piecewise polynomial with a similar shape is shown; this curve, however, has \(C^2\) discontinuities at known locations. The algorithm was applied to the synthetic data from the piecewise polynomial, with added noise with \(\sigma = 0.05\), and the results for a single case can be seen in Fig. 3. Additionally, a Monte Carlo simulation with \(m=10000\) iterations was performed and the results of the algorithm were compared to the true locations of the two known knots. The mean errors in the locations of the knots are \(\mu _1 = (5.59 \pm 2.05) \times 10^{-4}\) and \(\mu _2 = (-4.62 \pm 1.94)\times 10^{-4}\), with \(95 \%\) confidence. Errors on the scale of \(10^{-4}\), for a support with range \([0,\,1]\) and \(5 \%\) noise amplitude, can be considered a highly satisfactory result.
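The synthetic experiment can be reproduced in spirit with the following sketch. It is a simplified stand-in: uncoupled left/right cubic fits replace the coupled approximation of Sect. 3, and the signal, noise level and support length are assumptions:

```python
import numpy as np

# Scan the interstitial points of a signal with a known C^2
# discontinuity at x = 0.5; the peak of the squared difference of the
# second Taylor coefficients locates the knot.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 201)
y = x**2 + 3.0 * np.maximum(x - 0.5, 0.0)**2   # C^1 everywhere, C^2 jump at 0.5
y = y + rng.normal(0.0, 0.002, x.size)         # assumed noise level

w, deg, n = 30, 3, 2                           # support length, degree, order
pfit = np.polynomial.polynomial.polyfit
score = np.full(x.size, np.nan)
for i in range(w, x.size - w + 1):
    zeta = 0.5 * (x[i - 1] + x[i])             # interstitial point as origin
    aL = pfit(x[i - w:i] - zeta, y[i - w:i], deg)   # left approximation
    aR = pfit(x[i:i + w] - zeta, y[i:i + w], deg)   # right approximation
    score[i] = (aL[n] - aR[n])**2              # squared Taylor-coefficient jump

knot = x[np.nanargmax(score)]
print(knot)                                    # close to the true knot at 0.5
```

The located peak can then serve as a knot for a subsequent B-spline fit, as suggested in the introduction.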
5.2 Sensor Data
The algorithm was also applied to a set of real-world sensor data (anonymized for confidentiality reasons) emanating from the monitoring of heavy machinery. The original data set can be seen in Fig. 4 (top). It has many local peaks and periods of little or no change, so the algorithm was used to detect discontinuities in the first derivative, in order to determine the peaks and phases. The peaks in the Taylor differences were used in combination with the peaks of the extrapolation error to determine the points of discontinuity. A peak in the Taylor differences means that the Taylor coefficients at that interstitial point differ significantly from those at the other interstitial points in the neighbourhood. However, if there is no peak in the extrapolation error at the same location, then the peak found in the Taylor differences is deemed insignificant, since one polynomial could model both the left and the right values, and as such the peak is not a discontinuity. Additionally, it can be seen in Fig. 5 that both the extrapolation error and the combined error, as defined in Sect. 4, have peaks at the same locations; as such, the results they provide do not differ significantly.
6 Conclusion and Future Work
It may be concluded, from the results achieved, that coupled constrained polynomial approximation yields a good method for the detection of \(C^n\) discontinuities in discrete observational data of continuous systems. Local peaks in the square of the difference of the Taylor coefficients provide a relative measure for determining the locations of discontinuities.
Current investigations indicate that the method can be implemented directly as a convolutional operator, which will yield a computationally efficient solution. The use of discrete orthogonal polynomials [5, 10] is being tested as a means of improving the sensitivity of the results to numerical perturbations.
References
Burden, R.L., Faires, J.D.: Numerical Analysis, 9th edn. Brooks/Cole, Pacific Grove (2010)
Dung, V.T., Tjahjowidodo, T.: A direct method to solve optimal knots of B-spline curves: an application for non-uniform B-spline curves fitting. PLoS ONE 12(3), 1–24 (2017). https://doi.org/10.1371/journal.pone.0173857
Gijbels, I., Goderniaux, A.C.: Data-driven discontinuity detection in derivatives of a regression function. Commun. Stat.-Theory Methods 33(4), 851–871 (2005). https://doi.org/10.1081/STA-120028730
Klopfenstein, R.W.: Conditional least squares polynomial approximation. Math. Comput. 18(88), 659–662 (1964). http://www.jstor.org/stable/2002954
O’Leary, P., Harker, M.: Discrete polynomial moments and Savitzky-Golay smoothing. Int. J. Comput. Inf. Eng. 4(12), 1993–1997 (2010). https://publications.waset.org/vol/48
O’Leary, P., Harker, M., Zsombor-Murray, P.: Direct and least square fitting of coupled geometric objects for metric vision. IEE Proc. Vis. Image Sig. Process. 152, 687–694 (2006). https://doi.org/10.1049/ip-vis:20045206
O’Leary, P., Ritt, R., Harker, M.: Constrained polynomial approximation for inverse problems in engineering. In: Abdel Wahab, M. (ed.) NME 2018. LNME, pp. 225–244. Springer, Singapore (2019). https://doi.org/10.1007/978-981-13-2273-0_19
Horváth, L., Kokoszka, P.: Change-point detection with non-parametric regression. Statistics 36(1), 9–31 (2002). https://doi.org/10.1080/02331880210930
Owhadi, H.: Bayesian numerical homogenization. Multiscale Model. Simul. 13(3), 812–828 (2015). https://doi.org/10.1137/140974596
Persson, P.O., Strang, G.: Smoothing by Savitzky-Golay and Legendre filters. In: Rosenthal, J., Gilliam, D.S. (eds.) Mathematical Systems Theory in Biology, Communications, Computation, and Finance. IMA, vol. 134, pp. 301–315. Springer, New York (2003). https://doi.org/10.1007/978-0-387-21696-6_11
Qiu, P., Yandell, B.: Local polynomial jump-detection algorithm in nonparametric regression. Technometrics 40(2), 141–152 (1998). https://doi.org/10.1080/00401706.1998.10485196
Raissi, M., Perdikaris, P., Karniadakis, G.: Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 378, 686–707 (2019). https://doi.org/10.1016/j.jcp.2018.10.045. http://www.sciencedirect.com/science/article/pii/S0021999118307125
Saxena, H., Aponte, O., McConky, K.T.: A hybrid machine learning model for forecasting a billing period’s peak electric load days. Int. J. Forecast. 35(4), 1288–1303 (2019). https://doi.org/10.1016/j.ijforecast.2019.03.025
Spokoiny, V.: Estimation of a function with discontinuities via local polynomial fit with an adaptive window choice. Ann. Stat. 26 (1998). https://doi.org/10.1214/aos/1024691246
Wahba, G.: Spline models for observational data. Soc. Ind. Appl. Math. (1990). https://doi.org/10.1137/1.9781611970128
Weierstrass, K.: Über die analytische darstellbarkeit sogenannter willkürlicher functionen einer reellen veränderlichen. Sitzungsberichte der Königlich Preußischen Akademie der Wissenschaften zu Berlin, 1885(II), 633–639, 789–805 (1885)
Yaman, B., Hosseini, S.A.H., Moeller, S., Ellermann, J., Uǧurbil, K., Akçakaya, M.: Self-supervised physics-based deep learning MRI reconstruction without fully-sampled data (2019)
Acknowledgements
This work was partially funded by:
1. The COMET program within the K2 Center “Integrated Computational Material, Process and Product Engineering (IC-MPPE)” (Project No 859480). This program is supported by the Austrian Federal Ministries for Transport, Innovation and Technology (BMVIT) and for Digital and Economic Affairs (BMDW), represented by the Austrian research funding association (FFG), and the federal states of Styria, Upper Austria and Tyrol.
2. The European Institute of Innovation and Technology (EIT), a body of the European Union which receives support from the European Union’s Horizon 2020 research and innovation programme. This was carried out under Framework Partnership Agreement No. 17031 (MaMMa - Maintained Mine & Machine).
The authors gratefully acknowledge this financial support.
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2020 The Author(s)
Ninevski, D., O’Leary, P. (2020). Detection of Derivative Discontinuities in Observational Data. In: Berthold, M., Feelders, A., Krempl, G. (eds) Advances in Intelligent Data Analysis XVIII. IDA 2020. Lecture Notes in Computer Science(), vol 12080. Springer, Cham. https://doi.org/10.1007/978-3-030-44584-3_29
Print ISBN: 978-3-030-44583-6
Online ISBN: 978-3-030-44584-3