We thank the discussants for taking the time to provide insightful comments on our paper. We offer a brief response to these comments.

Schliep provides some thoughts on scalability. We note, again, that inference for slope and aspect is a post-model-fitting activity (as described in Section 4.2). So, the primary concern here appears to reside in fitting a DEM to a large dataset of elevations (many more than our 456 monitoring sites in the Cape Floristic Region). Prediction to a dense grid using posterior samples is not so computationally demanding. Furthermore, the DEM is usually a simple geostatistical model, and, with the rapidly growing literature on the “big n” problem (e.g., Datta et al. 2016; Katzfuss 2017), we find more and more tools to address this challenge. More critically, as Schliep points out, one should not employ a model that produces elevation surfaces that are too smooth, since this will smooth away the local gradient behavior we seek to extract.

In this regard, the nearest-neighbor Gaussian process (NNGP) appears to be attractive, as she suggests (and so do Jona Lasinio and Mastrantonio). By construction, the NNGP provides a sparse inverse covariance matrix for the locations in a reference set and convenient conditionally independent distributions for locations not in this set, given neighbors in the set. However, upon reflection, it does not provide an explicit covariance function. Hence, the gradient and Hessian matrices required to obtain the cross-covariance function for slope and aspect inference are not accessible. To work with the NNGP, we would have to replace the process realization \(Y({\varvec{s}})\) with, say, \(\mathbb {E}\big (Y({\varvec{s}})|N_{Y({\varvec{s}})} \big )\), where \(N_{Y({\varvec{s}})}\) is the neighbor set for location \({\varvec{s}}\). As possible justification, we can argue that this conditional expectation surface becomes arbitrarily close to the \(Y({\varvec{s}})\) surface, almost everywhere in expected mean square, as \(N_{Y({\varvec{s}})}\) becomes more dense around \({\varvec{s}}\). Further, this conditional mean surface is almost everywhere mean square differentiable (see Schliep and Gelfand 2018). However, as with the predictive process, in practice, the conditional mean surface may be too smooth (though the \(Y({\varvec{s}})\) surface will not be).
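
To see why, note the familiar kriging form of this conditional expectation. Assuming, for illustration, a mean-zero elevation process with covariance function \(K\), and writing \({\varvec{Y}}_{N}\) for the vector of process values over \(N_{Y({\varvec{s}})}\) with covariance matrix \(C_{N}\),

$$\begin{aligned} \mathbb {E}\big (Y({\varvec{s}})|N_{Y({\varvec{s}})}\big ) = {\varvec{c}}({\varvec{s}})^{\mathrm {\scriptscriptstyle T}} C_{N}^{-1}{\varvec{Y}}_{N}, \quad \text {where } {\varvec{c}}({\varvec{s}}) = \big (K({\varvec{s}}, {\varvec{s}}_{j})\big )_{{\varvec{s}}_{j} \in N_{Y({\varvec{s}})}}. \end{aligned}$$

This surface is a finite linear combination of the functions \(K({\varvec{s}}, {\varvec{s}}_{j})\), so it is as smooth as the covariance function itself, typically smoother than the realizations of \(Y({\varvec{s}})\); this is precisely the source of the over-smoothing concern.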

A second point made by Schliep concerns the use of slope and aspect output from our modeling as predictors in spatial regression modeling. Suppose we consider a simple spatial regression specification of the form

$$\begin{aligned} Z({\varvec{s}}) = \beta _{0} + \beta _{1}\text {elev}({\varvec{s}}) + \beta _{2} \text {slope}({\varvec{s}}) + \beta _{3}\text {aspect}({\varvec{s}}) + w({\varvec{s}}) + \epsilon ({\varvec{s}}), \end{aligned}$$
(1)

where \(w({\varvec{s}})\) is the usual Gaussian process for the residual, independent of the elevation, slope, and aspect processes, and \(\epsilon ({\varvec{s}})\) is the usual independent error process with variance \(\tau ^2\). If we want to build a hierarchical model to propagate the uncertainty in these processes when they are used as predictors, it will clearly be much easier to work in the space of the elevation and gradient surfaces, where Gaussian distribution theory is available. We therefore rewrite (1) as

$$\begin{aligned} Z({\varvec{s}}) = \beta _{0} + \beta _{1}\text {elev}({\varvec{s}}) + \beta _{2} D_{{\varvec{e}}_{1}}({\varvec{s}}) + \beta _{3}D_{{\varvec{e}}_{2}}({\varvec{s}}) + w({\varvec{s}}) + \epsilon ({\varvec{s}}). \end{aligned}$$
(2)

As a result, the hierarchical specification takes the form:

$$\begin{aligned}&\prod _{i} \left[ Z({\varvec{s}}_{i})| {\varvec{\beta }}, \text {elev}({\varvec{s}}_{i}), D_{{\varvec{e}}_{1}}({\varvec{s}}_{i}), D_{{\varvec{e}}_{2}}({\varvec{s}}_{i}), w({\varvec{s}}_{i}), \tau ^2\right] \nonumber \\&\quad \times \,[\text {elev}({\varvec{s}}_{i}), D_{{\varvec{e}}_{1}}({\varvec{s}}_{i}), D_{{\varvec{e}}_{2}}({\varvec{s}}_{i})][w({\varvec{s}}_{i})] \end{aligned}$$
(3)

with parameters in the elevation process and in the \(w({\varvec{s}})\) process, along with suitable hyperpriors. However, while this is an appropriate specification for capturing the stochasticity in the regressors, it may be challenging to fit, and its parameters may be difficult to identify.

Schliep poses a nice question regarding possible confounding between \(w({\varvec{s}})\) and the elevation and derived processes. We note that introducing orthogonality between the vector of \(w({\varvec{s}}_{i})\) and the vectors of \(D_{{\varvec{e}}_{1}}({\varvec{s}}_{i})\) and \(D_{{\varvec{e}}_{2}}({\varvec{s}}_{i})\) will not translate into orthogonality for the vectors of the nonlinear functions \(R({\varvec{s}}_{i})\) and \(\theta _{\mathrm{asp}}({\varvec{s}}_{i})\).
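
To spell this out, in our notation slope and aspect are the polar coordinates of the gradient vector,

$$\begin{aligned} R({\varvec{s}}) = \sqrt{D_{{\varvec{e}}_{1}}({\varvec{s}})^{2} + D_{{\varvec{e}}_{2}}({\varvec{s}})^{2}}, \qquad \theta _{\mathrm{asp}}({\varvec{s}}) = \arctan \big (D_{{\varvec{e}}_{2}}({\varvec{s}})/D_{{\varvec{e}}_{1}}({\varvec{s}})\big ), \end{aligned}$$

the latter with the usual quadrant adjustment. Orthogonality is a linear constraint, and it is not preserved under these nonlinear transformations.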

Her concluding thoughts suggest attractive further applications of our methodology. Slope and aspect computation for the ocean floor, for temperature and environmental contaminant surfaces, and even for intensity surfaces driving spatial point patterns, say log Gaussian Cox processes (Møller et al. 1998; Brix and Diggle 2001), could all be of interest.

Turning to the comments of Banerjee, his primary point seems to be the possibility of looking at a broader range of features associated with a realization of a stochastic process over a subset of \(\mathbb {R}^{2}\). In particular, linear functionals offer convenient distribution theory. Apart from those he mentions, we could also consider moving to three-dimensional spaces, where other angular and distance functions of the gradient basis vectors at location \({\varvec{s}}\) could be explored. Also, we can consider both global and local functionals associated with the random surface.
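
To illustrate the convenience, under mild regularity conditions, a linear functional of a Gaussian process \(Y({\varvec{s}})\) with mean \(\mu ({\varvec{s}})\) and covariance \(K\), formed with a deterministic weight function \(a({\varvec{s}})\) over a region \(A\), is again Gaussian:

$$\begin{aligned} \int _{A} a({\varvec{s}})Y({\varvec{s}})\,d{\varvec{s}} \sim N\Big (\int _{A} a({\varvec{s}})\mu ({\varvec{s}})\,d{\varvec{s}},\ \int _{A}\int _{A} a({\varvec{s}})a({\varvec{s}}')K({\varvec{s}},{\varvec{s}}')\,d{\varvec{s}}\,d{\varvec{s}}'\Big ). \end{aligned}$$

Directional derivatives arise as limits of such linear functionals, which is what makes the gradient theory tractable; nonlinear features such as slope and aspect require the further transformation arguments developed in the paper.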

With spatio-temporal stochastic process models, apart from the work of Quick et al. (2015), which he cites, there is also work using process realizations to study velocity surfaces, arising as a ratio of gradients at a given \({\varvec{s}}\) in a given direction (Schliep et al. 2015; Schliep and Gelfand 2018).

Banerjee’s words regarding integrating available gradient information into our modeling are attractive. Evidently, if this information arises as a function of the available elevation data, then it seems incoherent to use it to fit the elevation model (which, in turn, drives the post-model-fitting gradient inference). However, prior information regarding the nature of the gradient surface, e.g., that, at a given location, the surface is increasing over a particular range of directions, could perhaps be introduced into the gradient prediction.

Finally, he makes an important point regarding the distinction between the gradient behavior (hence, slope and aspect behavior) of, say, the mean of the observed surface vs. that of a residual surface, adjusted for spatial covariates. If the mean surface is, say, of the form \(\mathbf {X}^{\mathrm {\scriptscriptstyle T}}({\varvec{s}}){\varvec{\beta }}+ w({\varvec{s}})\), then we can examine the gradient behavior of the \(w({\varvec{s}})\) process as long as it is mean square differentiable. In order to study the mean surface, we require differentiability of the entries in \(\mathbf {X}({\varvec{s}})\). Customarily, these are not supplied as functions but rather as tiled surfaces over the region of interest, so differentiation is not available. As a result, in other work looking at gradients of mean surfaces, we find covariates constructed to be differentiable, e.g., in space as distances from particular surface landmarks, in time as periodic functions (e.g., Schliep and Gelfand 2018).
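
Formally, when the entries of \(\mathbf {X}({\varvec{s}})\) are differentiable, the gradient of the mean surface decomposes as

$$\begin{aligned} \nabla \big (\mathbf {X}^{\mathrm {\scriptscriptstyle T}}({\varvec{s}}){\varvec{\beta }}+ w({\varvec{s}})\big ) = \sum _{k} \beta _{k} \nabla X_{k}({\varvec{s}}) + \nabla w({\varvec{s}}). \end{aligned}$$

Tiled covariates leave the terms \(\nabla X_{k}({\varvec{s}})\) undefined, while a constructed covariate such as distance to a landmark, \(X_{k}({\varvec{s}}) = \Vert {\varvec{s}} - {\varvec{s}}^{*}\Vert \), supplies \(\nabla X_{k}({\varvec{s}}) = ({\varvec{s}} - {\varvec{s}}^{*})/\Vert {\varvec{s}} - {\varvec{s}}^{*}\Vert \) in closed form (away from \({\varvec{s}}^{*}\)).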

Lastly, we turn to the comments of Jona Lasinio and Mastrantonio. They raise the interesting challenge of adding dynamics to our slope and aspect ideas. As they note, with elevation surfaces there will not be any dynamics, but in other applications there may be. They mention geological examples; we can add climate surfaces and land value surfaces. The question here is whether we consider time to be discrete or continuous. With discrete time, we would imagine a dynamic model for the spatial surfaces which, upon fitting, would induce a dynamic process for slope and for aspect. This seems to be an attractive area for further investigation. With continuous time, we would introduce a space–time covariance specification, which would lead to gradients in time as well as in space. One way to study this dynamically is through the notion of a velocity, an approach presented in the two papers referenced above.
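
For intuition, one natural definition of velocity, in the spirit of the work referenced above, tracks a level set of the space–time surface: holding \(Y({\varvec{s}}(t), t)\) constant as the contour moves in direction \({\varvec{u}}\), the chain rule yields

$$\begin{aligned} v_{{\varvec{u}}}({\varvec{s}}, t) = -\frac{\partial Y({\varvec{s}}, t)/\partial t}{D_{{\varvec{u}}}Y({\varvec{s}}, t)}, \end{aligned}$$

a ratio of the temporal gradient to a spatial directional gradient, well defined wherever the directional derivative is nonzero.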

Jona Lasinio and Mastrantonio ask how uncertainty in the elevation surface is propagated to the slope and aspect processes. Samples from the posterior predictive distribution for, say, \(\nabla Y({\varvec{s}}_{0})\), and hence for \(g(\nabla Y({\varvec{s}}_{0}))\), arise from the composition sampling described in Section 4.2. These samples inherit the uncertainty associated with our elevation model.
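
For readers who want to see the mechanics, the following is a minimal sketch of that composition sampler. For illustration only, it assumes a mean-zero elevation process with a squared-exponential covariance plus a nugget, chosen solely because the gradient cross-covariances are then available in closed form (the paper works with a suitably smooth Matérn); the function name and arguments are ours, and the posterior draws are taken as given.

```python
import numpy as np

def sample_gradient(s0, S, Y, post_draws, rng):
    """Composition sampling for grad Y(s0) (cf. Section 4.2).

    Illustrative only: assumes a mean-zero GP with squared-exponential
    covariance K(h) = sigma2 * exp(-|h|^2 / (2 ell^2)) plus nugget tau2.
    S: (n, 2) site coordinates; Y: (n,) observed elevations (mean removed);
    post_draws: iterable of posterior draws (sigma2, ell, tau2).
    """
    diffs = s0 - S                                   # (n, 2), rows s0 - s_i
    d2 = np.sum(diffs**2, axis=1)                    # squared distances to s0
    D2 = np.sum((S[:, None, :] - S[None, :, :])**2, axis=-1)
    grads = []
    for sigma2, ell, tau2 in post_draws:
        C = sigma2 * np.exp(-0.5 * D2 / ell**2) + tau2 * np.eye(len(Y))
        # Cov(grad Y(s0), Y(s_i)) = d/ds0 K(s0 - s_i), here in closed form
        cross = (-(sigma2 / ell**2) * diffs
                 * np.exp(-0.5 * d2 / ell**2)[:, None]).T    # (2, n)
        V = (sigma2 / ell**2) * np.eye(2)            # Var(grad Y(s0))
        mean = cross @ np.linalg.solve(C, Y)
        cov = V - cross @ np.linalg.solve(C, cross.T)
        # one draw of grad Y(s0) | Y for this posterior draw of parameters
        grads.append(rng.multivariate_normal(mean, cov))
    grads = np.asarray(grads)                        # (ndraws, 2)
    slope = np.hypot(grads[:, 0], grads[:, 1])       # R(s0)
    aspect = np.arctan2(grads[:, 1], grads[:, 0])    # theta_asp(s0)
    return grads, slope, aspect
```

Each posterior draw of \((\sigma ^{2}, \ell , \tau ^{2})\) yields one draw of \(\nabla Y({\varvec{s}}_{0})\), hence one draw of slope and aspect; the spread of these draws is exactly the elevation-model uncertainty referred to above.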

Finally, citing some very recent work, they comment on the difficulty of interpreting regression coefficients associated with the projected normal model when these coefficients and their associated regressors appear in the bivariate Gaussian process that induces the projected Gaussian process. We can offer two thoughts here. First, we need the covariate surfaces to be differentiable, as discussed above. Second, if we do have covariate information, we would introduce it into the elevation model. Hence, the gradient behavior in the presence of these covariates would be inherited in the mean of \(\nabla Y({\varvec{s}})= (D_{{\varvec{e}}_{1}}Y({\varvec{s}}), D_{{\varvec{e}}_{2}}Y({\varvec{s}}))^{\mathrm {\scriptscriptstyle T}}\). This mean is a linear function of the derivatives of the covariate vector in the \((1,0)^{\mathrm {\scriptscriptstyle T}}\) and \((0,1)^{\mathrm {\scriptscriptstyle T}}\) directions, respectively, at \({\varvec{s}}\), and it is this bivariate Gaussian process that induces the projected Gaussian process. So, the interpretation would be with regard to the derivative surfaces of the covariates rather than the covariate surfaces themselves.
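
Explicitly, if the elevation model has mean \(\mathbf {X}^{\mathrm {\scriptscriptstyle T}}({\varvec{s}}){\varvec{\beta }}\), then

$$\begin{aligned} \mathbb {E}\big (\nabla Y({\varvec{s}})\big ) = \big (D_{{\varvec{e}}_{1}}\mathbf {X}^{\mathrm {\scriptscriptstyle T}}({\varvec{s}}){\varvec{\beta }},\ D_{{\varvec{e}}_{2}}\mathbf {X}^{\mathrm {\scriptscriptstyle T}}({\varvec{s}}){\varvec{\beta }}\big )^{\mathrm {\scriptscriptstyle T}}, \end{aligned}$$

so each coordinate of the mean is linear in \({\varvec{\beta }}\), with the directional derivatives of the covariates playing the role of regressors.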