1 Introduction

As part of the natural carbon cycle, trees absorb carbon dioxide from the atmosphere, store carbon in wood and bark, and release oxygen back into the atmosphere [8]. Thus, information about carbon stock and forest biomass is crucial for the development of sustainable forest management programs, including those aiming to mitigate climate change. However, standard procedures to estimate the carbon stored in trees require knowledge about specific features, such as the trunk diameter at breast height (DBH), typically measured at a height of 1.3 m, and the height of the trees. There have been some efforts to undertake these tasks using remote sensing technologies, particularly LiDARs [11, 16], or their combination with cameras [15]. Nevertheless, these approaches make use of devices that may require correspondingly robust infrastructure. More commonly, estimating the overall carbon content of vegetation is still a labor-intensive, costly, error-prone, and lengthy task that requires deploying personnel in the field. Thus, there is a need to develop reliable, economical, and fast strategies for the efficient management of forest resources.

Fig. 1. Estimating carbon content in a forest stand. We process multispectral photos from Unmanned Aerial Systems (UAS) to obtain orthomosaics. Our tree detector algorithm uses these orthomosaics as input for determining bounding boxes. We use RGB images to generate a sparse point cloud for each tree. Once we have located a tree and measured its structure, we apply conventional allometric equations to estimate carbon content.

In this paper, we describe a methodology to automatically estimate carbon stocks by detecting and measuring trees in forest stands via an Unmanned Aerial System (UAS) (see Fig. 1). To identify individual trees, we employ a deep learning-based approach in which we create synthetic images for training. To estimate carbon content, we use aerial photos to reconstruct the scene with structure from motion (SfM) techniques [12]. From the resulting point cloud, we determine the tree height and crown diameter and predict the DBH of the identified trees. We implemented these methods for tree detection and used allometric equations to predict carbon content in the forest stand. Finally, we compared our results with estimates obtained from manual measurements of tree height and DBH. Although the focus of this paper is the determination of forest carbon, we briefly describe the tree detection method, which receives full attention in a different document.

We structure the rest of the document as follows. In Sect. 2, we review the literature covering the problem of carbon estimation in forests. Then, in Sect. 3, we provide an overview of our deep learning strategy for tree detection. Next, in Sect. 4, we introduce a model to estimate DBH from height measurements and the allometric model that computes carbon from estimated and measured DBHs. In Sect. 5, we report our experimental results. Finally, in Sect. 6, we conclude the document and delineate directions for future research.

2 Related Research

We examine the scientific literature in the areas of carbon estimation, automatic tree detection, and synthetic image generation, as related to our problem.

Carbon Estimation. Conventionally, measuring trees for biomass estimation requires field DBH measurements [4] using tools such as diameter tapes or calipers, while tree height is measured with clinometers or laser hypsometers [13]. These measurements are the input to allometric models, whose parameters must be fitted to ground truth data. Commonly, predictive models use these equations to generalize to other trees in similar conditions of soil and fertility, having conventional measures as input parameters [13]. One could calculate carbon as the product of aboveground biomass and the amount of carbon per biomass unit in the studied species of tree. Official carbon estimation methods vary for each country [10]. Escalona et al. [2] estimate the carbon contained by a stand of Pinus greggii using field measurements. They measure DBH and height for a tree stand, cut off a dry sample of trees from their study field, and obtain the total organic carbon using a combustion catalytic oxidation method.

Tree Detection. Automatic tree detection is experiencing a radical change as researchers explore deep learning approaches as opposed to classical ones [9]. Classical methods for detecting trees relied on the use of crafted features, including local maxima filtering, template matching, valley-following, watershed, region growing, and marked point processes [6]. Lately, there has been a surge in methods to detect and count plants using convolutional neural networks (CNNs). So far, researchers have employed well-established architectures such as LeNet, VGG, AlexNet, or GoogLeNet for classification or regression.

Synthetic Dataset Generation. Deep learning commonly requires vast amounts of labeled data to train a CNN. As the manual labeling of images is very demanding, the creation of synthetic datasets is attractive for researchers working in machine learning. Ubbens et al. [17] render 3D models of Arabidopsis thaliana rosettes and use them to create data sets for training. Han and Kerekes [7] review simulation methods for multispectral images. To verify models for biomass estimation, Fassnacht et al. [3] simulate canopy height and cover type combining the SILVA individual-tree forest simulator [14] with real LiDAR point clouds of individual trees.

3 Tree Detection Using Deep Learning

Our approach to detecting trees consists of using multispectral images captured from a UAS to generate the input for a CNN. As a by-product, we obtain the Digital Elevated Vegetation Model (DEVM), a representation for tree stands. One problem in deep learning is the need for a sizable database of exemplary samples. We address this problem with the use of synthetic datasets for training.

3.1 The Digital Elevated Vegetation Model

Digital Surface Models (DSMs) and Digital Terrain Models (DTMs) are 2.5D representations, but while the former gives information about the objects over the terrain, the latter gives the bare surface without vegetation or human-made structures. Also, one could generate indices, such as the Normalized Difference Vegetation Index (NDVI), to filter out non-vegetal elements from the images. One could calculate the NDVI [18] from the red, RE, and near-infrared, NI, radiation, for \({\mathbf {x}}= (x,y)\), where \(x \in [1, w]\) and \(y \in [1, h]\), as

$$\begin{aligned} \text {NDVI}({\mathbf {x}})= \frac{\text {NI}({\mathbf {x}})-\text {RE}({\mathbf {x}})}{\text {NI}({\mathbf {x}})+\text {RE}({\mathbf {x}})}. \end{aligned}$$
(1)

In our method, we combine the DSM, DTM and NDVI models to define the Digital Elevated Vegetation Model (DEVM) as

$$\begin{aligned} \text {DEVM}({\mathbf {x}})= (\text {DSM}({\mathbf {x}}) - \text {DTM}({\mathbf {x}}))\text {NDVI}({\mathbf {x}}), \end{aligned}$$
(2)

where the subtraction of the DTM from the DSM leaves the objects over the terrain. The NDVI filters out non-vegetal objects. The DEVM representation facilitates the generation of synthetic images for the training of deep learning classifiers.
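As an illustration of Eqs. (1) and (2), the following minimal sketch computes the NDVI and the DEVM per pixel with NumPy. The function names, the argument `red` (standing for the band written RE in Eq. (1)), and the choice of assigning an NDVI of zero to pixels where both bands are zero are assumptions made for this example; they are not taken from the original implementation.

```python
import numpy as np

def ndvi(nir, red, eps=1e-12):
    """Eq. (1): (NI - RE) / (NI + RE), evaluated per pixel."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    denom = nir + red
    empty = np.abs(denom) < eps
    # Assumption: pixels without signal in either band get NDVI = 0.
    safe = np.where(empty, 1.0, denom)
    return np.where(empty, 0.0, (nir - red) / safe)

def devm(dsm, dtm, nir, red):
    """Eq. (2): height above the terrain weighted by the NDVI."""
    above_ground = np.asarray(dsm, dtype=float) - np.asarray(dtm, dtype=float)
    return above_ground * ndvi(nir, red)
```

All four inputs are assumed to be co-registered rasters of the same shape, so that a pixel index refers to the same ground location in the DSM, the DTM, and both spectral bands.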

3.2 Synthetic Dataset Generation

Using the DEVM representation, we proceed to define synthetic images that closely resemble the treetops (see Fig. 2). We produce a synthetic image \({\mathbf {I}}({\mathbf {x}})\) by drawing from uniform distributions the number n of trees, the position of their centers \((\overline{x}_i, \overline{y}_i)\), and their widths \(a_i\) and \(b_i\). We model each tree as a set of at most \(m_i\) randomly overlapping domes. We use the following analytic expression to represent each dome:

$$\begin{aligned} {\mathbf {D}}(\alpha , \beta ) = h_{ij} \cdot \cos \left( \frac{\alpha \pi }{2a_{ij}}\right) \cdot \cos \left( \frac{\beta \pi }{2b_{ij}}\right) , \end{aligned}$$
(3)

for given values of \(a_{ij}, b_{ij}\), and \(h_{ij}\), where \(\alpha \in [-a_{ij}, a_{ij}]\) and \(\beta \in [-b_{ij}, b_{ij}]\), and \(h_{ij}\) is a random gain variable.
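The dome of Eq. (3) can be sampled on a pixel grid as in the sketch below. The integer grid, the helper names `dome` and `paste_dome`, and in particular the pixel-wise maximum used to combine overlapping domes are assumptions for illustration; the text does not specify how overlapping domes are blended.

```python
import numpy as np

def dome(a, b, h):
    """Sample Eq. (3) for integer alpha in [-a, a] and beta in [-b, b]."""
    alpha = np.arange(-a, a + 1)
    beta = np.arange(-b, b + 1)
    # Rows index beta, columns index alpha; the peak h sits at the center.
    return h * np.outer(np.cos(beta * np.pi / (2 * b)),
                        np.cos(alpha * np.pi / (2 * a)))

def paste_dome(image, cx, cy, a, b, h):
    """Blend one dome centered at column cx, row cy into image (in place)."""
    d = dome(a, b, h)
    rows, cols = image.shape
    y0, x0 = cy - b, cx - a
    ys = slice(max(y0, 0), min(y0 + d.shape[0], rows))
    xs = slice(max(x0, 0), min(x0 + d.shape[1], cols))
    if ys.stop <= ys.start or xs.stop <= xs.start:
        return image  # the dome lies entirely outside the image
    sub = d[ys.start - y0:ys.stop - y0, xs.start - x0:xs.stop - x0]
    # Assumption: overlapping domes are combined with a pixel-wise maximum.
    image[ys, xs] = np.maximum(image[ys, xs], sub)
    return image
```

A synthetic image would then be built by calling `paste_dome` repeatedly with randomly drawn centers, widths, and gains, recording one bounding box per tree as its label.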

Fig. 2. Samples of (a) synthetic and (b) real DEVM images.

3.3 CNN Architecture

To identify trees in DEVM images, we used DetectNet [1], a CNN that predicts the bounding box limits and the class probabilities from images in a single pass. It includes an initial layer that divides an image into a regular cell grid of \(S\times S\) elements. Each cell predicts B bounding boxes with their respective confidence scores. Correspondingly, each bounding box consists of predictions for (x, y), the center of the bounding box; (w, h), the width and height; and the intersection over union (IoU) between the predicted and ground truth boxes.
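For reference, the IoU between two axis-aligned boxes can be computed as in the following sketch. The corner-based box format `(x_min, y_min, x_max, y_max)` and the function name are assumptions chosen for this example.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    # Width and height of the overlap, clipped at zero for disjoint boxes.
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0
```

A box predicted as center and size, as described above, converts to this format with `x_min = x - w / 2`, `x_max = x + w / 2`, and likewise for the vertical axis.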

To detect multiple objects in DetectNet during training, we extract the bounding boxes of each image from the annotations overlaid on the coverage map. Given the coverage map for object k, \(C_k({\mathbf {x}})\), for \({\mathbf {x}}= (x,y)\) and \(1 \le x,y \le S\), we set to 1 the positions where objects are present and 0 otherwise. We use the following loss function for training

$$\begin{aligned} \text {{ loss}} = \frac{1}{2N} \sum _{i=1}^{N}\left\{ \sum _{{\mathbf {x}}}\left( C_{i}^{t}({\mathbf {x}}) - C_{i}^{p}({\mathbf {x}}) \right) ^{2} + \lambda \left( \left| {\mathbf {u}}^{t} - {\mathbf {u}}^{p} \right| + \left| {\mathbf {l}}^{t} - {\mathbf {l}}^{p} \right| \right) \right\} , \end{aligned}$$
(4)

where N is the number of objects, \(\lambda \) weights the regularization term, \(C^t\) and \(C^p\) are the coverage maps, and \({\mathbf {u}}\) and \({\mathbf {l}}\) are the upper-left and lower-right corners for the ground truth t and the prediction p.
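A direct NumPy transcription of Eq. (4) could look like the sketch below. It assumes that the coverage maps are stacked into arrays of shape (N, S, S), that the corners are stored as (N, 2) arrays, and that \(|\cdot |\) denotes the L1 norm; these conventions and the function name are assumptions, not details confirmed by the text.

```python
import numpy as np

def detection_loss(cov_true, cov_pred, ul_true, ul_pred,
                   lr_true, lr_pred, lam=1.0):
    """Eq. (4): squared coverage error plus lam-weighted corner error."""
    cov_true = np.asarray(cov_true, dtype=float)
    cov_pred = np.asarray(cov_pred, dtype=float)
    n = cov_true.shape[0]
    # Sum of squared coverage differences over the grid, one value per object.
    coverage = np.sum((cov_true - cov_pred) ** 2, axis=(1, 2))
    # Assumption: |.| in Eq. (4) is the L1 norm of the corner difference.
    upper = np.sum(np.abs(np.asarray(ul_true, float) - np.asarray(ul_pred, float)), axis=1)
    lower = np.sum(np.abs(np.asarray(lr_true, float) - np.asarray(lr_pred, float)), axis=1)
    return float(np.sum(coverage + lam * (upper + lower)) / (2 * n))
```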

Fig. 3. Conventional measures of trees: (a) trunk DBH is the diameter of the trunk at a standard height of 1.30 m, also known as diameter at breast height; (b) total height of the tree from the ground to the top; and (c) approximated diameter of the crown from a zenithal viewpoint.

4 Carbon Content Estimation

A common practice in silviculture is to compute the carbon content from the tree trunk DBH using allometric equations (Fig. 3). In our approach, we infer a tree’s DBH using the tree height we obtain from the 3D SfM reconstruction and the location information from our tree detector. Using its location bounding box, we calculate a tree’s height from the DSM, computing the difference between the maximum and the minimum height values (see Fig. 4).
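The height computation described above reduces to a range over the DSM values inside each bounding box, as in this minimal sketch. The box convention `(row_min, col_min, row_max, col_max)` with exclusive upper bounds, the handling of non-finite cells, and the function name are assumptions for illustration.

```python
import numpy as np

def tree_height(dsm, box):
    """Height as max minus min of the DSM inside a bounding box."""
    r0, c0, r1, c1 = box  # assumed convention: upper bounds are exclusive
    patch = np.asarray(dsm, dtype=float)[r0:r1, c0:c1]
    # Ignore cells where the reconstruction left no valid elevation.
    patch = patch[np.isfinite(patch)]
    if patch.size == 0:
        raise ValueError("bounding box contains no valid DSM values")
    return float(patch.max() - patch.min())
```

This estimate presumes that the box contains at least some ground pixels, so that the minimum value corresponds to the terrain next to the tree.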

Fig. 4. We use the cloud of points in the resulting bounding boxes to estimate the height from the DSM as the difference between the maximum and minimum height values.

To estimate the DBH of a tree, we define an allometric relationship between the DBH and the height. First, we obtain paired ground truth data from a field inventory, where we use a metric tape to measure the height and DBH of a set of trees in a forest stand (see Fig. 5). In our approach, we propose to model the relationship between height, h, and DBH, d, as

$$\begin{aligned} d(h)= d_1h^2+d_2h. \end{aligned}$$
(5)

We estimate the values of the coefficients \(d_1\) and \(d_2\) using least squares. Because the model has no constant term, it enforces the constraint that the DBH is zero whenever the height equals zero.
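The fit of Eq. (5) is an ordinary linear least-squares problem in \(d_1\) and \(d_2\). A minimal sketch with NumPy follows; the function names are hypothetical, and omitting the constant column from the design matrix is what forces \(d(0) = 0\).

```python
import numpy as np

def fit_dbh_model(heights, dbhs):
    """Least-squares estimate of d1 and d2 in d(h) = d1*h**2 + d2*h."""
    h = np.asarray(heights, dtype=float)
    d = np.asarray(dbhs, dtype=float)
    # No column of ones: the fitted curve passes through the origin.
    design = np.column_stack([h ** 2, h])
    (d1, d2), *_ = np.linalg.lstsq(design, d, rcond=None)
    return float(d1), float(d2)

def predict_dbh(h, d1, d2):
    """Evaluate Eq. (5) for a height h (scalar or array)."""
    return d1 * h ** 2 + d2 * h
```

The heights and DBHs must be expressed in the same units that are later used in the carbon equation, since the coefficients carry those units.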

To obtain the amount of carbon for Pinus greggii, Escalona et al. [2] cut and heated 20 six-year-old trees. Measuring the trees’ DBH and height, they arrived at a quadratic allometric equation expressed as

$$\begin{aligned} c(x)= c_1x^2+c_2x, \end{aligned}$$
(6)

where \(c_1 = 3287\), \(c_2=147.36\), and \(x = d^2h\) combines DBH and height. Substituting the definition of x into (6), factoring out x, and taking d from (5), we arrive at the expression

$$\begin{aligned} c(h)= h d(h)^2 (c_1 h d(h)^2 + c_2). \end{aligned}$$
(7)
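The chain from height to carbon in Eqs. (5) to (7) can be sketched as below. The coefficients are the values quoted in the text; the function names are hypothetical, and the units of d and h are assumed to be those used by Escalona et al. [2], which this section does not restate.

```python
C1 = 3287.0   # c_1 from Eq. (6)
C2 = 147.36   # c_2 from Eq. (6)

def carbon_from_dbh_height(d, h):
    """Eq. (6) with x = d**2 * h."""
    x = d * d * h
    return C1 * x * x + C2 * x

def carbon_from_height(h, d1, d2):
    """Eq. (7): carbon from height alone, with d(h) given by Eq. (5)."""
    d = d1 * h * h + d2 * h
    return carbon_from_dbh_height(d, h)
```

Writing Eq. (6) as \(x(c_1 x + c_2)\) and inserting \(x = h\,d(h)^2\) gives exactly the factored form of Eq. (7), so both functions agree whenever d comes from Eq. (5).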

In SfM, where we find the structure by pointwise correspondence, the algorithms tend to underestimate tree height. Given an estimated tree height h, we correct it using

$$\begin{aligned} \hat{h} = \alpha h + \beta , \end{aligned}$$
(8)

where \(\alpha = \sigma _g/\sigma _e\) and \(\beta = \mu _g - \alpha \mu _e\) are scale and bias factors, \(\sigma _g\) and \(\mu _g\) are the standard deviation and mean of the measured heights, and \(\sigma _e\) and \(\mu _e\) are the standard deviation and mean of the estimated heights. Algorithm 1 summarizes the data flow.
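The correction of Eq. (8) matches the mean and spread of the estimated heights to those of the measured ones. A minimal sketch using only the standard library is shown next; the function names are hypothetical, and the sample standard deviation is used for both sets (the ratio is the same for the population version as long as both sets have the same size).

```python
from statistics import mean, stdev

def fit_height_correction(measured, estimated):
    """Return (alpha, beta) of Eq. (8) from paired height samples."""
    alpha = stdev(measured) / stdev(estimated)
    beta = mean(measured) - alpha * mean(estimated)
    return alpha, beta

def correct_height(h, alpha, beta):
    """Apply Eq. (8) to an SfM height estimate h."""
    return alpha * h + beta
```

By construction, the mean of the estimated heights maps onto the mean of the measured heights, and a spread of one standard deviation in the estimates maps onto one standard deviation in the measurements.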

figure a

5 Experimental Results

For our experiments, we mounted a Parrot-Sequoia Micasense camera on a 3DR Solo quadcopter and flew over Las Mancañas, a 0.76 ha leaf-on pine (Pinus greggii) field with a mean distance between trees of 5.9 m. The sampling area is located in Guanajuato, Mexico, at the coordinates 20\(^\circ \)58’40.”N 100\(^\circ \)16’31.2”W. We flew at an altitude of 30 m in a double grid pattern with 85% overlap between adjacent images along the paths of rows and columns, followed by one spiral flight around the approximate center of the sampled area. In this landscape, the Parrot-Sequoia produced 2,212 multispectral and RGB images with spectral response peaking at wavelengths of 550 nm (Green), 660 nm (Red), 735 nm (Red Edge, RE), and 790 nm (Near Infrared, NI) (see Fig. 5).

To train DetectNet, we generated a labeled dataset of 12,500 synthetic DEVM images. We trained DetectNet for ten epochs, using transfer learning from a model previously trained with the KITTI database [5]. For fine-tuning, we split the 12,500 synthetic images into a set of 10,000 images for training and 2,500 images for validation.

Fig. 5. Ground truth measurement for DBH estimation (Color figure online)

Fig. 6. Tree height adjustment. SfM underestimates tree height (a)–(b). We apply a correction factor based on the offset and spread (c). The linear correlation coefficient (d) with respect to the manual measurements is satisfactory.

To test our carbon content measurement model, we obtained ground truth for the sampled area through a field inventory of 60 trees, measuring their DBH and height. The trees have an average height of 211.53 cm with a standard deviation of 26.47 cm and an average DBH of 4.64 cm with a standard deviation of 1.70 cm. To measure the height of the detected trees, we automatically extracted sub-images from the DSM for the 60 detected bounding boxes. For each of these sub-images, we calculated the height as the difference between the maximum and minimum elevation values. SfM techniques tend to underestimate treetop height. Figure 6(a)–(b) illustrates the distributions and plots of the treetop heights for the tape measurements and the SfM process, respectively. The mean heights are 2.12 m for the tape measurements and 1.73 m for the SfM estimates, and the corresponding standard deviations are 0.26 m and 0.25 m. We computed the adjustment variables \(\alpha \) and \(\beta \), as described in (8), as 1.08 and 0.25, respectively. Figure 6(c)–(d) shows the resulting adjustment. The linear correlation coefficient for the heights is 0.999.

To estimate the carbon content, we iteratively selected random partitions of the data into training and testing sets to adjust the coefficients of (5) before computing the carbon content. In the end, we evaluated (7) using both the ground truth values and the estimated ones. In Fig. 7, we illustrate the ground truth carbon content with the blue dotted line and the estimated carbon content with a box plot diagram. Our method estimates a mean carbon content per tree of 0.84 kg (50.4 kg for the 60-tree forest stand), while the ground truth estimation is 0.94 kg (56.4 kg for the forest stand); the RMS value is 0.58 kg.
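The repeated random partitioning described above can be sketched as follows. The number of rounds, the test fraction, the seed, and the function names are assumptions made for this example, since the text does not report them; the sketch also takes the heights as already corrected with Eq. (8).

```python
import numpy as np

C1, C2 = 3287.0, 147.36  # coefficients of Eq. (6)

def carbon(d, h):
    """Eq. (6) with x = d**2 * h, for scalars or arrays."""
    x = d ** 2 * h
    return C1 * x ** 2 + C2 * x

def random_split_carbon(heights, dbhs, n_rounds=100, test_fraction=0.3, seed=0):
    """Mean estimated carbon per round, plus the ground truth mean."""
    h = np.asarray(heights, dtype=float)
    d = np.asarray(dbhs, dtype=float)
    rng = np.random.default_rng(seed)
    n_test = max(1, int(round(test_fraction * len(h))))
    estimates = []
    for _ in range(n_rounds):
        idx = rng.permutation(len(h))
        test, train = idx[:n_test], idx[n_test:]
        # Fit Eq. (5) on the training trees only.
        design = np.column_stack([h[train] ** 2, h[train]])
        d1, d2 = np.linalg.lstsq(design, d[train], rcond=None)[0]
        # Predict DBH from height on the held-out trees, then apply Eq. (6).
        d_hat = d1 * h[test] ** 2 + d2 * h[test]
        estimates.append(carbon(d_hat, h[test]).mean())
    return np.array(estimates), float(carbon(d, h).mean())
```

The array of per-round estimates is the kind of sample that a box plot summarizes, while the second return value plays the role of the ground truth reference line.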

Fig. 7. Carbon estimation. The blue line represents the ground truth carbon content for the sample forest stand, while the boxplot includes the mean values, maximum and minimum values, and standard deviation for the estimated values. (Color figure online)

6 Conclusion

In this paper, we introduce a methodology to estimate carbon content in a forest stand using photogrammetric measurements of trees taken by a UAS. We demonstrate that a system built on this methodology can be scaled up by estimating the carbon content of a parcel of Pinus greggii. During the development of this research, we introduced a tree detection method based on the use of a CNN. The DEVM representation made it possible to develop a strategy to construct synthetic ground truth data useful for training, alleviating the need for manually labeled data. Our method reduces the resources needed to obtain those measurements compared with classical approaches that rely on field personnel.

In the future, we are planning to develop models for carbon estimation circumventing the use of allometric equations based on DBH. As we are aiming to increase the precision of our estimation, we may rely on the use of biomass change over time.