A Simple Methodology for 2D Reconstruction Using a CNN Model
In recent years, Deep Learning research has demonstrated its effectiveness in digital image processing, mainly in areas with heavy computational load. Such is the case of aerial photogrammetry, where the principal objective is to generate a 2D map or a 3D model of a specific terrain. These tasks demand highly efficient processing of visual information. In this work we present a simple methodology to build an orthomosaic. Our proposal focuses on replacing traditional digital image processing with a Convolutional Neural Network (CNN) model. The dataset of aerial images is generated from drone photographs of our university campus. The method described in this article uses a CNN model to detect matching points and the RANSAC algorithm to correct the feature correspondences. Experimental results show that feature maps and matching points obtained between pairs of images through a CNN are comparable with those obtained with traditional computer vision algorithms.
Keywords: Deep Learning, CNN, 2D reconstruction, Aerial images
Image stitching produces a mosaic from a set of overlapping images, taken from one or several cameras, that are joined into a single image. Several computer vision techniques are used to generate this mosaic. We worked with aerial images and combined computer vision strategies with photogrammetry techniques.
The stitching process is usually carried out with traditional computer vision methods, as shown in Fig. 1a. It begins with a drone flight plan for image acquisition over a selected area. Then placeholders with georeferenced points are added over a map, together with the flight height and the overlap percentage between each pair of acquired images. Usually a mobile application is configured with these specifications to acquire the information autonomously. Some popular free apps that help at this stage are Pix4D and DroneDeploy.
Scale Space Extrema Detection: identify key point locations and scales using scale-space extrema of the DoG (Difference-of-Gaussian) function computed with different values of standard deviation.
Key point Localization: key point candidates are localized and refined by eliminating low-contrast points.
Orientation Assignment: the orientation of each key point is obtained from the local image gradient.
Description Generation: compute the local image descriptor for each key point from the image gradient magnitude and orientation at each image sample point in a region centered at the key point.
These steps generate a 128-dimensional key point descriptor.
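The 128 dimensions of such a descriptor are commonly obtained from a 4 × 4 grid of cells with 8 orientation bins each. The following is a minimal sketch of that layout only, not the full algorithm: it assumes a 16 × 16 grayscale patch, and it omits Gaussian weighting, interpolation between bins, and rotation to the key point orientation. The function name and the synthetic patch are hypothetical.

```python
import math

def gradient_histogram_descriptor(patch, cells=4, bins=8):
    """Simplified gradient-orientation descriptor for a square grayscale patch.

    Assumption: no Gaussian weighting, no bin interpolation, and no rotation
    to a dominant orientation, all of which a full implementation would add.
    """
    size = len(patch)
    cell_size = size // cells
    descriptor = [0.0] * (cells * cells * bins)
    for y in range(size):
        for x in range(size):
            # Central differences, clamped at the patch border.
            dx = patch[y][min(x + 1, size - 1)] - patch[y][max(x - 1, 0)]
            dy = patch[min(y + 1, size - 1)][x] - patch[max(y - 1, 0)][x]
            magnitude = math.hypot(dx, dy)
            orientation = math.atan2(dy, dx) % (2 * math.pi)
            bin_index = int(orientation / (2 * math.pi) * bins) % bins
            cell_row = min(y // cell_size, cells - 1)
            cell_col = min(x // cell_size, cells - 1)
            # Each cell owns `bins` consecutive slots of the descriptor.
            index = (cell_row * cells + cell_col) * bins + bin_index
            descriptor[index] += magnitude
    # Normalize to unit length so the descriptor is less sensitive to contrast.
    norm = math.sqrt(sum(v * v for v in descriptor))
    return [v / norm for v in descriptor] if norm > 0 else descriptor

# Hypothetical 16x16 patch: a horizontal intensity ramp.
patch = [[float(x) for x in range(16)] for _ in range(16)]
descriptor = gradient_histogram_descriptor(patch)
print(len(descriptor))  # 128
```

For the ramp patch every gradient points along the positive x axis, so only the first orientation bin of each cell receives any weight.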
Hypothesize. First, minimal sample sets (MSSs) are randomly selected from the input dataset, and the model parameters are computed using only the elements of the MSS. The cardinality of the MSS is the smallest sufficient to determine the model parameters (as opposed to other approaches, such as least squares, where parameters are estimated using all the data available, possibly with appropriate weights).
Test. In the second step, RANSAC checks which elements of the entire dataset are consistent with the model instantiated using the parameters estimated in the first step. The set of such elements is called the consensus set (CS).
RANSAC terminates when the probability of finding a better ranked CS drops below a certain threshold. In the original formulation, the ranking of a CS was its cardinality (i.e., CSs that contain more elements are ranked better than CSs that contain fewer elements).
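The hypothesize-and-test loop can be sketched on a toy problem. The example below fits a 2D line, so the MSS has two points. The stopping rule uses the common bound N = log(1 − p) / log(1 − w²), where w is the inlier ratio of the best CS so far and p the desired confidence; this is one standard way to realize the termination criterion, assumed here rather than taken from the text. The tolerance, confidence, seed, and data are all hypothetical.

```python
import math
import random

def line_through(p, q):
    """Line through two points as (a, b, c), with a*x + b*y + c = 0 and a unit normal."""
    (x1, y1), (x2, y2) = p, q
    a, b = y2 - y1, x1 - x2
    norm = math.hypot(a, b)
    if norm == 0:
        return None  # degenerate sample: both points coincide
    return a / norm, b / norm, -(a * x1 + b * y1) / norm

def ransac_line(points, tolerance=0.5, confidence=0.99, max_iterations=1000, seed=0):
    rng = random.Random(seed)
    best_model, best_consensus = None, []
    needed, iteration = max_iterations, 0
    while iteration < needed:
        iteration += 1
        # Hypothesize: a minimal sample set of two points defines a line.
        model = line_through(*rng.sample(points, 2))
        if model is None:
            continue
        a, b, c = model
        # Test: the consensus set holds every point within `tolerance` of the line.
        consensus = [(x, y) for x, y in points if abs(a * x + b * y + c) <= tolerance]
        if len(consensus) > len(best_consensus):
            best_model, best_consensus = model, consensus
            # Probability that a random MSS contains only inliers (assumed bound).
            p_clean = (len(consensus) / len(points)) ** 2
            if p_clean >= 1.0:
                break
            log_fail = math.log(1.0 - p_clean)
            if log_fail < 0.0:
                needed = min(max_iterations,
                             math.ceil(math.log(1.0 - confidence) / log_fail))
    return best_model, best_consensus

# Hypothetical data: 20 points on y = 2x + 1 plus 5 gross outliers.
inliers = [(float(x), 2.0 * x + 1.0) for x in range(20)]
outliers = [(3.0, 40.0), (10.0, -5.0), (15.0, 90.0), (7.0, 30.0), (1.0, 25.0)]
model, consensus = ransac_line(inliers + outliers)
print(len(consensus), "of", len(inliers) + len(outliers), "points in the consensus set")
```

Ranking by cardinality, as in the original formulation, corresponds to the `len(consensus) > len(best_consensus)` comparison.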
RANSAC is therefore well suited to refining the correspondences and discarding features that do not meet a reference threshold. The final stage is to build an orthomosaic from the results of all the previous procedures. In this step, computer vision techniques are used to join all photographs into one.
Orthomosaic generation is the most complex task in this process. However, recent research has demonstrated the efficiency of convolutional neural networks (CNN) in digital image processing [1, 11, 22]. For this reason, this investigation uses a CNN to build an orthomosaic of the Technological University of the Mixteca (UTM) campus from aerial images obtained with an Unmanned Aerial Vehicle (UAV).
2 Related Work
Aerial photogrammetry is a procedure to obtain plans of large land areas by means of aerial photographs. The result is a 2D map or a 3D terrain model. To do this, computer vision techniques and algorithms must be applied.
Research has been carried out with the purpose of improving this process, such as work in which the SIFT algorithm is used for feature extraction and digital surface models (DSM) are generated from high-resolution UAV images. Similarly, other work proposes new algorithms for surface reconstruction. These approaches still have high computational complexity.
Nevertheless, recent research has included Deep Learning approaches, such as those presented in [5, 10, 24, 26], which perform image pairing and 3D reconstruction using deep neural network techniques. The results obtained are quite acceptable. However, the proposed models are very complex and often require additional information from external sensors.
Table 1. UTM campus image dataset, organized by overlap configuration (30% \(\times \) 30% and 50% \(\times \) 50%) so that the images can be used to adjust the CNN model.
Then, based on the DELF model and Noh’s work, we used our new dataset of 880 aerial images rescaled to \(250\,\times \,250\) pixels (Table 1). This dataset was created by capturing multiple aerial images of the entire university campus. Due to the terrain conditions of the campus, a minimum safe flight height of 100 m and a maximum of 150 m were selected. Two overlap configurations were used for the captured images: the first with \(30\%\) overlap both longitudinally and transversely, and the second with \(50\%\) in both directions.
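To give a sense of what these overlap settings imply for flight planning, the sketch below computes the distance between consecutive exposures for the stated heights and overlaps. It assumes a nadir-pointing camera over flat terrain, for which the ground footprint along one axis is 2·h·tan(FOV/2). The field of view of 70° is a hypothetical value, since the camera parameters are not given here.

```python
import math

def ground_footprint(height_m, fov_deg):
    """Ground distance covered along one image axis by a nadir-pointing camera."""
    return 2.0 * height_m * math.tan(math.radians(fov_deg) / 2.0)

def exposure_spacing(footprint_m, overlap):
    """Distance between consecutive exposures that yields the given overlap fraction."""
    return footprint_m * (1.0 - overlap)

ASSUMED_FOV_DEG = 70.0  # hypothetical field of view, not taken from the dataset

for height in (100.0, 150.0):          # flight heights stated for the dataset
    footprint = ground_footprint(height, ASSUMED_FOV_DEG)
    for overlap in (0.30, 0.50):       # the two overlap configurations
        spacing = exposure_spacing(footprint, overlap)
        print(f"height {height:.0f} m, overlap {overlap:.0%}: "
              f"footprint {footprint:.1f} m, spacing {spacing:.1f} m")
```

A higher overlap shortens the spacing, so the same area requires more images but gives the matching stage more shared content per pair.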
After finding correspondences, outliers must be eliminated from the estimation through the fundamental matrix, since the internal parameters of the camera are unknown. However, many of the correspondences are faulty, and estimating the parameter set with all coordinates is not enough. Therefore, the RANSAC algorithm is used on top of the normal model to robustly estimate the parameter set by detecting outliers. The main objective is to determine the geometric transformation between both images, that is, to define the fundamental matrix that relates two views of a planar target. The RANSAC algorithm can help compute the homography matrix [7, 16] starting from the acquired correspondences. We then use RANSAC with the feature vectors extracted from the images as the set of observed data points, and an affine transform as the model fitted to those points. We end up with a set of source and destination coordinates that can be used to estimate the geometric transformation between both images and to build an orthomosaic from the results of all the previous procedures.
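As an illustration of fitting an affine model to matched coordinates with RANSAC, the following sketch estimates the six affine parameters from three correspondences and keeps the hypothesis with the most inliers. It is not the authors' implementation: the correspondences are synthetic rather than CNN features, and the solver (Cramer's rule), the pixel tolerance, the iteration count, and all function names are assumptions.

```python
import random

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def affine_from_three(src, dst):
    """Solve x' = a*x + b*y + tx and y' = c*x + d*y + ty from three point pairs."""
    system = [[x, y, 1.0] for x, y in src]
    d = det3(system)
    if abs(d) < 1e-9:
        return None  # the three source points are collinear
    rows = []
    for k in range(2):  # k = 0 solves the x' row, k = 1 the y' row
        rhs = [p[k] for p in dst]
        coeffs = []
        for col in range(3):
            m = [row[:] for row in system]
            for i in range(3):
                m[i][col] = rhs[i]
            coeffs.append(det3(m) / d)  # Cramer's rule
        rows.append(coeffs)
    return rows  # [[a, b, tx], [c, d, ty]]

def apply_affine(t, p):
    x, y = p
    return (t[0][0] * x + t[0][1] * y + t[0][2],
            t[1][0] * x + t[1][1] * y + t[1][2])

def ransac_affine(src, dst, tolerance=2.0, iterations=200, seed=0):
    rng = random.Random(seed)
    best_t, best_inliers = None, []
    for _ in range(iterations):
        idx = rng.sample(range(len(src)), 3)  # minimal sample for an affine model
        t = affine_from_three([src[i] for i in idx], [dst[i] for i in idx])
        if t is None:
            continue
        inliers = []
        for i, (p, q) in enumerate(zip(src, dst)):
            px, py = apply_affine(t, p)
            if (px - q[0]) ** 2 + (py - q[1]) ** 2 <= tolerance ** 2:
                inliers.append(i)
        if len(inliers) > len(best_inliers):
            best_t, best_inliers = t, inliers
    return best_t, best_inliers

# Synthetic correspondences: a known affine map, with two matches corrupted.
true_t = [[1.0, 0.1, 30.0], [-0.1, 1.0, -5.0]]
src = [(10.0 * x, 10.0 * y) for x in range(4) for y in range(3)]
dst = [apply_affine(true_t, p) for p in src]
dst[2] = (dst[2][0] + 25.0, dst[2][1] - 40.0)
dst[7] = (dst[7][0] - 30.0, dst[7][1] + 35.0)

estimated_t, inlier_indices = ransac_affine(src, dst)
print(inlier_indices)
```

The surviving source and destination coordinates are the ones that would then be used to warp one image onto the other.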
4 Experimental Results
In order to evaluate our proposal, we analyze qualitative results in two stages. In the first, we determine the efficiency of our process for feature extraction and feature matching on the dataset. In the second, we check the results of orthomosaic generation.
Comparison between our resulting orthomosaics and other reconstructions, using the Euclidean distance as a measure of similarity. The manual reconstruction was performed with images at \(50\%\) of their original resolution. The aerial image was taken at twice the reference height. Pix4DMapper’s orthomosaic shows only \(75\%\) of the total established area. Rows: resulting orthomosaic vs. image at twice the reference height; resulting orthomosaic vs. orthomosaic from Pix4DMapper.
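A Euclidean distance between two images can be sketched as follows. This assumes a pixel-wise comparison of two grayscale images that have already been resized to the same dimensions; whether the comparison was made on raw pixels, and with what normalization, is not stated here, so both choices are assumptions. The tiny arrays are hypothetical.

```python
import math

def euclidean_distance(image_a, image_b):
    """Euclidean (L2) distance between two equally sized grayscale images."""
    same_shape = len(image_a) == len(image_b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(image_a, image_b))
    if not same_shape:
        raise ValueError("images must have the same dimensions")
    return math.sqrt(sum((a - b) ** 2
                         for row_a, row_b in zip(image_a, image_b)
                         for a, b in zip(row_a, row_b)))

# Hypothetical 2x2 images given as lists of pixel rows.
reference = [[0, 0], [0, 0]]
candidate = [[3, 0], [0, 4]]
print(euclidean_distance(reference, candidate))  # 5.0
```

A smaller distance means the two images are more similar, and identical images give a distance of zero.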
In this work, a simple methodology to build orthomosaics using aerial images is presented. This study focuses on verifying the methodology, which uses a deep neural network model. Preliminary results for orthomosaic generation have been verified qualitatively by obtaining feature maps and matching points between image pairs.
The resulting orthomosaics were evaluated using the Euclidean distance as a similarity measure. The orthomosaic obtained was compared with a manual reconstruction, an image captured at a greater height, and a reconstruction obtained with commercial software. The comparison shows that our methodology provides results similar to these references, but with high-definition detail. Our results are also comparable with those obtained with traditional computer vision algorithms.
Reconstruction of larger areas, such as the entire university campus, as a high-resolution orthomosaic map is being considered for future work.
- 1. Arandjelovic, R., Gronat, P., Torii, A., Pajdla, T., Sivic, J.: Netvlad: CNN architecture for weakly supervised place recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5297–5307 (2016)
- 3. Asale, R., Rae: fotogrametría: Diccionario de la lengua española, November 2019. https://dle.rae.es/fotogrametria
- 4. Barazzetti, L., Remondino, F., Scaioni, M.: Extraction of accurate tie points for automated pose estimation of close-range blocks. In: ISPRS Technical Commission III Symposium on Photogrammetric Computer Vision and Image Analysis (2010)
- 6. Cheng, Y., Xue, D., Li, Y.: A fast mosaic approach for remote sensing images. In: 2007 International Conference on Mechatronics and Automation, pp. 2009–2013. IEEE (2007)
- 8. Escalante Torrado, J.O., Porras Díaz, H., et al.: Ortomosaicos y modelos digitales de elevación generados a partir de imágenes tomadas con sistemas uav. Tecnura 20(50), 119–140 (2016)
- 11. Gordo, A., Almazán, J., Revaud, J., Larlus, D.: Deep image retrieval: learning global representations for image search. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016, Part VI. LNCS, vol. 9910, pp. 241–257. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46466-4_15
- 12. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
- 13. Li, J., Ai, M., Hu, Q., Fu, D.: A novel approach to generating DSM from high-resolution UAV images. In: 2014 22nd International Conference on Geoinformatics (GeoInformatics), pp. 1–5. IEEE (2014)
- 15. Li, T., Hailes, S., Julier, S., Liu, M.: UAV-based SLAM and 3D reconstruction system. In: 2017 IEEE International Conference on Robotics and Biomimetics (ROBIO), pp. 2496–2501. IEEE (2017)
- 16. Li, X., Liu, Y., Wang, Y., Yan, D.: Computing homography with RANSAC algorithm: a novel method of registration. In: Electronic Imaging and Multimedia Technology IV, vol. 5637, pp. 109–112. International Society for Optics and Photonics (2005)
- 17. Lingua, A., Marenchino, D., Nex, F.: Automatic digital surface model (DSM) generation procedure from images acquired by unmanned aerial systems (UASS). RevCAD J. Geodesy Cadastre 9, 53–64 (2009)
- 18. Lingua, A., Marenchino, D., Nex, F.: A comparison between “old and new” feature extraction and matching techniques in photogrammetry. RevCAD J. Geodesy Cadastre 9, 43–52 (2009)
- 19. Lowe, D.G.: Object recognition from local scale-invariant features. In: Proceedings of the Seventh IEEE International Conference on Computer Vision, vol. 2, pp. 1150–1157. IEEE (1999)
- 21. Noh, H., Araujo, A., Sim, J., Weyand, T., Han, B.: Large-scale image retrieval with attentive deep local features. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 3456–3465 (2017)
- 22. Radenović, F., Tolias, G., Chum, O.: CNN image retrieval learns from BoW: unsupervised fine-tuning with hard examples. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 3–20. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_1
- 25. Teichmann, M., Araujo, A., Zhu, M., Sim, J.: Detect-to-retrieve: efficient regional aggregation for image search. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5109–5118 (2019)
- 26. Weerasekera, C.S., Latif, Y., Garg, R., Reid, I.: Dense monocular reconstruction using surface normals. In: 2017 IEEE International Conference on Robotics and Automation (ICRA), pp. 2524–2531. IEEE (2017)
- 27. Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009)