Counting the Uncountable: Deep Semantic Density Estimation from Space

  • Andres C. Rodriguez
  • Jan D. Wegner
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11269)

Abstract

We propose a new method to count objects of specific categories that are significantly smaller than the ground sampling distance of a satellite image. This task is hard because scenes are cluttered and contain objects of several different categories. Target objects can be partially occluded, vary in appearance within the same class, and resemble objects of other categories. Since traditional object detection is infeasible when objects are small relative to the pixel size, we cast object counting as a density estimation problem. To distinguish objects of different classes, our approach combines density estimation with semantic segmentation in an end-to-end learnable convolutional neural network (CNN). Experiments show that deep semantic density estimation can robustly count objects of various classes in cluttered scenes. Experiments also suggest that remote sensing needs purpose-built CNN architectures rather than existing computer vision architectures applied unchanged.
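
The idea of counting through a density map can be illustrated with a minimal NumPy sketch. It is not the authors' CNN. It only shows the target representation such a network would regress: each object contributes unit mass to a density map, so summing the map over any region gives the object count there, even when several objects fall inside a single coarse pixel. The object locations, grid sizes, kernel width, and the class mask below are all hypothetical, and masking the density with a class map is just one assumed way to combine the two outputs.

```python
import numpy as np

def density_map(points, shape, sigma=1.5):
    """Place one unit-mass Gaussian per object so the map sums to the object count."""
    rows, cols = np.mgrid[0:shape[0], 0:shape[1]]
    density = np.zeros(shape, dtype=np.float64)
    for r, c in points:
        kernel = np.exp(-((rows - r) ** 2 + (cols - c) ** 2) / (2.0 * sigma ** 2))
        density += kernel / kernel.sum()  # normalise: each object contributes exactly 1
    return density

def coarsen(density, factor):
    """Sum fine-grid density into coarse pixels (mimics a larger ground sampling distance)."""
    h, w = density.shape
    return density.reshape(h // factor, factor, w // factor, factor).sum(axis=(1, 3))

# Hypothetical object centres on a fine 40x40 grid (illustrative only).
points = [(5, 5), (6, 7), (7, 4), (30, 31), (32, 29)]
fine = density_map(points, (40, 40))

# Each coarse pixel now covers 10x10 fine cells and may hold several objects.
coarse = coarsen(fine, 10)

# Hypothetical semantic mask: the target class occupies only the top-left coarse pixel.
class_mask = np.zeros(coarse.shape, dtype=bool)
class_mask[0, 0] = True

count_total = coarse.sum()                    # integrates to the number of objects (5)
count_in_class = (coarse * class_mask).sum()  # density restricted to the target class

print(f"total count: {count_total:.2f}")
print(f"count inside class mask: {count_in_class:.2f}")
```

The top-left coarse pixel holds close to three objects although no individual object is resolvable at that resolution, which is the situation where detection fails and density estimation still yields a count.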

Keywords

Remote sensing, Computer vision, Density estimation, Deep learning

Notes

Acknowledgments

This project is funded by Barry Callebaut Sourcing AG as a part of a Research Project Agreement with ETH Zurich.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. ETH Zurich, Zurich, Switzerland
