Multi-objects Detection and Segmentation for Scene Understanding Based on Texton Forest and Kernel Sliding Perceptron


Scene understanding has recently become a prominent research topic because of its practical use in perceiving, analyzing and recognizing dynamic scenes in applications such as GPS monitoring systems, drone targeting, autonomous driving and tourist guidance. The goal of scene understanding is to make machines perceive scenes as humans do, that is, to accurately recognize the contents of a scene and of the observed location. This involves two tasks: (1) describing the whole environment accurately and (2) describing what action is taking place in that environment. Because scene analysis is complex, recognizing multiple objects and the relations between them remains a challenging part of the research. In this paper, we propose a novel approach to scene understanding that integrates multiple object detection/segmentation and scene labeling using geometric features, histogram of oriented gradients (HOG) and scale-invariant feature transform (SIFT) descriptors. The complete procedure of the proposed model includes resizing and denoising the dataset images, multiple object segmentation and detection, feature extraction, and multiple object recognition using a multi-layer kernel sliding perceptron. Scene recognition is then achieved using multi-class logistic regression. Two datasets, MSRC and UIUC Sports, are used for the experimental evaluation of the proposed method. The proposed method accurately handles the physical exclusion of complex objects and object occlusion, and it outperforms other state-of-the-art approaches in terms of accuracy.
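As a rough illustration of the last two stages named in the abstract (gradient-orientation features followed by multi-class logistic regression), the following minimal Python sketch may help. It is not the authors' implementation: it assumes a single-cell, HOG-style orientation histogram in place of the full HOG, SIFT and geometric descriptors, omits the segmentation and kernel sliding perceptron stages entirely, and uses synthetic striped images rather than MSRC or UIUC Sports. All function names and parameters (`orientation_histogram`, `fit_softmax_regression`, `make_stripes`, `make_dataset`, `lr`, `epochs`) are hypothetical.

```python
import numpy as np

def orientation_histogram(image, bins=9):
    """Simplified HOG-style descriptor (assumption: one cell covering the image).
    Unsigned gradient orientations are binned, weighted by gradient magnitude,
    and the histogram is L2-normalised."""
    gy, gx = np.gradient(image.astype(float))        # derivatives along rows, columns
    magnitude = np.hypot(gx, gy)
    angle = np.rad2deg(np.arctan2(gy, gx)) % 180.0   # unsigned orientation in degrees
    hist, _ = np.histogram(angle, bins=bins, range=(0.0, 180.0), weights=magnitude)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)             # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit_softmax_regression(X, y, n_classes, lr=0.5, epochs=500):
    """Multi-class logistic regression trained by batch gradient descent
    on the mean cross-entropy loss."""
    n, d = X.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]                         # one-hot targets
    for _ in range(epochs):
        grad = (softmax(X @ W + b) - Y) / n          # gradient w.r.t. the logits
        W -= lr * (X.T @ grad)
        b -= lr * grad.sum(axis=0)
    return W, b

def predict(X, W, b):
    return softmax(X @ W + b).argmax(axis=1)

def make_stripes(vertical, rng, size=32):
    """Synthetic toy 'scene': noisy sinusoidal stripes (hypothetical data)."""
    ramp = np.sin(np.linspace(0.0, 4.0 * np.pi, size))
    img = np.tile(ramp, (size, 1))                   # intensity varies along columns
    if not vertical:
        img = img.T                                  # intensity varies along rows
    return img + 0.05 * rng.standard_normal((size, size))

def make_dataset(n_per_class, rng):
    """Class 0: vertical stripes, class 1: horizontal stripes."""
    images = [make_stripes(vertical=(label == 0), rng=rng)
              for label in (0, 1) for _ in range(n_per_class)]
    X = np.stack([orientation_histogram(img) for img in images])
    y = np.repeat([0, 1], n_per_class)
    return X, y

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_train, y_train = make_dataset(20, rng)
    X_test, y_test = make_dataset(10, rng)
    W, b = fit_softmax_regression(X_train, y_train, n_classes=2)
    accuracy = (predict(X_test, W, b) == y_test).mean()
    print(f"held-out accuracy on the toy data: {accuracy:.2f}")
```

In a fuller pipeline, the feature vector would presumably be built per segmented object and the scene classifier would operate on the recognized object labels; the sketch collapses these steps only to keep the example short.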





This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (No. 2018R1D1A1A02085645). Also, this work was supported by the Korea Medical Device Development Fund grant funded by the Korea government (the Ministry of Science and ICT, the Ministry of Trade, Industry and Energy, the Ministry of Health & Welfare, the Ministry of Food and Drug Safety) (Project Number: 202012D05-02).

Author information



Corresponding author

Correspondence to Kibum Kim.


Cite this article

Ahmed, A., Jalal, A. & Kim, K. Multi-objects Detection and Segmentation for Scene Understanding Based on Texton Forest and Kernel Sliding Perceptron. J. Electr. Eng. Technol. (2021).



  • Logistic regression
  • Multi-layer perceptron
  • Neural network
  • Scene understanding
  • Texton forest segmentation