Mobile Networks and Applications, Volume 23, Issue 4, pp 1103–1110

Transportation Object Detection with Bag of Visual Words Model by PLSA and MLP

  • Hyun Chul Song
  • Kwang Nam Choi


Visual big data is a significant research topic because of its diverse applications. In this paper, we propose a new visual detection method for transportation objects based on probabilistic latent semantic analysis (PLSA) of visual data. Detection integrates three steps: first, building the co-occurrence matrix of images vectorized with the bag of visual words (BoVW) framework; second, calculating the histograms of the visual words for each class; and third, representing the test images as visual words. A multilayer perceptron (MLP) serves as the classifier in our system. The visual words are extracted by sampling patches from the current image. We propose a new neural network topology for the BoVW model and manage the learning rate by reducing it at specific iterations. PLSA is compared with the MLP on the Caltech-256 dataset, using the classes cars, motorbikes, and horses. The experimental results show that the MLP outperforms current methods in predicting transportation objects and properly approximates the transportation detection function from the extracted local features. The proposed method yields about 4.4% higher accuracy than conventional PLSA for all classes.
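As a rough illustration of two ideas mentioned in the abstract, the sketch below quantizes local descriptors into a normalized visual-word histogram and lowers a learning rate at fixed iterations. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the toy two-dimensional descriptors, the codebook, the milestone iterations, and the decay factor are all hypothetical. In practice the descriptors would be local features such as SIFT, and the codebook would typically come from a clustering step that the abstract does not specify.

```python
from typing import List, Sequence


def nearest_word(descriptor: Sequence[float],
                 codebook: Sequence[Sequence[float]]) -> int:
    """Index of the codebook entry closest to the descriptor (squared Euclidean)."""
    best_index, best_dist = 0, float("inf")
    for index, word in enumerate(codebook):
        dist = sum((d - w) ** 2 for d, w in zip(descriptor, word))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def bovw_histogram(descriptors: Sequence[Sequence[float]],
                   codebook: Sequence[Sequence[float]]) -> List[float]:
    """Count how often each visual word occurs, then normalize to sum to 1."""
    counts = [0] * len(codebook)
    for descriptor in descriptors:
        counts[nearest_word(descriptor, codebook)] += 1
    total = sum(counts)
    if total == 0:
        # No descriptors were extracted: return an all-zero histogram.
        return [0.0] * len(codebook)
    return [count / total for count in counts]


def step_decay(base_lr: float, iteration: int,
               milestones: Sequence[int], factor: float = 0.5) -> float:
    """Multiply the learning rate by `factor` once for every milestone reached."""
    drops = sum(1 for milestone in milestones if iteration >= milestone)
    return base_lr * factor ** drops


# Hypothetical toy data: three visual words and four descriptors from one image.
codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
descriptors = [[0.1, 0.0], [0.9, 1.2], [1.1, 0.8], [4.0, 5.5]]
histogram = bovw_histogram(descriptors, codebook)

# Assumed schedule: halve the rate at iterations 100 and 200.
lr_schedule = [step_decay(0.1, it, milestones=[100, 200]) for it in (0, 150, 250)]
```

The histogram is the fixed-length vector that a classifier such as an MLP could take as input. The stepwise schedule is only one plausible reading of "reducing at specific iterations".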


Transportation detection, Bag of visual words, Multi-layer perceptron, Probabilistic latent semantic analysis, Scale-invariant feature transform



This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education, Science and Technology (NRF-2010-0025512).



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. School of Computer Science and Engineering, Chung-Ang University, Chung-Ang, Korea
