Bag of Features vs Vector of Locally Aggregated Descriptors

  • Farkhunda Younas
  • Junaid Baber
  • Tahir Mahmood
  • Javeria Farooq
  • Maheen Bakhtyar
Conference paper
Part of the Lecture Notes in Networks and Systems book series (LNNS, volume 16)


Representing an image by a set of local features is common and is the state of the art for many applications, such as image retrieval and image classification. A single image contains, on average, 2.5k–3.0k features. Searching images based on local features is more discriminative than searching based on global features, at the cost of heavy computational overhead. Bag-of-Features (BoF), also known as bag-of-visual-words, is used for feature quantization, which makes searching local features feasible in very large databases at the cost of distinctiveness. In such applications, the vocabulary size is usually kept at up to 1 million visual words. In this study, we investigate the performance of the Vector of Locally Aggregated Descriptors (VLAD), which was recently proposed as an alternative to BoF, for different families of descriptors. VLAD achieves similar, and sometimes better, performance compared to BoF despite its limited vocabulary size. In the literature, VLAD has mostly been compared with BoF on gradient-based descriptors. In our experiments, we use a gradient-based descriptor, an intensity-based descriptor, and a binary descriptor: Scale Invariant Feature Transform (SIFT), Local Intensity Order Pattern (LIOP), and BInarization of Gradient Orientation Histograms (BIGOH), respectively. These are used to validate the performance of VLAD alongside BoF on a well-known benchmark dataset. VLAD outperforms BoF for the gradient-based and intensity-based families, but neither approach is feasible for binary descriptors.
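The abstract contrasts the two encodings only in words. Below is a minimal pure-Python sketch of the difference, assuming a visual vocabulary (cluster centroids, e.g. learned with k-means) is already available; the function names `nearest_word`, `bof_histogram`, and `vlad` and the toy data are hypothetical and are not the authors' experimental pipeline. BoF keeps only a count per visual word, whereas VLAD, following the standard formulation of Jégou et al., sums the residuals between each descriptor and its nearest centroid and L2-normalizes the result, so k words and d-dimensional descriptors yield a k × d vector.

```python
import math
from typing import List, Sequence

Vector = List[float]


def nearest_word(desc: Sequence[float], vocab: Sequence[Sequence[float]]) -> int:
    """Index of the visual word (centroid) closest to desc in squared Euclidean distance."""
    return min(
        range(len(vocab)),
        key=lambda i: sum((d - c) ** 2 for d, c in zip(desc, vocab[i])),
    )


def bof_histogram(descs: Sequence[Sequence[float]], vocab: Sequence[Sequence[float]]) -> Vector:
    """BoF: hard-assign each descriptor to its nearest word, count, then L1-normalize."""
    hist = [0.0] * len(vocab)
    for desc in descs:
        hist[nearest_word(desc, vocab)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def vlad(descs: Sequence[Sequence[float]], vocab: Sequence[Sequence[float]]) -> Vector:
    """VLAD: sum residuals (descriptor - centroid) per word, flatten to k*d, then L2-normalize."""
    k, dim = len(vocab), len(vocab[0])
    acc = [[0.0] * dim for _ in range(k)]
    for desc in descs:
        i = nearest_word(desc, vocab)
        for j in range(dim):
            acc[i][j] += desc[j] - vocab[i][j]
    flat = [x for row in acc for x in row]
    norm = math.sqrt(sum(x * x for x in flat))
    return [x / norm for x in flat] if norm else flat


if __name__ == "__main__":
    # Hypothetical toy data: a 2-word vocabulary of 2-D "descriptors".
    vocab = [[0.0, 0.0], [10.0, 10.0]]
    descs = [[1.0, 0.0], [0.0, 1.0], [9.0, 10.0]]
    print(bof_histogram(descs, vocab))  # one normalized count per word
    print(vlad(descs, vocab))           # 2 words x 2 dims = 4 values
```

On the toy data, `bof_histogram` yields the counts 2 and 1 normalized to 2/3 and 1/3, while `vlad` yields the unit vector (1, 1, -1, 0)/√3. The residuals record where descriptors lie relative to each centroid, which the counts discard; this is one way to see why VLAD can stay competitive with a much smaller vocabulary.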


Keywords: Bag-of-Features (BoF) · Local features · Locally aggregated descriptors (VLAD) · SIFT



This research work is supported by the Higher Education Commission (HEC) of Pakistan, Sardar Bahadur Khan (SBK) Women's University, and the University of Balochistan.



Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  • Farkhunda Younas (1)
  • Junaid Baber (2), corresponding author
  • Tahir Mahmood (3)
  • Javeria Farooq (4)
  • Maheen Bakhtyar (2)
  1. Department of Computer Science, Sardar Bahadur Khan Women's University, Quetta, Pakistan
  2. Department of Computer Science and Information Technology, University of Balochistan, Quetta, Pakistan
  3. Department of Computer Science, COMSATS Institute of Information Technology, Islamabad, Pakistan
  4. Department of Electronic Engineering, Balochistan University of Information Technology, Engineering and Management Sciences, Quetta, Pakistan
