Stopwords Detection in Bag-of-Visual-Words: The Case of Retrieving Maya Hieroglyphs

  • Edgar Roman-Rangel
  • Stephane Marchand-Maillet
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8158)

Abstract

We present a method for the automatic detection of stopwords in visual vocabularies, based on the entropy of each visual word. At its core is a specific formulation of the entropy in which the probability density function of the visual words is marginalized over all visual classes, so that words with higher entropy can be considered irrelevant, i.e., stopwords. We evaluate the method on a dataset of syllabic Maya hieroglyphs, which is of great interest to archaeologists and requires efficient techniques for indexing and retrieval. Our results show that the method produces shorter bag representations without hurting retrieval performance, and in some cases even improves it, which previous methods do not achieve. Furthermore, the assumptions behind the proposed entropy computation can be generalized to bag representations of a different nature.
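
The abstract does not give the exact entropy formulation, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a hypothetical count matrix `counts[c][w]` (occurrences of visual word `w` in class `c`), scores each word by the Shannon entropy of its distribution over classes, and treats the `n_stop` highest-entropy words as stopwords. The function names, the count matrix, and the fixed `n_stop` cut-off are all illustrative assumptions.

```python
import math


def word_entropies(counts):
    """Entropy (in bits) of each visual word's distribution over classes.

    counts[c][w] is the number of occurrences of word w in class c.
    Assumption: this is a generic class-conditional entropy, used here
    for illustration; the paper's own formulation may differ.
    """
    n_classes = len(counts)
    n_words = len(counts[0])
    entropies = []
    for w in range(n_words):
        total = sum(counts[c][w] for c in range(n_classes))
        h = 0.0
        if total > 0:  # words that never occur keep entropy 0
            for c in range(n_classes):
                p = counts[c][w] / total
                if p > 0:
                    h -= p * math.log2(p)
        entropies.append(h)
    return entropies


def detect_stopwords(counts, n_stop):
    """Indices of the n_stop words with the highest entropy (hypothetical cut-off)."""
    ents = word_entropies(counts)
    ranked = sorted(range(len(ents)), key=lambda w: ents[w], reverse=True)
    return sorted(ranked[:n_stop])


def prune_bag(bag, stopwords):
    """Drop the stopword bins from one bag-of-visual-words histogram."""
    stop = set(stopwords)
    return [value for w, value in enumerate(bag) if w not in stop]


# Toy data: 2 classes, 4 visual words. Words 0 and 3 are spread evenly
# over both classes, words 1 and 2 each occur in a single class.
counts = [
    [10, 9, 0, 5],
    [10, 0, 8, 5],
]
stopwords = detect_stopwords(counts, n_stop=2)
shorter_bag = prune_bag([4, 1, 2, 7], stopwords)
```

In this toy example the two words that occur equally often in both classes are the ones removed, which shortens the bag while keeping the bins that discriminate between classes.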

Keywords

Bag-of-words, stopwords, retrieval, archaeology, hieroglyphs

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Edgar Roman-Rangel (1)
  • Stephane Marchand-Maillet (1)
  1. CVMLab, University of Geneva, Switzerland
