Abstract
We present a method for automatically detecting stopwords in visual vocabularies, based on the entropy of each visual word. At its core is a specific formulation of the entropy in which the probability density function of the visual words is marginalized over all visual classes, so that words with higher entropy can be considered irrelevant words, i.e., stopwords. We evaluate the method on a dataset of syllabic Maya hieroglyphs, which is of great interest to archaeologists and requires efficient techniques for indexing and retrieval. Our results show that the method produces shorter bag representations without hurting retrieval performance, and even improves it in some cases, which previous methods do not achieve. Furthermore, the assumptions behind the proposed entropy computation can be generalized to bag representations of a different nature.
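The abstract does not give the exact entropy formulation, so the following is only a minimal sketch of the general idea, under stated assumptions: each visual word is scored by the Shannon entropy of its occurrence distribution across classes, and the highest-entropy words are dropped from the bag. The input layout (`class_word_counts`), the function names, and the fixed number of stopwords are all hypothetical choices made for illustration, not the paper's method.

```python
import math
from typing import List, Sequence


def word_entropies(class_word_counts: Sequence[Sequence[float]]) -> List[float]:
    """Shannon entropy (in bits) of each word's distribution over classes.

    class_word_counts[c][w] is how often word w occurs in class c.
    This layout is an assumption for illustration, not taken from the paper.
    """
    n_classes = len(class_word_counts)
    n_words = len(class_word_counts[0])
    entropies = []
    for w in range(n_words):
        column = [class_word_counts[c][w] for c in range(n_classes)]
        total = sum(column)
        if total == 0:
            # Assumption: a word that never occurs is treated as
            # maximally uninformative.
            entropies.append(math.log2(n_classes))
            continue
        h = 0.0
        for count in column:
            if count > 0:
                p = count / total  # estimate of P(class | word)
                h -= p * math.log2(p)
        entropies.append(h)
    return entropies


def detect_stopwords(class_word_counts: Sequence[Sequence[float]],
                     n_stopwords: int) -> List[int]:
    """Indices of the n_stopwords words with the highest entropy."""
    h = word_entropies(class_word_counts)
    ranked = sorted(range(len(h)), key=lambda w: h[w], reverse=True)
    return sorted(ranked[:n_stopwords])


def prune_bag(bag: Sequence[float], stopwords: Sequence[int]) -> List[float]:
    """Return a shorter bag representation without the stopword bins."""
    drop = set(stopwords)
    return [value for w, value in enumerate(bag) if w not in drop]


if __name__ == "__main__":
    # Toy counts: 3 classes (rows) by 3 visual words (columns).
    counts = [[10, 0, 5],
              [10, 0, 5],
              [10, 9, 0]]
    stop = detect_stopwords(counts, n_stopwords=1)
    print(stop)                        # word 0 is spread evenly over classes
    print(prune_bag([4, 2, 7], stop))  # bag with the stopword bin removed
```

In this toy example, word 0 occurs equally often in every class and therefore gets the highest entropy, while word 1 occurs in a single class and gets entropy zero. How many words to remove, or which entropy threshold to use, is left open here and would have to be chosen by validation on retrieval performance.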
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Roman-Rangel, E., Marchand-Maillet, S. (2013). Stopwords Detection in Bag-of-Visual-Words: The Case of Retrieving Maya Hieroglyphs. In: Petrosino, A., Maddalena, L., Pala, P. (eds) New Trends in Image Analysis and Processing – ICIAP 2013. ICIAP 2013. Lecture Notes in Computer Science, vol 8158. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41190-8_18
DOI: https://doi.org/10.1007/978-3-642-41190-8_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41189-2
Online ISBN: 978-3-642-41190-8
eBook Packages: Computer Science, Computer Science (R0)