Abstract
It has been shown experimentally that combining textual and visual representations can improve retrieval performance ([20], [23]). This is because the textual and visual feature spaces often represent complementary yet correlated aspects of the same image, and thus form a composite system.
In this paper, we present a model for combining visual and textual sub-systems within the context of user feedback. The model is inspired by measurement in quantum mechanics (QM) and by the tensor product of co-occurrence (density) matrices, which represents the density matrix of a composite system in QM. It provides a sound and natural framework for integrating multiple feature spaces by treating them as a composite system, as well as a new way of measuring the relevance of an image with respect to a context. The proposed approach takes into account both intra-feature relationships (via co-occurrence matrices) and inter-feature relationships (via the tensor operator) between the dimensions of the features. It is also computationally cheap and scalable to large data collections. We test our approach on the ImageCLEF2007photo data collection and report our findings.
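The composite-system idea can be illustrated with a small numerical sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: the helper names `density_matrix` and `relevance`, the toy feature vectors, the unit-trace normalisation, and the choice to score an image by the expectation value of its normalised feature vector under the density matrix. It is not presented as the authors' implementation.

```python
import numpy as np


def density_matrix(vectors):
    """Co-occurrence matrix of the feedback vectors, scaled to unit trace.

    Assumption: rows are feature vectors of images the user marked relevant.
    """
    X = np.asarray(vectors, dtype=float)
    M = X.T @ X  # sum of outer products: how often dimensions co-occur
    return M / np.trace(M)


def relevance(rho, vector):
    """Measurement-style score <v|rho|v> for the unit-normalised vector v."""
    v = np.asarray(vector, dtype=float)
    v = v / np.linalg.norm(v)
    return float(v @ rho @ v)


# Hypothetical toy data: two feedback images, 3 visual and 2 textual dimensions.
visual_feedback = [[1.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
text_feedback = [[1.0, 0.0], [1.0, 1.0]]

rho_visual = density_matrix(visual_feedback)  # intra-feature relationships
rho_text = density_matrix(text_feedback)
rho_composite = np.kron(rho_visual, rho_text)  # tensor product: composite system

# Hypothetical candidate image to be scored against the feedback context.
image_visual = [1.0, 0.0, 0.0]
image_text = [0.0, 1.0]

# Scoring in the full composite space ...
score_full = relevance(rho_composite, np.kron(image_visual, image_text))
# ... gives the same value as multiplying the two per-modality scores.
score_factored = relevance(rho_visual, image_visual) * relevance(rho_text, image_text)
```

In this sketch the composite score factorises into the product of the per-modality scores, so the large tensor-product matrix never has to be built explicitly. That is a property of the Kronecker product under these assumptions; whether it is the source of the low computational cost claimed above is not stated in the abstract.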
References
Zhao, R., Grosky, W.I.: Narrowing the semantic gap-improved text-based web document retrieval using visual features. IEEE Transactions on Multimedia 4, 189–200 (2002)
Ferecatu, M., Sahbi, H.: TELECOM ParisTech at Image Clef photo 2008: Bi-modal text and image retrieval with diversity enhancement. In: Working Notes of CLEF (2008)
Martinez-Fernandes, J.L., Serrano, A.G., Villena-Roman, J., Saenz, V.D.M., Tortosa, S.G., Castagnone, M., Alonso, J.: MIRACLE at ImageCLEF 2004. In: Working Notes of CLEF (2004)
Yanai, K.: Generic image classification using visual knowledge on the web. In: Proceedings of the 11th ACM International Conference on Multimedia, pp. 167–176 (2003)
Tjondronegoro, D., Zhang, J., Gu, J., Nguyen, A., Geva, S.: Integrating Text Retrieval and Image Retrieval in XML Document Searching. In: Fuhr, N., Lalmas, M., Malik, S., Kazai, G. (eds.) INEX 2005. LNCS, vol. 3977, pp. 511–524. Springer, Heidelberg (2006)
Maillot, N., Chevallet, J.P., Valea, V., Lim, J.H.: IPAL Inter-media pseudo-relevance feedback approach to ImageCLEF 2006 photo retrieval. In: CLEF Working Notes (2006)
Rahman, M.M., Bhattacharya, P., Desai, B.C.: A unified image retrieval framework on local visual and semantic concept-based feature spaces. J. Visual Communication and Image Representation 20(7), 450–462 (2009)
Simpson, M., Rahman, M.M.: Text and content-based approaches to image retrieval for the ImageCLEF 2009 medical retrieval track. In: Working Notes for the CLEF 2009 Workshop (2009)
Wang, J., Song, D., Kaliciak, L.: Tensor product of correlated text and visual features: a quantum theory inspired image retrieval framework. In: AAAI-Fall 2010 Symposium on Quantum Information for Cognitive, Social, and Semantic Processes, pp. 109–116 (2010)
Mensink, T., Csurka, G., Perronnin, F.: LEAR and XRCE’s participation to visual concept detection task - ImageCLEF 2010. In: Proceedings of the 14th Annual ACM International Conference on Multimedia, pp. 77–80 (2006)
Mensink, T., Verbeek, J., Csurka, G.: Weighted transmedia relevance feedback for image retrieval and auto-annotation. Technical Report Number 0415 (2011)
Clinchant, S., Ah-Pine, J., Csurka, G.: Semantic combination of textual and visual information in multimedia retrieval. In: ACM International Conference on Multimedia Retrieval, ICMR (2011)
Depeursinge, A., Müller, H.: Fusion techniques for combining textual and visual information retrieval. In: ImageCLEF. The Springer International Series on Information Retrieval, vol. 32, pp. 95–114 (2010)
Chang, Y.-C., Chen, H.-H.: Increasing Precision and Diversity in Photo Retrieval by Result Fusion. In: Peters, C., Deselaers, T., Ferro, N., Gonzalo, J., Jones, G.J.F., Kurimo, M., Mandl, T., Peñas, A., Petras, V. (eds.) CLEF 2008. LNCS, vol. 5706, pp. 612–619. Springer, Heidelberg (2009)
Combining systems: the tensor product and partial trace, http://www.quantum.umb.edu/Jacobs/QMT/QMT-AppendixA.pdf
Li, Y., Cunningham, H.: Geometric and quantum methods for information retrieval. SIGIR Forum 42(2), 22–32 (2008)
van Rijsbergen, C.J.: The geometry of information retrieval. Cambridge University Press (2004)
Bruza, P.D., Kitto, K., Nelson, D., McEvoy, C.L.: Entangling words and meaning. In: Proceedings of the 2nd Quantum Interaction Symposium, pp. 118–124 (2008)
Olshausen, B.A., Field, D.J.: Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 381, 607–609 (1996)
Grubinger, M., Clough, P., Hanbury, A., Müller, H.: Overview of the ImageCLEF 2007 photographic retrieval task. In: Working Notes of the 2007 CLEF Workshop (2007)
Kaliciak, L., Song, D., Wiratunga, N., Pan, J.: Novel local features with hybrid sampling technique for image retrieval. In: Proceedings of Conference on Information and Knowledge Management (CIKM), pp. 1557–1560 (2010)
Nowak, E., Jurie, F., Triggs, B.: Sampling Strategies for Bag-of-Features Image Classification. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3954, pp. 490–503. Springer, Heidelberg (2006)
ImageCLEF website, http://www.imageclef.org
Grubinger, M., Clough, P., Hanbury, A., Müller, H.: Overview of the ImageCLEFphoto 2007 Photographic Retrieval Task. In: Peters, C., Jijkoun, V., Mandl, T., Müller, H., Oard, D.W., Peñas, A., Petras, V., Santos, D. (eds.) CLEF 2007. LNCS, vol. 5152, pp. 433–444. Springer, Heidelberg (2008)
Chen, Z., Liu, W., Zhang, F., Li, M.J., Zhang, H.J.: Web mining for web image retrieval. Journal of the American Society for Information Science and Technology 52(10), 831–839 (2001)
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kaliciak, L., Song, D., Wiratunga, N., Pan, J. (2013). Combining Visual and Textual Systems within the Context of User Feedback. In: Li, S., et al. Advances in Multimedia Modeling. MMM 2013. Lecture Notes in Computer Science, vol 7732. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35725-1_41
DOI: https://doi.org/10.1007/978-3-642-35725-1_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35724-4
Online ISBN: 978-3-642-35725-1
eBook Packages: Computer Science, Computer Science (R0)