Abstract
Automated document classification extracts information through a systematic analysis of document content. It is an active research field of growing importance, owing to the large number of electronic documents produced on the World Wide Web and made readily available by widespread technologies, including mobile ones. Several application areas benefit from automated document classification, including document archiving, invoice processing in business environments, press releases, and search engines. Current tools classify or “tag” text and images separately. In this paper we show how a technology that links image-based and text-based content improves fundamental document management tasks, such as retrieving information from a database or automatically routing documents. We present a formal definition of the concepts of pertinence and relevance as they apply to the document types we call “multimodal”. These definitions are based on a model of conceptual spaces that we believe is essential for investigating complex documents using joint information from their text and images.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Cristani, M., Tomazzoli, C. (2016). A Multimodal Approach to Relevance and Pertinence of Documents. In: Fujita, H., Ali, M., Selamat, A., Sasaki, J., Kurematsu, M. (eds) Trends in Applied Knowledge-Based Systems and Data Science. IEA/AIE 2016. Lecture Notes in Computer Science, vol 9799. Springer, Cham. https://doi.org/10.1007/978-3-319-42007-3_14
DOI: https://doi.org/10.1007/978-3-319-42007-3_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-42006-6
Online ISBN: 978-3-319-42007-3
eBook Packages: Computer Science, Computer Science (R0)