A Multimodal Approach to Relevance and Pertinence of Documents

  • Conference paper
Trends in Applied Knowledge-Based Systems and Data Science (IEA/AIE 2016)

Abstract

Automated document classification extracts information through a systematic analysis of document content. It is an active research field of growing importance, owing to the large number of electronic documents produced on the World Wide Web and made readily available by widespread technologies, including mobile ones. Several application areas benefit from automated document classification, including document archiving, invoice processing in business environments, press releases and search engines. Current tools classify or “tag” text and images separately. In this paper we show how a technology that links image-based and text-based content improves fundamental document management tasks, such as retrieving information from a database or automatically routing documents. We present formal definitions of the concepts of pertinence and relevance that apply to the document types we call “multimodal”. These definitions are based on a model of conceptual spaces that we believe is necessary for investigating complex documents using joint information from their text and images.

Author information

Correspondence to Matteo Cristani or Claudio Tomazzoli.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Cristani, M., Tomazzoli, C. (2016). A Multimodal Approach to Relevance and Pertinence of Documents. In: Fujita, H., Ali, M., Selamat, A., Sasaki, J., Kurematsu, M. (eds) Trends in Applied Knowledge-Based Systems and Data Science. IEA/AIE 2016. Lecture Notes in Computer Science, vol 9799. Springer, Cham. https://doi.org/10.1007/978-3-319-42007-3_14

  • DOI: https://doi.org/10.1007/978-3-319-42007-3_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-42006-6

  • Online ISBN: 978-3-319-42007-3

  • eBook Packages: Computer Science, Computer Science (R0)
