Extraction of Relevant Figures and Tables for Multi-document Summarization

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2012)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7182)

Abstract

We propose a system that extracts the most relevant figures and tables from a set of topically related source documents and integrates them into the extractive text summary produced from the same set. The proposed method is domain independent. It focuses predominantly on generating a list of candidate units (figures and tables) ranked by their computed relevance. The relevance measure is based on local and global scores that take both direct and indirect references into account. To test system performance, we created a test collection of document sets that do not adhere to any specific domain. Evaluation experiments show that the system-generated ranked list correlates with the human evaluators’ ranking judgments at a statistically significant level. The feasibility of using the proposed system to summarize document sets in which figures and tables are salient units is addressed in our concluding remarks.
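
The abstract does not give the scoring formula or name the correlation statistic, so the following is only a minimal sketch of the general idea: rank candidate units by a combination of a local and a global score, then compare the resulting order with a human ranking. The `Candidate` class, the linear weighting with `alpha`, the example scores, and the choice of Spearman's rank correlation are all assumptions made for illustration and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A figure or table with illustrative (hypothetical) scores."""
    unit_id: str
    local_score: float   # assumed: evidence from the unit's own document
    global_score: float  # assumed: evidence from the whole document set


def rank_candidates(candidates, alpha=0.5):
    """Return unit ids ordered by a weighted sum of local and global scores.

    The linear combination and the weight `alpha` are assumptions; the
    abstract does not state how the two scores are actually combined.
    """
    def combined(c):
        return alpha * c.local_score + (1 - alpha) * c.global_score

    return [c.unit_id for c in sorted(candidates, key=combined, reverse=True)]


def spearman_rho(ranking_a, ranking_b):
    """Spearman rank correlation between two tie-free orderings of the same items."""
    n = len(ranking_a)
    if n < 2 or len(set(ranking_a)) != n or set(ranking_a) != set(ranking_b):
        raise ValueError("rankings must order the same distinct items (at least two)")
    position_b = {item: i for i, item in enumerate(ranking_b)}
    # Sum of squared differences between each item's position in the two rankings.
    d_squared = sum((i - position_b[item]) ** 2 for i, item in enumerate(ranking_a))
    return 1 - 6 * d_squared / (n * (n * n - 1))


if __name__ == "__main__":
    # Made-up scores, purely for demonstration.
    units = [
        Candidate("fig1", 0.9, 0.5),
        Candidate("tab1", 0.5, 0.8),
        Candidate("fig2", 0.2, 0.3),
        Candidate("tab2", 0.6, 0.1),
    ]
    system_ranking = rank_candidates(units)
    human_ranking = ["fig1", "tab2", "tab1", "fig2"]  # hypothetical judgment
    print(system_ranking)
    print(spearman_rho(system_ranking, human_ranking))
```

The sketch computes only the correlation coefficient. Establishing statistical significance, as the evaluation in the abstract does, would additionally require a significance test over the document sets and evaluators.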

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sadh, A., Sahu, A., Srivastava, D., Sanyal, R., Sanyal, S. (2012). Extraction of Relevant Figures and Tables for Multi-document Summarization. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2012. Lecture Notes in Computer Science, vol 7182. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28601-8_34

  • DOI: https://doi.org/10.1007/978-3-642-28601-8_34

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-28600-1

  • Online ISBN: 978-3-642-28601-8

  • eBook Packages: Computer Science, Computer Science (R0)
