Extraction of Relevant Figures and Tables for Multi-document Summarization

Sadh, Ashish; Sahu, Amit; Srivastava, Devesh; Sanyal, Ratna; Sanyal, Sudip

doi:10.1007/978-3-642-28601-8_34

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7182)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics

Abstract

We propose a system that extracts the most relevant figures and tables from a set of topically related source documents and integrates them into the extractive text summary produced from the same set. The proposed method is domain independent. Its main focus is the generation of a list of candidate units (figures/tables) ranked by their computed relevancy. The relevancy measure is based on local and global scores that take both direct and indirect references into account. To test system performance, we created a test collection of document sets that are not restricted to any specific domain. Evaluation experiments show that the system-generated ranked list correlates with the human evaluators' ranking judgments at a statistically significant level. Our concluding remarks make clear the feasibility of using the proposed system to summarize document sets in which figures and tables are salient units.
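The abstract does not give the scoring formula or name the correlation statistic, so the following is only a minimal sketch of the general idea: rank candidate units by a score that combines local evidence (direct and indirect references) with a global score, then compare the resulting ranking with a human ranking. The class CandidateUnit, the weights, the linear combination, and the choice of Spearman's rho are all assumptions made for illustration and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class CandidateUnit:
    """A figure or table found in one of the source documents (hypothetical model)."""
    unit_id: str
    direct_refs: int      # sentences that cite the unit explicitly, e.g. "Fig. 2"
    indirect_refs: int    # sentences that discuss the unit without naming it
    global_score: float   # assumed relevance to the whole document set, in [0, 1]


def relevancy(unit, w_direct=1.0, w_indirect=0.5, w_global=2.0):
    """Combine local and global evidence linearly (weights are illustrative assumptions)."""
    local = w_direct * unit.direct_refs + w_indirect * unit.indirect_refs
    return local + w_global * unit.global_score


def rank_units(units):
    """Return unit ids ordered from most to least relevant."""
    return [u.unit_id for u in sorted(units, key=relevancy, reverse=True)]


def spearman_rho(ranking_a, ranking_b):
    """Spearman rank correlation of two rankings of the same items (no ties, n >= 2)."""
    n = len(ranking_a)
    position_b = {item: i for i, item in enumerate(ranking_b)}
    d_squared = sum((i - position_b[item]) ** 2 for i, item in enumerate(ranking_a))
    return 1 - 6 * d_squared / (n * (n * n - 1))


if __name__ == "__main__":
    # Made-up candidate units; these are not data from the paper.
    units = [
        CandidateUnit("doc1-fig1", direct_refs=3, indirect_refs=2, global_score=0.40),
        CandidateUnit("doc1-tab1", direct_refs=1, indirect_refs=0, global_score=0.90),
        CandidateUnit("doc2-fig3", direct_refs=2, indirect_refs=4, global_score=0.10),
        CandidateUnit("doc3-tab2", direct_refs=0, indirect_refs=1, global_score=0.20),
    ]
    system_ranking = rank_units(units)
    human_ranking = ["doc1-fig1", "doc1-tab1", "doc2-fig3", "doc3-tab2"]
    print(system_ranking)
    print(spearman_rho(system_ranking, human_ranking))
```

A value of rho close to 1 means the two rankings largely agree. Whether the agreement is statistically significant would need a separate test, which this sketch omits.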

Author information

Authors and Affiliations

Indian Institute of Information Technology, Allahabad, India
Ashish Sadh, Amit Sahu, Devesh Srivastava, Ratna Sanyal & Sudip Sanyal

Editor information

Editors and Affiliations

Center for Computing Research (CIC), National Polytechnic Institute (IPN), Mexico City, Mexico
Alexander Gelbukh

About this paper

Cite this paper

Sadh, A., Sahu, A., Srivastava, D., Sanyal, R., Sanyal, S. (2012). Extraction of Relevant Figures and Tables for Multi-document Summarization. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2012. Lecture Notes in Computer Science, vol 7182. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28601-8_34

DOI: https://doi.org/10.1007/978-3-642-28601-8_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28600-1
Online ISBN: 978-3-642-28601-8
eBook Packages: Computer Science, Computer Science (R0)
