Abstract
We propose a system that extracts the most relevant figures and tables from a set of topically related source documents and integrates them into the extractive text summary produced from the same set. The proposed method is domain independent. It focuses mainly on generating a list of candidate units (figures and tables) ranked by their computed relevance. The relevance measure is based on local and global scores that account for direct and indirect references. To test system performance, we created a test collection of document sets that are not restricted to any specific domain. Evaluation experiments show that the system-generated ranked list has a statistically significant correlation with the human evaluators' ranking judgments. Our concluding remarks discuss the feasibility of using the proposed system to summarize document sets in which figures and tables are salient units.
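As a rough illustration of the ranking-and-evaluation idea described in the abstract, the following minimal sketch ranks candidate figures and tables by a combined score and compares the result with a human ranking. Everything specific here is an assumption made for illustration, not the paper's method: the `Candidate` class, the weighted-sum combination with weight `alpha`, the example score values, and the choice of Spearman's rank correlation (the abstract does not say how the scores are combined or which correlation statistic was used).

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A figure or table considered for inclusion in the summary (hypothetical type)."""
    name: str
    local_score: float   # assumed: relevance within the unit's own document
    global_score: float  # assumed: relevance to the whole document set


def rank_candidates(candidates, alpha=0.5):
    """Return candidate names sorted by a weighted sum of the two scores.

    The weighted sum and the weight `alpha` are illustrative assumptions.
    """
    def combined(c):
        return alpha * c.local_score + (1 - alpha) * c.global_score

    return [c.name for c in sorted(candidates, key=combined, reverse=True)]


def spearman_rho(ranking_a, ranking_b):
    """Spearman rank correlation between two tie-free rankings of the same items."""
    n = len(ranking_a)
    if n < 2 or set(ranking_a) != set(ranking_b) or len(set(ranking_a)) != n:
        raise ValueError("rankings must order the same set of at least two distinct items")
    position_in_b = {item: i for i, item in enumerate(ranking_b)}
    squared_diffs = sum((i - position_in_b[item]) ** 2
                        for i, item in enumerate(ranking_a))
    return 1 - 6 * squared_diffs / (n * (n * n - 1))


# Made-up scores, purely for demonstration.
candidates = [
    Candidate("Figure 1", local_score=0.9, global_score=0.4),
    Candidate("Table 1", local_score=0.5, global_score=0.7),
    Candidate("Figure 2", local_score=0.2, global_score=0.3),
    Candidate("Table 2", local_score=0.6, global_score=0.1),
]
system_ranking = rank_candidates(candidates)
human_ranking = ["Table 1", "Figure 1", "Table 2", "Figure 2"]  # hypothetical judgment
rho = spearman_rho(system_ranking, human_ranking)
```

A value of `rho` close to 1 would indicate that the system ordering agrees closely with the human ordering. Judging statistical significance, as the abstract reports, would additionally require a significance test over the evaluated document sets.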
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sadh, A., Sahu, A., Srivastava, D., Sanyal, R., Sanyal, S. (2012). Extraction of Relevant Figures and Tables for Multi-document Summarization. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2012. Lecture Notes in Computer Science, vol 7182. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28601-8_34
DOI: https://doi.org/10.1007/978-3-642-28601-8_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28600-1
Online ISBN: 978-3-642-28601-8
eBook Packages: Computer Science (R0)