Abstract
Every day we search for new information on the web and find many documents containing pages with a great amount of information. There is a strong demand for automatic summarization that is both fast and accurate. Many methods have been used for automatic extraction, but most of them do not take into account the hierarchical structure of documents. A novel method that exploits the structure of the document was introduced by Yang and Wang in 2004. It is based on a fractal view method for controlling the information displayed. We explain its drawbacks and address them using the new concept of the fractal dimension of a text document, which achieves a better diversification of the extracted sentences and improves the performance of the method.
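Structure-based summarization of this kind can be pictured as splitting a sentence quota top-down over the document tree, so that each section or paragraph receives a share proportional to its importance. The sketch below illustrates only that general idea under simplifying assumptions: the `Node` class, its `weight` values and `allocate_quota` are hypothetical, the weights are simply given as input rather than computed from the text, and neither Yang and Wang's fractal-value computation nor the fractal dimension proposed in this paper is modeled.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical document-tree node (document, section, paragraph, ...)."""
    name: str
    weight: float = 1.0  # assumed non-negative relevance score, supplied by the caller
    children: list[Node] = field(default_factory=list)


def allocate_quota(node: Node, quota: int) -> dict[str, int]:
    """Split an integer sentence quota top-down among the leaves of `node`.

    Each child receives a share proportional to its weight; largest-remainder
    rounding keeps the children's quotas summing exactly to the parent's quota.
    Returns a mapping leaf name -> number of sentences to extract from that leaf.
    """
    if not node.children:
        return {node.name: quota}

    weights = [c.weight for c in node.children]
    total = sum(weights)
    if total <= 0:
        # Fall back to an equal split when no child carries positive weight.
        weights = [1.0] * len(node.children)
        total = float(len(node.children))

    raw = [quota * w / total for w in weights]
    shares = [int(r) for r in raw]  # truncation = floor for non-negative values
    # Give the units lost to truncation to the children with the largest
    # fractional parts (ties keep document order because sorting is stable).
    leftover = quota - sum(shares)
    by_remainder = sorted(range(len(raw)), key=lambda i: raw[i] - shares[i], reverse=True)
    for i in by_remainder[:leftover]:
        shares[i] += 1

    result: dict[str, int] = {}
    for child, share in zip(node.children, shares):
        result.update(allocate_quota(child, share))
    return result


if __name__ == "__main__":
    doc = Node("doc", children=[
        Node("sec1", 3.0, [Node("p1.1", 2.0), Node("p1.2", 1.0)]),
        Node("sec2", 1.0, [Node("p2.1", 1.0)]),
    ])
    print(allocate_quota(doc, 8))  # {'p1.1': 4, 'p1.2': 2, 'p2.1': 2}
```

Largest-remainder rounding is one simple choice for keeping quotas integral without losing sentences; other rounding rules would work equally well for the purpose of illustration.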
References
Buyukkokten, O., Garcia-Molina, H., Paepcke, A.: Seeing the whole in parts: Text summarization for web browsing on handheld devices. In: 10th International WWW Conference, Hong Kong (2001)
Camastra, F., Vinciarelli, A.: Estimating the intrinsic dimension of data with a fractal-based method. IEEE Transactions on Pattern Analysis and Machine Intelligence (2002)
Dalamagas, T., Sheng, T., Winkel, K.J., Sellis, T.: A methodology for clustering xml documents by structure. In: European Conference on Principles and Practice of Knowledge Discovery in Databases, pp. 137–148 (2004)
Daume III, H., Marcu, D.: Induction of word and phrase alignments for automatic document summarization. Computational Linguistics 31(4), 505–530 (2005)
Edmundson, H.P.: New methods in automatic extracting. Journal of the Association for Computing Machinery 16(2), 264–285 (1969)
Goldstein, J., Kantrowitz, M., Mittal, V., Carbonell, J.: Summarizing text documents: Sentence selection and evaluation metrics. In: SIGIR 1999, pp. 121–128 (1999)
Grassberger, P., Procaccia, I.: Measuring the strangeness of strange attractors. Physica 9D, 189–208 (1983)
Guerrini, G., Mesiti, M., Sanz, I.: An overview of similarity measures for clustering XML documents. In: Vakali, A., Pallis, G. (eds.) (2006)
Hovy, E.: Text Summarization. The Oxford Handbook of Computational Linguistics, ch. 32
Koike, H.: Fractal views: a fractal-based method for controlling information display. ACM Transactions on Information Systems 13(3), 305–323 (1995)
Kraft, R.: Fractals and dimensions. HTTP-Protocol (1995), http://www.weihenstephan.de
Lian, W., Cheung, D., Mamoulis, N., Yiu, S.M.: An efficient and scalable algorithm for clustering XML documents by structure. IEEE Transactions on Knowledge and Data Engineering 16(1), 82–96 (2004)
Liebovitch, L.S., Toth, T.: A fast algorithm to determine fractal dimensions by box counting. Physics Letters A 141(8,9), 386–390 (1989)
Luhn, H.P.: The automatic creation of literature abstracts. IBM Journal, pp. 159–165 (April 1958)
Mandelbrot, B.B.: The Fractal Geometry of Nature. W.H. Freeman, New York (1983)
Mandelbrot, B.B.: Self-affine fractal sets. In: Pietronero, L., Tosatti, E. (eds.) Fractals in Physics, Amsterdam (1986)
Marcu, D.: Improving summarization through rhetorical parsing tuning. In: The COLING-ACL Workshop on Very Large Corpora, Montreal, Canada (1998)
Morris, G., Kasper, G.M., Adams, D.A.: The effect and limitation of automated text condensing on reading comprehension performance. Information Systems Research, 17–35 (1992)
Ruiz, M.D., Bailón, A.B.: Fractal dimension of text documents: Application in fractal summarization. In: IADIS International Conference WWW/Internet, vol. 2, pp. 349–353 (2006)
Salton, G., McGill, M.J.: Introduction to Modern Information Retrieval. McGraw-Hill Book Co., New York (1983)
Sheskin, D.: Handbook of parametric and nonparametric statistical procedures, 3rd edn. Chapman & Hall/CRC (2003)
Yang, C.C., Chen, H., Hong, K.: Visualization of large category map for Internet browsing. Decision Support Systems 35, 89–102 (2003)
Yang, C.C., Wang, F.L.: Fractal summarization for mobile devices to access large documents on the Web. In: 12th International WWW Conference, Budapest, Hungary (2003)
Yang, C.C., Wang, F.L.: Fractal summarization: Summarization based on fractal theory. In: SIGIR 2003, Toronto, Canada (2003)
Yang, C.C., Wang, F.L.: A relevance feedback model for fractal summarization. In: Chen, Z., Chen, H., Miao, Q., Fu, Y., Fox, E., Lim, E.-p. (eds.) ICADL 2004. LNCS, vol. 3334, pp. 368–377. Springer, Heidelberg (2004)
Ko, Y., et al.: Topic keyword identification for text summarization using lexical clustering. IEICE Transactions on Information and Systems E86-D, 1695–1701 (2003)
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ruiz, M.D., Bailón, A.B. (2008). Summarizing Structured Documents through a Fractal Technique. In: Filipe, J., Cordeiro, J., Cardoso, J. (eds) Enterprise Information Systems. ICEIS 2007. Lecture Notes in Business Information Processing, vol 12. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88710-2_26
DOI: https://doi.org/10.1007/978-3-540-88710-2_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-88709-6
Online ISBN: 978-3-540-88710-2
eBook Packages: Computer Science, Computer Science (R0)