Abstract
Although dimension reduction techniques for text documents can be used to preprocess blogs, they are more effective when they account for the specific nature of blogs. In this paper we propose a shallow summarization method for blogs as a preprocessing step for blog mining. The method exploits characteristics specific to blogs, including blog themes, the time interval between posts, and the body-title composition of posts. We apply the method to a collection of Persian blogs hosted on PersianBlog and investigate its influence on blog clustering.
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Asbagh, M.J., Sayyadi, M., Abolhassani, H. (2009). Blog Summarization for Blog Mining. In: Lee, R., Ishii, N. (eds) Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing. Studies in Computational Intelligence, vol 209. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01203-7_13
DOI: https://doi.org/10.1007/978-3-642-01203-7_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-01202-0
Online ISBN: 978-3-642-01203-7
eBook Packages: Engineering, Engineering (R0)