Part of the book series: Studies in Computational Intelligence (SCI, volume 209)

Abstract

Although dimension reduction techniques for text documents can be used to preprocess blogs, these techniques are likely to be more effective if they properly account for the nature of blogs. In this paper we propose a shallow summarization method for blogs, intended as a preprocessing step for blog mining, that exploits specific characteristics of blogs: blog themes, the time interval between posts, and the body-title composition of posts. We use our method to summarize a collection of Persian blogs from PersianBlog hosting and investigate its influence on blog clustering.
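
The abstract does not give the method's scoring rules, so the following is only a hypothetical sketch of how one of the listed cues, body-title composition, could drive a shallow extractive summary: keep a post's title plus the body sentences that share the most terms with it. The `Post` type, the `summarize_post` function, the overlap score, and the parameter `k` are illustrative assumptions, not the authors' method; the sketch ignores blog themes and posting intervals entirely.

```python
import re
from dataclasses import dataclass

# Sentence boundaries: Latin punctuation plus the Persian question mark (U+061F).
SENTENCE_SPLIT = re.compile(r"(?<=[.!?\u061F])\s+")
# In Python 3, word characters in a str pattern include Persian letters.
TOKEN = re.compile(r"\w+")


@dataclass
class Post:
    """Hypothetical minimal post record: only title and body are used here."""
    title: str
    body: str


def tokens(text: str, stopwords: frozenset[str]) -> set[str]:
    """Lower-cased word tokens with stopwords removed."""
    return {t for t in TOKEN.findall(text.lower()) if t not in stopwords}


def summarize_post(post: Post, k: int = 2,
                   stopwords: frozenset[str] = frozenset()) -> str:
    """Return the title followed by the k body sentences sharing most terms with it.

    Assumed scoring rule: number of distinct title terms a sentence contains,
    ties broken by the sentence's original position.
    """
    title_terms = tokens(post.title, stopwords)
    sentences = [s for s in SENTENCE_SPLIT.split(post.body.strip()) if s]
    ranked = sorted(
        enumerate(sentences),
        key=lambda pair: (-len(tokens(pair[1], stopwords) & title_terms), pair[0]),
    )
    chosen = sorted(i for i, _ in ranked[:k])  # restore original sentence order
    return " ".join([post.title] + [sentences[i] for i in chosen])
```

For example, a post titled "Clustering Persian blogs" whose body is "I started a new job. Clustering blogs is hard. Persian blogs need special stopwords." is reduced with `k=1` to "Clustering Persian blogs Clustering blogs is hard.", because the off-topic first sentence shares no terms with the title. Shortened texts like this would then replace the full posts as input to a clustering step.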

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Asbagh, M.J., Sayyadi, M., Abolhassani, H. (2009). Blog Summarization for Blog Mining. In: Lee, R., Ishii, N. (eds) Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing. Studies in Computational Intelligence, vol 209. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01203-7_13

  • DOI: https://doi.org/10.1007/978-3-642-01203-7_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-01202-0

  • Online ISBN: 978-3-642-01203-7

  • eBook Packages: Engineering, Engineering (R0)
