Abstract
Large amounts of information are posted daily on the Web, such as articles published online by traditional news agencies or blog posts referring to and commenting on various events. Although users sometimes rely on a small set of trusted sources for their information, they often also want a wider overview of what is being reported and discussed in the news and the blogosphere. In this paper, we present an approach for supporting this discovery and exploration process by exploiting term clouds. In particular, we provide an efficient method for dynamically computing the most frequently appearing terms in the posts of monitored online sources, for time intervals specified at query time, without the need to archive the actual published content. An experimental evaluation on a large-scale real-world set of blogs demonstrates the accuracy of the proposed method and its efficiency in terms of computational time and memory requirements.
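The abstract does not spell out the algorithm, so the following is only a minimal sketch of one generic way to answer "most frequent terms in a time interval" queries without storing the posts themselves. It assumes fixed-width time buckets, each holding a bounded Misra-Gries frequency summary, and it merges the summaries of the buckets that overlap the queried interval. The class name `BucketedTermCounter`, its parameters, and the bucket width are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


class BucketedTermCounter:
    """Approximate per-bucket term counts (illustrative, not the paper's method)."""

    def __init__(self, bucket_seconds=3600, capacity=1000):
        self.bucket_seconds = bucket_seconds  # width of one time bucket (assumed)
        self.capacity = capacity              # max counters kept per bucket
        self.buckets = defaultdict(Counter)   # bucket id -> term summary

    def add_post(self, timestamp, terms):
        """Fold the terms of one post into its bucket; the text is not stored."""
        summary = self.buckets[int(timestamp // self.bucket_seconds)]
        for term in terms:
            if term in summary or len(summary) < self.capacity:
                summary[term] += 1
            else:
                # Misra-Gries step: decrement every counter, drop the zeros.
                for key in list(summary):
                    summary[key] -= 1
                    if summary[key] == 0:
                        del summary[key]

    def top_terms(self, start, end, n=10):
        """Merge summaries of buckets overlapping [start, end) and rank terms."""
        first = int(start // self.bucket_seconds)
        last = int((end - 1) // self.bucket_seconds)
        merged = Counter()
        for bucket_id in range(first, last + 1):
            if bucket_id in self.buckets:
                merged.update(self.buckets[bucket_id])
        return merged.most_common(n)
```

In this sketch, memory grows with the number of buckets times `capacity` rather than with the volume of published text, and query intervals are resolved only at bucket granularity. Counts are exact while a bucket holds fewer than `capacity` distinct terms and become underestimates once the Misra-Gries decrement step starts firing.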
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Papapetrou, O., Papadakis, G., Ioannou, E., Skoutas, D. (2010). Efficient Term Cloud Generation for Streaming Web Content. In: Benatallah, B., Casati, F., Kappel, G., Rossi, G. (eds) Web Engineering. ICWE 2010. Lecture Notes in Computer Science, vol 6189. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13911-6_26
DOI: https://doi.org/10.1007/978-3-642-13911-6_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13910-9
Online ISBN: 978-3-642-13911-6
eBook Packages: Computer Science, Computer Science (R0)