Efficient Term Cloud Generation for Streaming Web Content

Papapetrou, Odysseas; Papadakis, George; Ioannou, Ekaterini; Skoutas, Dimitrios

doi:10.1007/978-3-642-13911-6_26

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6189)

Included in the following conference series:

International Conference on Web Engineering

Abstract

Large amounts of information are posted daily on the Web, such as articles published online by traditional news agencies or blog posts referring to and commenting on various events. Although users sometimes rely on a small set of trusted sources for their information, they often also want a wider overview of what is being reported and discussed in the news and the blogosphere. In this paper, we present an approach for supporting this discovery and exploration process by exploiting term clouds. In particular, we provide an efficient method for dynamically computing the most frequently appearing terms in the posts of monitored online sources, for time intervals specified at query time, without the need to archive the actual published content. An experimental evaluation on a large-scale real-world set of blogs demonstrates the accuracy of the proposed method and its efficiency in terms of computational time and memory requirements.
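
The general idea of answering top-term queries for arbitrary time intervals without storing the posts can be illustrated with a small sketch. This is not the method of the paper, whose details are not given in the abstract. It is one possible design under stated assumptions: time is split into fixed-size buckets, each bucket keeps a bounded Misra-Gries frequent-items summary, and a query merges the summaries of the buckets it touches. The names `BucketedTermCounter`, `misra_gries_update`, `bucket_seconds`, and `capacity` are hypothetical, and tokenization is a plain whitespace split.

```python
from collections import Counter


def misra_gries_update(summary, term, capacity):
    """Record one occurrence of `term` in a bounded Misra-Gries summary."""
    if term in summary or len(summary) < capacity:
        summary[term] += 1
    else:
        # Summary is full and the term is untracked: decrement every
        # counter and drop those that reach zero. The new term is not stored.
        for key in list(summary):
            summary[key] -= 1
            if summary[key] == 0:
                del summary[key]


class BucketedTermCounter:
    """Hypothetical sketch: one bounded term summary per time bucket.

    Only the summaries are kept; the post text is discarded after counting.
    """

    def __init__(self, bucket_seconds=3600, capacity=1000):
        self.bucket_seconds = bucket_seconds
        self.capacity = capacity
        self.buckets = {}  # bucket index -> Counter of approximate counts

    def add_post(self, timestamp, text):
        index = int(timestamp // self.bucket_seconds)
        summary = self.buckets.setdefault(index, Counter())
        for term in text.lower().split():
            misra_gries_update(summary, term, self.capacity)

    def top_terms(self, start, end, k=10):
        """Return the k most frequent terms for [start, end].

        The interval is widened to whole buckets, so results are
        approximate at bucket granularity.
        """
        first = int(start // self.bucket_seconds)
        last = int(end // self.bucket_seconds)
        merged = Counter()
        for index, summary in self.buckets.items():
            if first <= index <= last:
                merged.update(summary)
        return merged.most_common(k)


counter = BucketedTermCounter(bucket_seconds=60, capacity=100)
counter.add_post(10, "web cloud web web")
counter.add_post(70, "cloud stream")
counter.add_post(200, "web web web")
print(counter.top_terms(0, 119, k=2))
```

Counts in a full summary are underestimates, so the per-bucket `capacity` trades memory for accuracy, and the bucket width limits how precisely a query interval can be matched.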

Author information

Authors and Affiliations

L3S Research Center, Hannover, Germany
Odysseas Papapetrou, George Papadakis, Ekaterini Ioannou & Dimitrios Skoutas

Editor information

Editors and Affiliations

CSE, University of New South Wales, Australia
Boualem Benatallah
Department of Information Engineering and Computer Science, University of Trento, Via Sommarive 14, 38050, Povo (Trento), Italy
Fabio Casati
Business Informatics Group, Vienna University of Technology, P.O. Box,
Gerti Kappel
Facultad de Informática, Universidad Nacional de La Plata and Conicet, Argentina
Gustavo Rossi

Cite this paper

Papapetrou, O., Papadakis, G., Ioannou, E., Skoutas, D. (2010). Efficient Term Cloud Generation for Streaming Web Content. In: Benatallah, B., Casati, F., Kappel, G., Rossi, G. (eds) Web Engineering. ICWE 2010. Lecture Notes in Computer Science, vol 6189. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13911-6_26

DOI: https://doi.org/10.1007/978-3-642-13911-6_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13910-9
Online ISBN: 978-3-642-13911-6
eBook Packages: Computer Science, Computer Science (R0)
