Abstract
We are witnessing the widespread adoption of web syndication technologies such as RSS and Atom for the timely delivery of frequently updated Web content. Almost every personal weblog, news portal, or discussion forum nowadays employs RSS/Atom feeds to complement pull-oriented searching and browsing of web pages with push-oriented delivery of web content. Social media applications such as Twitter and Facebook also employ RSS to notify users about newly available posts from their preferred friends. Unfortunately, previous work on the statistical characteristics of RSS/Atom feeds does not provide a precise and up-to-date characterization of feeds’ behavior and content, a characterization that can be used to benchmark the effectiveness and efficiency of various RSS processing and analysis techniques. In this paper, we present the first thorough analysis of three complementary features of real-scale RSS feeds, namely publication activity, item structure and length, and the vocabulary of their content, which we believe are crucial for Web 2.0 applications.
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hmedeh, Z., Vouzoukidou, N., Travers, N., Christophides, V., du Mouza, C., Scholl, M. (2011). Characterizing Web Syndication Behavior and Content. In: Bouguettaya, A., Hauswirth, M., Liu, L. (eds) Web Information System Engineering – WISE 2011. WISE 2011. Lecture Notes in Computer Science, vol 6997. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24434-6_3
DOI: https://doi.org/10.1007/978-3-642-24434-6_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24433-9
Online ISBN: 978-3-642-24434-6
eBook Packages: Computer Science (R0)