Abstract
Although the extraction of facts and aggregated information from individual Online Social Networks (OSNs) has been studied extensively in the last few years, the examination of content across social media has received limited attention. Such examination, involving multiple OSNs, gains significance as a way either to verify as yet unconfirmed evidence or to expand our understanding of unfolding events. Driven by the emerging requirement that future applications engage multiple sources, we present the architecture of a distributed crawler that harnesses information from multiple OSNs. We demonstrate that contemporary OSNs feature similar, if not identical, baseline structures. To this end, we propose an extensible model termed SocWeb that articulates the essential structural elements of OSNs in wide use today. To accurately capture the features required for cross-social-media analyses, SocWeb exploits intra-connections and forms an “amalgamated” OSN. We introduce a flexible API that enables applications to communicate effectively with designated OSN providers, and we discuss key design choices for our distributed crawler. Our approach helps meet diverse qualitative and quantitative performance criteria, including freshness of facts, scalability, quality of fetched data, and robustness. We report on a cross-social-media analysis compiled with our extensible SocWeb-based crawler, applied to Facebook and YouTube.
This work was supported by PIRG06-GA-2009-256603.
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Psallidas, F., Ntoulas, A., Delis, A. (2013). Soc Web: Efficient Monitoring of Social Network Activities. In: Lin, X., Manolopoulos, Y., Srivastava, D., Huang, G. (eds) Web Information Systems Engineering – WISE 2013. WISE 2013. Lecture Notes in Computer Science, vol 8181. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41154-0_9
DOI: https://doi.org/10.1007/978-3-642-41154-0_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41153-3
Online ISBN: 978-3-642-41154-0
eBook Packages: Computer Science, Computer Science (R0)