Abstract
In everyday life, we often observe unusually frequent interactions among people before or during important events, e.g., people send/receive more greetings to/from their friends on holidays than regular days. We also observe that some videos or hashtags suddenly go viral through people’s sharing on online social networks (OSNs). Do these seemingly different phenomena share a common structure? All these phenomena are associated with the sudden surges of node interactions in networks, which we call “bursts” in this work. We uncover that, in many scenarios, the emergence of a burst is accompanied with the formation of triangles in networks. This finding motivates us to propose a new and robust method for burst detection on an OSN. We first introduce a new measure, i.e., “triadic cardinality distribution,” corresponding to the fractions of nodes with different numbers of triangles, i.e., triadic cardinalities, in a network. We show that this distribution not only changes when a burst occurs, but it also has a robustness property that it is immunized against common spamming social-bot attacks. Hence, by tracking triadic cardinality distributions, we can more reliably detect bursts than simply counting node interactions on an OSN. To avoid handling massive activity data generated by OSN users during the triadic tracking, we design an efficient “sample-estimate” framework to provide maximum likelihood estimate of the triadic cardinality distribution. We propose several sampling methods and provide insights into their performance difference through both theoretical analysis and empirical experiments on real-world networks.
This is a preview of subscription content, access via your institution.















Notes
- 1.
We assume two Enron users are friends if they have at least one email communication in the dataset.
References
- 1.
Ahmed NK, Duffield N, Neville J, Kompella R (2014) Graph sample and hold: a framework for big-graph analytics. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining
- 2.
Arifuzzaman S, Khan M, Vishnu M (2019) Fast parallel algorithms for counting and listing triangles in big graphs. ACM Trans Knowl Discov Data 14(1):1–34
- 3.
Backstrom L, Bakshy E, Kleinberg J, Lento TM, Rosenn I (2011) Center of attention: How Facebook users allocate attention across friends. In: Proceedings of the 5th international AAAI conference on weblogs and social media
- 4.
Barabasi AL (2005) The origin of bursts and heavy tails in human dynamics. Nature 435:207–211
- 5.
Becchetti L, Boldi P, Castillo C, Gionis A (2008) Efficient semi-streaming algorithms for local triangle counting in massive graphs. In: Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining
- 6.
Beta-binomial distribution. https://en.wikipedia.org/wiki/Beta-binomial_distribution (Retrieved June 2020)
- 7.
Beutel A, Xu W, Guruswami V, Palow C, Faloutsos C (2013) CopyCatch: stopping group attacks by spotting lockstep behavior in social networks. In: Proceedings of the 22nd international world wide web conference
- 8.
Boshmaf Y, Muslukhov I, Beznosov K, Ripeanu M (2011) The socialbot network: When bots socialize for fame and money. In: Proceedings of the 27th annual computer security applications conference
- 9.
Budak C, Agrawal D, Abbadi AE (2011) Structural trend analysis for online social networks. In: Proceedings of the VLDB endowment
- 10.
Cadena J, Vullikanti A (2018) Mining heavy temporal subgraphs: Fast algorithms and applications. In: Proceedings of the 32nd AAAI conference on artificial intelligence
- 11.
Chierichetti F, Kleinberg J, Kumar R, Mahdian M, Pandey S (2014) Event detection via communication pattern analysis. In: Proceedings of the 8th international AAAI conference on weblogs and social media
- 12.
Chu Z, Gianvecchio S, Wang H, Jajodia S (2010) Who is tweeting on Twitter: Human, bot, or cyborg? In: Proceedings of the 26th annual computer security applications conference
- 13.
Costa AF, Yamaguchi Y, Traina AJM Jr, Faloutsos CT (2017) Modeling temporal activity to detect anomalous behavior in social media. ACM Trans Knowl Discov Data 11(4):1–23
- 14.
Cramér-Rao bound. https://en.wikipedia.org/wiki/Cram%C3%A9r%E2%80%93Rao_bound (Retrived June 2020)
- 15.
Duffield N, Lund C, Thorup M (2003) Estimating flow distributions from sampled flow statistics. In: Proceedings of the ACM special interest group on data communication
- 16.
Durand M, Flajolet P (2003) Loglog counting of large cardinalities. In: Proceedings of the 11th annual European symposium on algorithms
- 17.
Eftekhar M, Koudas N, Ganjali Y (2013) Bursty subgraphs in social networks. In: Proceedings of the 6th international ACM conference on web search and data mining
- 18.
Eswaran D, Faloutsos C, Guha S, Mishra N (2018) SpotLight: Detecting anomalies in streaming graphs. In: Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery and data mining
- 19.
Gjoka M, Kurant M, Butts CT, Markopoulou A (2011) Practical recommendations on crawling online social networks. IEEE J Sel Areas Commun 29(9):1872–1892
- 20.
Gorman JD, Hero AO (1990) Lower bounds for parametric estimation with constraints. IEEE Transit Inf Theory 26(6):1285–1301
- 21.
Grier C, Thomas K, Paxson V, Zhang M (2010) @spam: The underground on 140 characters or less. In: Proceedings of the ACM SIGSAC conference on computer and communications security
- 22.
Harvey M (2020) Fans mourn artist for whom it didn’t matter if you were black or white. http://wayback.archive.org/web/20100531165925/http://www.timesonline.co.uk/tol/news/world/us_and_americas/article6580897.ece (Retrieved June 2020)
- 23.
Jha M, Seshadhri C, Pinar A (2013) A space efficient streaming algorithm for triangle counting using the birthday paradox. In: Proceedings of the 19th ACM SIGKDD international conference on knowledge discovery and data mining
- 24.
Kleinberg J (2002) Bursty and hierarchical structure in streams. In: Proceedings of the 8th ACM SIGKDD international conference on knowledge discovery and data mining
- 25.
Klimt B, Yang Y (2004) The Enron corpus: A new dataset for email classification research. In: Proceeding of the European conference on machine learning and principles and practice of knowledge discovery in databases
- 26.
Kossinets G, Watts DJ (2006) Empirical analysis of an evolving social network. Science 311(5757):88–90
- 27.
Krikorian R (2020) New tweets per second record, and how! https://blog.twitter.com/engineering/en_us/a/2013/new-tweets-per-second-record-and-how.html (Retrived June 2020)
- 28.
Leskovec J, McGlohon M, Faloutsos C, Glance N, Hurst M (2007) Cascading behavior in large blog graphs. In: Proceedings of the 7th SIAM international conference on data mining
- 29.
Lim Y, Kang U (2015) MASCOT: Memory-efficient and accurate sampling for counting local triangles in graph streams. In: Proceedings of the 21st ACM SIGKDD international conference on knowledge discovery and data mining. Sydney, Australia
- 30.
Liu X, Ge T, Wu Y (2019) Finding densest lasting subgraphs in dynamic graphs: a stochastic approach. In: Proceedings of the 35th IEEE international conference on data engineering
- 31.
London riots: more than 2,000 people arrested over disorder. http://www.mirror.co.uk/news/uk-news/london-riots-more-than-2000-people-185548 (Retrived June 2020)
- 32.
Manzoor E, Milajerdi, SM, Akoglu L (2016) Fast memory-efficient anomaly detection in streaming heterogeneous graphs. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. San Francisco, California, USA
- 33.
Mathioudakis M, Bansal N, Koudas N (2010) Identifying, attributing and describing spatial bursts. In: Proceedings of the VLDB endowment
- 34.
Milo R, Shen-Orr S, Itzkovitz S, Kashtan N, Chklovskii D, Alon U (2002) Network motifs: simple building blocks of complex networks. Science 298(5594):824–827
- 35.
Pagh R, Tsourakakis CE (2012) Colorful triangle counting and a MapReduce implementation. J Inf Process Lett 112(7):277–281
- 36.
Paranjape A, Benson AR, Leskovec J (2017)Motifs in temporal networks. In: Proceedings of the 10th ACM international conference on web search and data mining
- 37.
Parikh N, Sundaresan N (2008)Scalable and near real-time burst detection from ecommerce queries. In: Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining
- 38.
Pavan A, Tangwongsan K, Tirthapura S, Wu KL (2013)Counting and sampling triangles from a graph stream. In: Proceedings of the VLDB endowment
- 39.
Ribeiro B, Towsley D, Ye T, Bolot JC (2006) Fisher information of sampled packets: An application to flow size estimation. In: Proceedings of the 6th ACM SIGCOMM conference on internet measurement
- 40.
Rodrigues T, Benevenuto F, Cha M, Gummadi KP, Almeida V (2011) On word-of-mouth based discovery of the web. In: Proceedings of the 11th ACM SIGCOMM conference on internet measurement
- 41.
Sakaki T, Okazaki M, Matsuo Y (2010) Earthquake shakes Twitter users: Real-time event detection by social sensors. In: Proceedings of the 19th international world wide web conference
- 42.
Sanei-Mehri SV, Sarüyüce AE, Tirthapura S (2018) Butterfly counting in bipartite networks. In: Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery and data mining
- 43.
Sarüyüce AE, Pinar A (2018) Peeling bipartite networks for dense subgraph discovery. In: Proceedings of the 11th international ACM conference on web search and data mining, pp. 504–512
- 44.
Seshadhri C, Pinar A, Kolda TG (2013) Triadic measures on graphs: The power of wedge sampling. In: Proceedings of the 13th SIAM international conference on data mining
- 45.
Shiels M (2020) Web slows after Jackson’s death. http://news.bbc.co.uk/2/hi/technology/8120324.stm (Retrieved June 2020)
- 46.
Shin K, Oh S, Kim J, Hooi B, Faloutsos C (2019) Fast, accurate and provable triangle counting in fully dynamic graph streams. ACM Trans Knowl Discov Data 14(2):1–39
- 47.
Stefani LD, Epasto A, Riondato M, Upfal E (2016) TRIEST: Counting local and global triangles in fully-dynamic streams with fixed memory size. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining
- 48.
Stringhini G, Kruegel C, Vigna G (2010) Detecting spammers on social networks. In: Proceedings of the 26th annual computer security applications conference
- 49.
Takahashi T, Tomioka R, Yamanishi K (2011) Discovering emerging topics in social streams via link anomaly detection. In: Proceedings of the IEEE international conference on data mining
- 50.
Teng X, Yan M, Ertugrul AM, Lin YR (2018) Deep into hypersphere: Robust and unsupervised anomaly discovery in dynamic networks. In: Proceedings of the 27th international joint conference on artificial intelligence
- 51.
Thomas K, Grier C, Paxson V, Song D (2011) Suspended accounts in retrospect: an analysis of Twitter spam. In: Proceedings of the 11th ACM SIGCOMM conference on internet measurement
- 52.
Trees HLV (2001) Detection, estimation, and modulation theory. Part I. Wiley, Hoboken
- 53.
Tsotsis A (2020) First credible reports of Bin Laden’s death spread like wildfire on Twitter. https://techcrunch.com/2011/05/01/news-of-osama-bin-ladens-death-spreads-like-wildfire-on-twitter (Retrived June 2020)
- 54.
Tsourakakis CE, Kang U, Miller GL, Faloutsos C (2009) DOULION: Counting triangles in massive graphs with a coin. In: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining
- 55.
Tune P, Veitch D (2011) Fisher information in flow size distribution estimation. IEEE Trans Inf Theory 57(10):7011–7035
- 56.
Turkett W, Fulp E, Lever C, Edward Allan J (2011) Graph mining of motif profiles for computer network activity inference. In: Proceedings of the 7th workshop on mining and learning with graphs
- 57.
Veitch D, Tune P (2015) Optimal sampling for the flow size distribution. IEEE Trans Inf Theory 61(6):3075–3099
- 58.
Wang P, Guan X, Zhao J, Tao J, Qin T (2014) A new sketch method for measuring host connection degree distribution. IEEE Trans Inf Forensics Secur 9(6):948–960
- 59.
Wang P, Lui JC, Ribeiro B, Towsley D, Zhao J, Guan X (2014) Efficiently estimating motif statistics of large networks. ACM Trans Knowl Discov Data 9(2):1–27
- 60.
Watts DJ, Strogatz SH (1998) Collective dynamics of ‘small-world’ networks. Nature 393:440–442
- 61.
Wu B, Yi K, Li Z (2016) Counting triangles in large graphs by random sampling. IEEE Trans Knowl Data Eng 28(8):2013–2026
- 62.
Yi J (2005) Detecting buzz from time-sequenced document streams. In: Proceedings of the IEEE international conference on e-technology, e-commerce and e-service
- 63.
Yoon M, Hooi B, Shin K, Faloutsos C (2019) Fast and accurate anomaly detection in dynamic graphs with a two-pronged approach. In: Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery and data mining
- 64.
Yu C, Zelterman D (2002) Sums of dependent Bernoulli random variables and disease clustering. Stat Probab Lett 57(1):363–373
- 65.
Yu M, Song C, Gu J, Liu M (2019) Distributed triangle counting algorithms in simple graph stream. In: Proceedings of the 25th IEEE international conference on parallel and distributed system
- 66.
Zhao J, Lui JC, Towsley D, Wang P, Guan X (2015) Tracking triadic cardinality distributions for burst detection in social activity streams. In: ACM conference on online social networks
- 67.
Zhu Y, Shasha D (2003) Efficient elastic burst detection in data streams. In: Proceedings of the 9th ACM SIGKDD international conference on knowledge discovery and data mining
Acknowledgements
This work was supported in part by the National Key R&D Program of China (2016YFE0206700), Shenzhen Basic Research Grant (JCYJ20170816100819428), Sichuan Science and Technology Program (2019YFG0407), and the National Natural Science Foundation of China (61922067, U1736205, 61902305).
Author information
Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Zhao, J., Wang, P., Chen, Z. et al. Tracking triadic cardinality distributions for burst detection in high-speed graph streams. Knowl Inf Syst (2021). https://doi.org/10.1007/s10115-021-01543-x
Received:
Revised:
Accepted:
Published:
Keywords
- Burst detection
- Graph sampling
- Streaming algorithms
- Online social network
- Estimation theory