Graph sampling with applications to estimating the number of pattern embeddings and the parameters of a statistical relational model

Ravkic, Irma; Žnidaršič, Martin; Ramon, Jan; Davis, Jesse

doi:10.1007/s10618-018-0553-2

Published: 06 March 2018

Data Mining and Knowledge Discovery, volume 32, pages 913–948 (2018)

Irma Ravkic¹, Martin Žnidaršič², Jan Ramon¹ & Jesse Davis¹

Abstract

Counting the number of times a pattern occurs in a database is a fundamental data mining problem. It is a subroutine in a diverse set of tasks ranging from pattern mining to supervised learning and probabilistic model learning. While a pattern and a database can take many forms, this paper focuses on the case where both the pattern and the database are graphs (networks). Unfortunately, in general, the problem of counting graph occurrences is #P-complete. In contrast to earlier work, which focused on exact counting for simple (i.e., very short) patterns, we present a sampling approach for estimating the statistics of larger graph pattern occurrences. We perform an empirical evaluation on synthetic and real-world data that validates the proposed algorithm, illustrates its practical behavior and provides insight into the trade-off between its accuracy of estimation and computational efficiency.
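To make the trade-off between estimation accuracy and computational cost concrete, the following is a minimal sketch of sampling-based pattern counting for the simplest non-trivial pattern, a triangle. It is not the algorithm proposed in the paper, only a naive Monte Carlo estimator: it draws node triples uniformly at random and scales the fraction that form a triangle by the total number of triples. The function names, the adjacency-set representation, and the choice of the triangle as the pattern are all assumptions made for illustration.

```python
import itertools
import random
from math import comb


def random_graph(n, p, rng):
    """Return an adjacency-set dict for a graph on n nodes in which each
    edge is included independently with probability p (illustrative helper)."""
    adj = {v: set() for v in range(n)}
    for u, v in itertools.combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def is_triangle(adj, u, v, w):
    """True if u, v, w are pairwise adjacent."""
    return v in adj[u] and w in adj[u] and w in adj[v]


def exact_triangle_count(adj):
    """Count triangles exactly by checking every node triple (cubic time)."""
    return sum(
        1
        for u, v, w in itertools.combinations(sorted(adj), 3)
        if is_triangle(adj, u, v, w)
    )


def estimate_triangle_count(adj, num_samples, rng):
    """Estimate the triangle count from uniformly sampled node triples."""
    nodes = sorted(adj)
    hits = 0
    for _ in range(num_samples):
        u, v, w = rng.sample(nodes, 3)  # three distinct nodes
        if is_triangle(adj, u, v, w):
            hits += 1
    # Scale the observed hit fraction by the number of possible triples.
    return comb(len(nodes), 3) * hits / num_samples


if __name__ == "__main__":
    rng = random.Random(0)
    graph = random_graph(60, 0.2, rng)
    print("exact:   ", exact_triangle_count(graph))
    print("estimate:", estimate_triangle_count(graph, 5000, rng))
```

More samples reduce the variance of the estimate at the cost of more run time. For larger patterns, uniformly sampled node tuples rarely match the pattern, so a naive estimator like this one needs far more samples to stay accurate, which is one reason more careful sampling schemes are needed for such patterns.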

Notes

Graphs where each edge is included, independent of all other edges, with probability p.
For the code see: https://dtai.cs.kuleuven.be/software/gs-srl.
https://snap.stanford.edu/data/.
http://www.informatik.uni-trier.de/~ley/db/index.html.
http://alchemy.cs.washington.edu.
Note that this is a slight simplification as LBN uses first-order logic to perform parameter tying across multiple random variables.
We omit FACT on these plots to declutter them and because its results would simply be a point.
The experiments were run on a machine with 10 GB of RAM.
https://dtai.cs.kuleuven.be/software/gs-srl.

Acknowledgements

IR was partially supported by the KU Leuven Research Fund (OT/11/051) and is currently affiliated with the University of California, Los Angeles. MZ was partially supported by the KU Leuven Research Fund (OT/11/051) and the Slovenian Research Agency (P2-0103). JD is partially supported by the KU Leuven Research Fund (OT/11/051, C14/17/070, C22/15/015, C32/17/036) and FWO-Vlaanderen (G.0356.12, SBO-150033).

Author information

The first two authors contributed equally to this work and are ordered alphabetically.

Authors and Affiliations

Department of Computer Science, KU Leuven, 3001, Heverlee, Leuven, Belgium
Irma Ravkic, Jan Ramon & Jesse Davis
Jožef Stefan Institute, Jamova Cesta 39, 1000, Ljubljana, Slovenia
Martin Žnidaršič

Corresponding author

Correspondence to Irma Ravkic.

Additional information

Responsible editor: Andrea Passerini, Thomas Gaertner, Celine Robardet and Mirco Nanni.

About this article

Cite this article

Ravkic, I., Žnidaršič, M., Ramon, J. et al. Graph sampling with applications to estimating the number of pattern embeddings and the parameters of a statistical relational model. Data Min Knowl Disc 32, 913–948 (2018). https://doi.org/10.1007/s10618-018-0553-2

Received: 18 April 2016
Accepted: 07 February 2018
Published: 06 March 2018
Issue Date: July 2018
DOI: https://doi.org/10.1007/s10618-018-0553-2
