Mining patterns in graphs with multiple weights

Preti, Giulia; Lissandrini, Matteo; Mottin, Davide; Velegrakis, Yannis

doi:10.1007/s10619-019-07259-w

Published: 18 February 2019

Distributed and Parallel Databases, volume 39, pages 281–319 (2021)

Giulia Preti¹ (ORCID: orcid.org/0000-0002-2126-326X), Matteo Lissandrini², Davide Mottin³ & Yannis Velegrakis¹


Abstract

Graph pattern mining aims at identifying structures that appear frequently in large graphs, under the assumption that frequency signifies importance. In real life, there are many graphs with weights on nodes and/or edges. For these graphs, it is fair that the importance (score) of a pattern is determined not only by the number of its appearances, but also by the weights on the nodes/edges of those appearances. Scoring functions based on the weights do not generally satisfy the apriori property, which guarantees that the number of appearances of a pattern cannot be larger than the frequency of any of its sub-patterns, and hence allows faster pruning. Therefore, existing approaches employ other, less efficient, pruning strategies. The problem becomes even more challenging in the case of multiple weighting functions that assign different weights to the same nodes/edges. In this work we propose a new family of scoring functions that respects the apriori property, and thus can rely on effective pruning strategies. We provide efficient and effective techniques for mining patterns in multi-weighted graphs, and we devise both an exact and an approximate solution. In addition, we propose a distributed version of our approach, which distributes the appearances of the patterns to examine among multiple workers. Extensive experiments on both real and synthetic datasets prove that the presence of edge weights and the choice of scoring function affect the patterns mined, and the quality of the results returned to the user. Moreover, we show that, even when the performance of the exact algorithm degrades because of an increasing number of weighting functions, the approximate algorithm performs well and with fairly good quality. Finally, the distributed algorithm proves to be the best choice for mining large and rich input graphs.
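The contrast the abstract draws between frequency and weight-based scoring can be made concrete with a small sketch. It is not the paper's method: it assumes a simplified transaction-style setting (a collection of small graphs rather than a single large graph), and the toy data, the helper names `appearances`, `support`, and `avg_weight_score`, and the averaging score itself are hypothetical choices made only for illustration.

```python
from statistics import mean

# Toy "graph database" (hypothetical data): each graph maps an edge,
# written as a pair of node labels, to its weight.
GRAPHS = [
    {("A", "B"): 0.1, ("B", "C"): 0.2},
    {("A", "B"): 0.9, ("B", "C"): 0.9},
    {("A", "B"): 0.1},
]


def appearances(pattern):
    """Return the graphs that contain every edge of the pattern."""
    return [g for g in GRAPHS if all(edge in g for edge in pattern)]


def support(pattern):
    """Frequency-based score: number of graphs containing the pattern."""
    return len(appearances(pattern))


def avg_weight_score(pattern):
    """Weight-based score (an assumption for this sketch): the mean, over all
    appearances, of the mean weight of the pattern's edges."""
    apps = appearances(pattern)
    if not apps:
        return 0.0
    return mean(mean(g[edge] for edge in pattern) for g in apps)


# A one-edge pattern and a two-edge pattern that extends it.
sub_pattern = frozenset({("A", "B")})
super_pattern = frozenset({("A", "B"), ("B", "C")})

print("support:", support(sub_pattern), support(super_pattern))
print("avg weight:", avg_weight_score(sub_pattern), avg_weight_score(super_pattern))
```

In this toy data the two-edge pattern appears in fewer graphs than its one-edge sub-pattern, so support behaves as the apriori property requires. Its average-weight score is nevertheless higher than that of the sub-pattern, which means a threshold on such a score cannot be used to prune the extensions of a low-scoring sub-pattern.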


Notes

https://developers.google.com/Freebase/data.
https://jmcauley.ucsd.edu/data/amazon/.
https://github.com/ehab-abdelhamid/GraMi.
https://github.com/lady-bluecopper/ReSuM.
www.crowdflower.com.
For this experiment we kept a single edge-weighting function and parameter \(\alpha = 0.05\), with \(\tau = 6000\) for Freebase-O and \(\tau = 100\) for CiteSeer.

References

Abdelhamid, E., Abdelaziz, I., Kalnis, P., Khayyat, Z., Jamour, F.: Scalemine: Scalable parallel frequent subgraph mining in a single large graph. In: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 716–727 (2016)
Aggarwal, C.C.: Recommender Systems. Springer, Berlin (2016)
Aluç, G., Hartig, O., Özsu, M. T., Daudjee, K.: Diversified stress testing of RDF data management systems. In: International Semantic Web Conference, pp. 197–212. Springer, Heidelberg (2014)
Babu, N., John, A.: A distributed approach to weighted frequent subgraph mining. In: International Conference on Emerging Technological Trends, pp. 1–7 (2016)
Bagan, G., Bonifati, A., Ciucanu, R., Fletcher, G.H.L., Lemay, A., Advokaat, N.: gMark: schema-driven generation of graphs and queries. IEEE Trans. Knowl. Data Eng. 29(4), 856–869 (2017)
Bandari, D., Xiang, S., Leskovec, J.: Categorizing user sessions at pinterest. arXiv:1703.09662 (2017)
Bogdanov, P., Mongiovì, M., Singh, A.K.: Mining heavy subgraphs in time-evolving networks. In: 2011 IEEE 11th International Conference on Data Mining (ICDM), pp. 81–90. IEEE (2011)
Bringmann, B., Nijssen, S.: What is frequent in a single graph? In: PAKDD, pp. 858–863 (2008)
Chen, Y., Zhao, X., Lin, X., Wang, Y.: Towards frequent subgraph mining on single large uncertain graphs. In: 2015 IEEE International Conference on Data Mining, pp. 41–50 (2015)
Costello, J.C., Dalkilic, M.M., Beason, S.M., Gehlhausen, J.R., Patwardhan, R., Middha, S., Eads, B.D., Andrews, J.R.: Gene networks in drosophila melanogaster: integrating experimental data to predict gene function. Genome Biol. 10(9), R97 (2009)
De Raedt, L., Zimmermann, A.: Constraint-based pattern set mining. In: SDM, pp. 237–248 (2007)
Elseidy, M., Abdelhamid, E., Skiadopoulos, S., Kalnis, P.: Grami: frequent subgraph and pattern mining in a single large graph. PVLDB 7(7), 517–528 (2014)
Fiedler, M., Borgelt, C.: Subgraph support in a single large graph. In: ICDM workshops, pp. 399–404 (2007)
Geng, R., Dong, X., Zhang, P., Xu, W.: Wtmaxminer: efficient mining of maximal frequent patterns based on weighted directed graph traversals. In: CCIS, pp. 1081–1086 (2008)
Greenwald, M., Khanna, S.: Space-efficient online computation of quantile summaries. ACM SIGMOD Rec. 30, 58–66 (2001)
He, R., McAuley, J.: Ups and downs: modeling the visual evolution of fashion trends with one-class collaborative filtering. University of California, San Diego (2016)
Holder, L.B., Cook, D.J., Djoko, S. et al.: Substructure discovery in the subdue system. In: KDD Workshop, pp. 169–180 (1994)
Huan, J., Wang, W., Prins, J., Yang, J.: Spin: mining maximal frequent subgraphs from graph databases. In: SIGKDD, pp. 581–586 (2004)
Huan, J., Bandyopadhyay, D., Wang, W., Snoeyink, J., Prins, J., Tropsha, A.: Comparing graph representations of protein structure for mining family-specific residue-based packing motifs. J. Comput. Biol. 12(6), 657–671 (2005)
Jamil, S., Khan, A., Halim, Z., Baig, A.R.: Weighted muse for frequent sub-graph pattern finding in uncertain dblp data. In: 2011 International Conference on Internet Technology and Applications, pp. 1–6 (2011)
Jiang, W., Vaidya, J., Balaporia, Z., Clifton, C., Banich, B.: Knowledge discovery from transportation network data. In: ICDE, pp. 1061–1072 (2005)
Jiang, H., Wang, H., Yu, P.S., Zhou, S.: Gstring: a novel approach for efficient search in graph databases. In: ICDE, pp. 566–575 (2007)
Jiang, C., Coenen, F., Zito, M.: Frequent sub-graph mining on edge weighted graphs. In: DAWAK, pp. 77–88 (2010)
Jin, X., Wang, C., Luo, J., Yu, X., Han, J.: Likeminer: a system for mining the power of ’like’ in social media networks. In: KDD, pp. 753–756 (2011)
Kanehisa, M., Goto, S.: Kegg: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28(1), 27–30 (2000)
Kuramochi, M., Karypis, G.: Frequent subgraph discovery. In: ICDM, pp. 313–320 (2001)
Kuramochi, M., Karypis, G.: Finding frequent patterns in a large sparse graph. DMKD 11(3), 243–271 (2005)
Li, J., Zou, Z., Gao, H.: Mining frequent subgraphs over uncertain graph databases under probabilistic semantics. VLDBJ 21(6), 753–777 (2012)
Lloyd, S.P.: Least squares quantization in PCM. IEEE Trans. Inf. Theory 28(2), 129–137 (1982)
Mackworth, A.K.: Consistency in networks of relations. Artif. Intell. 8(1), 99–118 (1977)
Malewicz, G., Austern, M.H., Bik, A.J., Dehnert, J.C., Horn, I., Leiser, N., Czajkowski, G.: Pregel: a system for large-scale graph processing. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, pp. 135–146. ACM (2010)
McSherry, F., Isard, M., Murray, D.G.: Scalability! but at what cost? In: HotOS, vol. 15, pp. 14–14. Citeseer (2015)
Mottin, D., Lissandrini, M., Velegrakis, Y., Palpanas, T.: Exemplar queries: a new way of searching. VLDB J. 25, 741–765 (2016)
Newman, M.E.: Analysis of weighted networks. Phys. Rev. E 70(5), 056131 (2004)
Noble, C.C., Cook, D.J.: Graph-based anomaly detection. In: SIGKDD, pp. 631–636 (2003)
Papapetrou, O., Ioannou, E., Skoutas, D.: Efficient discovery of frequent subgraph patterns in uncertain graph databases. In: Proceedings of the 14th International Conference on Extending Database Technology, pp. 355–366 (2011)
Pei, J., Han, J., Mortazavi-Asl, B., Zhu, H.: Mining access patterns efficiently from web logs. In: PAKDD, pp. 396–407 (2000)
Preti, G., Lissandrini, M., Mottin, D., Velegrakis, Y.: Beyond frequencies: graph pattern mining in multi-weighted graphs. In: Proceedings of the 21th International Conference on Extending Database Technology, EDBT (2018)
Shaw, M.J., Subramaniam, C., Tan, G.W., Welge, M.E.: Knowledge management and data mining for marketing. Decis. Support Syst. 31, 127–137 (2001)
Silva, A., Meira Jr., W., Zaki, M.J.: Mining attribute-structure correlated patterns in large attributed graphs. PVLDB 5(5), 466–477 (2012)
Song, Q., Wu, Y., Dong, X.L.: Mining summaries for knowledge graph search. In: ICDM, pp. 1215–1220 (2016)
Steinbach, M., Ertöz, L., Kumar, V.: The challenges of clustering high dimensional data. In: New Directions in Statistical Physics, pp. 273–309 (2004)
Teixeira, C.H., Fonseca, A.J., Serafini, M., Siganos, G., Zaki, M.J., Aboulnaga, A.: Arabesque: a system for distributed graph mining. In: Proceedings of the 25th Symposium on Operating Systems Principles, pp. 425–440. ACM (2015)
Valiant, L.G.: A bridging model for parallel computation. Commun. ACM 33(8), 103–111 (1990)
Vanetik, N., Shimony, S.E., Gudes, E.: Support measures for graph data. Data Min. Knowl. Discov. 13(2), 243–260 (2006)
Wang, H., Aggarwal, C.C.: A survey of algorithms for keyword search on graph data. In: Managing and Mining Graph Data, pp. 249–273 (2010)
Wu, D., Ren, J., Sheng, L.: Uncertain maximal frequent subgraph mining algorithm based on adjacency matrix and weight. Int. J. Mach. Learn. Cybern. (2017). https://doi.org/10.1007/s13042-017-0655-y
Yan, X., Han, J.: gspan: Graph-based substructure pattern mining. In: ICDM, pp. 721–724 (2002)
Yan, X., Yu, P.S., Han, J.: Graph indexing: a frequent structure-based approach. In: SIGMOD, pp. 335–346 (2004)
Yang, J., Su, W., Li, S., Dalkilic, M.M.: Wigm: discovery of subgraph patterns in a large weighted graph. In: SDM, pp. 1083–1094 (2012)
Zaharia, M., Xin, R.S., Wendell, P., Das, T., Armbrust, M., Dave, A., Meng, X., Rosen, J., Venkataraman, S., Franklin, M.J.: Apache spark: a unified engine for big data processing. Commun. ACM 59(11), 56–65 (2016)
Zou, Z., Li, J., Gao, H., Zhang, S.: Mining frequent subgraph patterns from uncertain graph data. IEEE Trans. Knowl. Data Eng. 22(9), 1203–1218 (2010)

Author information

Authors and Affiliations

University of Trento, Trento, Italy
Giulia Preti & Yannis Velegrakis
Aalborg University, Aalborg, Denmark
Matteo Lissandrini
Aarhus University, Aarhus, Denmark
Davide Mottin

Corresponding author

Correspondence to Giulia Preti.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The current paper is an extended version of a recent EDBT’18 article [38].


About this article

Cite this article

Preti, G., Lissandrini, M., Mottin, D. et al. Mining patterns in graphs with multiple weights. Distrib Parallel Databases 39, 281–319 (2021). https://doi.org/10.1007/s10619-019-07259-w

Issue Date: June 2021
