Using Synthetic Networks for Parameter Tuning in Community Detection
Community detection is one of the most important and challenging problems in network analysis. Real-world networks, however, can have very different structural properties and communities of various kinds, so it is hard (or even impossible) to develop one algorithm suitable for all datasets. A standard machine learning approach is to consider a parametric algorithm and choose its parameters based on the dataset at hand. This approach is not directly applicable to community detection, because labeled data is usually unavailable for such parameter tuning. In this paper, we propose a simple and effective procedure for tuning the hyperparameters of any given community detection algorithm without requiring any labeled data. The core idea is to generate a synthetic network with properties similar to those of a given real-world network, but with known communities. It turns out that tuning parameters on such a synthetic graph also improves quality on the given real-world network. To illustrate the effectiveness of the proposed procedure, we show significant improvements obtained for several well-known parametric community detection algorithms on a variety of synthetic and real-world datasets.
Keywords: Community detection, Parameter tuning, Hyperparameters, LFR benchmark
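The core idea of the abstract (tune on a synthetic graph with known communities, then reuse the chosen parameter on the real graph) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the paper's implementation: a simple planted-partition generator stands in for the LFR benchmark, a greedy modularity local-moving step with a resolution parameter `gamma` stands in for "any parametric community detection algorithm", normalized mutual information is used as the quality score, and the names `planted_partition`, `local_moving`, `nmi`, `tune_on_synthetic`, `k_guess`, and `mixing_guess` are hypothetical. The "real" network at the end is itself a generated stand-in, not real data.

```python
import math
import random
from collections import Counter


def planted_partition(n, k, avg_degree, mixing, seed=0):
    """Random graph with k planted communities (a simple stand-in for LFR).

    `mixing` is the expected fraction of a node's edges leaving its community.
    """
    rng = random.Random(seed)
    labels = [i % k for i in range(n)]
    size = n / k
    p_in = avg_degree * (1 - mixing) / max(size - 1, 1)
    p_out = avg_degree * mixing / max(n - size, 1)
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            p = p_in if labels[u] == labels[v] else p_out
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj, labels


def local_moving(adj, gamma, seed=0, max_sweeps=20):
    """Greedy modularity local moving with resolution parameter `gamma`."""
    rng = random.Random(seed)
    two_m = sum(len(nb) for nb in adj.values()) or 1
    comm = {v: v for v in adj}            # start with singleton communities
    tot = {v: len(adj[v]) for v in adj}   # total degree of each community
    nodes = list(adj)
    for _ in range(max_sweeps):
        rng.shuffle(nodes)
        moved = False
        for v in nodes:
            deg = len(adj[v])
            links = Counter(comm[u] for u in adj[v])
            old = comm[v]
            tot[old] -= deg               # take v out of its community
            best, best_gain = old, links[old] - gamma * deg * tot[old] / two_m
            for c, w in links.items():
                gain = w - gamma * deg * tot[c] / two_m
                if gain > best_gain:
                    best, best_gain = c, gain
            comm[v] = best
            tot[best] += deg
            moved |= best != old
        if not moved:
            break
    return [comm[v] for v in sorted(adj)]


def nmi(a, b):
    """Normalized mutual information between two label lists."""
    n = len(a)
    ca, cb, cab = Counter(a), Counter(b), Counter(zip(a, b))

    def entropy(counts):
        return -sum(x / n * math.log(x / n) for x in counts.values())

    mi = sum(x / n * math.log(x * n / (ca[i] * cb[j]))
             for (i, j), x in cab.items())
    denom = entropy(ca) + entropy(cb)
    return 2 * mi / denom if denom > 0 else 1.0


def tune_on_synthetic(real_adj, k_guess, mixing_guess, grid, seed=0):
    """Pick the gamma that scores best on a synthetic graph mimicking real_adj."""
    n = len(real_adj)
    avg_degree = sum(len(nb) for nb in real_adj.values()) / n
    syn_adj, truth = planted_partition(n, k_guess, avg_degree, mixing_guess, seed)
    scores = {g: nmi(truth, local_moving(syn_adj, g, seed)) for g in grid}
    return max(scores, key=scores.get), scores


# Stand-in for a real network; its labels are discarded, as they would be unknown.
real_adj, _ = planted_partition(120, 4, avg_degree=10, mixing=0.2, seed=1)
best_gamma, scores = tune_on_synthetic(
    real_adj, k_guess=4, mixing_guess=0.2, grid=[0.5, 1.0, 2.0, 4.0])
real_communities = local_moving(real_adj, best_gamma)
print(best_gamma, scores)
```

In this sketch only the node count and average degree are copied from the "real" graph, while the number of communities and the mixing level are passed in as guesses. A generator that matches more properties of the real network, such as LFR, would take the place of `planted_partition`.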
This study was funded by the Russian Foundation for Basic Research under research project 18-31-00207 and by the Russian President grant supporting leading scientific schools of the Russian Federation, NSh-6760.2018.1.