Evaluations of Similarity Measures on VK for Link Prediction
 415 Downloads
Abstract
Recommender system is one of the most important components for many companies and social networks such as Facebook and YouTube. A recommendation system consists of algorithms which allow to predict and recommend friends or products. This paper studies to facilitate finding likeminded people with same interests in social networks. In our research, we used real data from the most popular social network in Russia, VK (Vkontakte). The study is motivated on the assumption that similarity breeds connection. We evaluate wellknown similarity measures in the field on our collected VK datasets and find limited performance results. The result shows that majority of users in VK tend not to add possible users with whom they have common acquaintances. We also propose a topologybased similarity measure to predict future friends. Then, we compare our results with the results of other wellknown methods and discuss differences.
Keywords
Recommender system Graph theory Link prediction Similarity1 Introduction
Social networks help people to interact through Internet. Nowadays social network services allow to share interests via texts, music, video, etc. Clearly, people want to get more information and new contacts as fast as possible, and the information should be relevant to their preferences.
Link recommendation has become one of the most important features in online social networks and has been an active research area [3, 8, 9, 10, 14, 16]. There are wellknown examples of link recommendation such as “People You May Want to Hire” on LinkedIn, “You May Know” on Google+ and “People You May Know” on Facebook. Given the tremendous academic and practical interests in link recommendation [6], we examined existing approaches and applied recommendation engine for VK social network.
We especially focus on the most popular similarity measures which are used in the literature, namely, cosine similarity, common neighbors, Jaccard similarity and Adamic–Adar. We show the performance of each metric using confusion matrix plus F1 measure which is the standard baselines. We examine each metric on a set of small online social networks as well as collected VK datasets.
A social network is a graph as a data structure, the users are nodes, and the users’ friendships (relations) are edges [11]. We aim to understand factors which may affect the emergence of new edges and try to predict future connections in social networks. In Sect. 2, we introduce some of major existing methods on link prediction. In Sect. 3, we present existing methods which we test and compare our results with on VK datasets. In Sect. 4, we describe in detail about our experiment settings and datasets which we collected and preprocess. Section 5, then, presents how we can evaluate the results as well as the evaluation of the most wellknown methods. In Sect. 6, we show our results and discuss implications. Finally, in Sect. 7, we conclude our research.
2 Related Work
Wu et al. [17] study understanding users’ temporal behaviors in social network platforms. The evolution of social network services is driven by the interplay between users’ preferences and social network structures. Authors argue that users’ future preference behavior is affected by the network around them and the homophily effect. In [1], authors demonstrate the developed Social Poisson Factorization (SPF), a probabilistic model that incorporates social network information into a traditional factorization method. SPF introduces the social aspect of algorithmic recommendation. They develop a scalable algorithm for analyzing data with SPF, and demonstrate that it outperforms competing methods on six realworld datasets.
Zhang et al. [18] studied the social influence problem in a large microblogging network, Weibo.com.^{1} They investigate (re)tweet behaviors of users by considering ego network of users. In our proposed method, we expand another step by considering second level degrees of users. Kooti et al. [5] demonstrate online consumers’ behavior and try to explain ways to improve ad (advertising) targeting systems. Researchers use information in emails such as purchase logs and communications between users to find patterns and their interaction. For example, authors measured the effect of gender and age which showed that a female email user is more likely to be an online shopper than an average male email user. Also the spending ability goes up with the age until age of 30, then stabilizes in the early 60s and starts to drop afterward. Such findings help to predict future customers’ behavior and make purchases more pleasant for consumers.
Measuring similarity between nodes is the main task in link prediction problem. In [15], authors constructed a new way to measure similarity between nodes based on gametheoretic interaction index. The basic form of the interaction index is built upon two solution concepts from game theory: the Shapley value and the Banzhaf index. It is also generalized to a wider class of solution concepts: Shapley value and Banzhaf index. Authors showed that using their approach, it is possible to improve existing results in link prediction and community detection problems.
In this paper, we consider link prediction problems, and more precisely, we discuss similarity measures for link prediction problems. One of the wellaccepted hypotheses is that similar users will become future friends. Therefore, it is essential to measure similarity between users so that future links can be accurately predicted. As we will discuss throughout the paper, there exist many metrics to measure the similarity. Once the similarity is measured, there are many ways to predict future connections. One intuitive way is to have a threshold and predict users who have similarities greater than the given threshold. Other simple ways include predicting top k users and top l% of users. More advanced techniques include learning how much to predict using machine learning algorithms.
2.1 VKontakte
Any user in VK has a profile which contains various information. First and last names are mandatory fields, and other data such as birthday, city and interests are optional. Figure 1 is an example of a profile of the social network. In addition, users can share interesting information via posting on the profile wall as well as making repost from other users or groups,^{5} and each post may contain attachments—documents or media files.
VK has open APIs for application developments. It allows developers to access the server through requests. The APIs provide the ability to access some user information with his/her consent, such as photographs, friends, profile and wall.
3 Methods
The link prediction problem is connected with network structure. In our research, we use similarity between users using information from their second level degrees.
3.1 Problem Statement
Given a snapshot of a social network \(G^t=(V^t,E^t)\) where V is a set of users and E is a set of relationships, we aim to find a relationship \((v_i, v_j)\notin E^t\) and \((v_i, v_j)\in E^{t'}\) where \(t<t'\).
3.2 Second Common Neighbor Similarity
4 Experiments
Ideally, the evaluation of link prediction algorithms should take place every timestamp to check the performance and adjust the algorithm if necessary. However, such an approach takes a long period of time and it is hard to track of incoming and outgoing users from the initial network. Therefore, it is common practice to delete a small portion of edges from the network as if they do not exist and then try to predict the deleted edges. This static approach has possible flaws since the similarity between users is not static; thus, deleted edges are easier to predict than future edges.
In the experiment, we compare the common practice with the real link prediction based on actual future links without deleting edges. To be able to evaluate the accuracy, we collected datasets from the same set of users in different time intervals to build a learning model and then compare the predictions.
4.1 Collection of Dataset
VK networks with different time stamps
Date  Number of nodes  Number of edges  Average degree 

23.12.2016  942469  1475811  3.1318 
24.01.2017  950503  1490873  3.1370 
23.02.2017  936847  1465214  3.1280 
24.03.2017  939985  1469854  3.1274 
VK networks after preprocessing
Date  Number of nodes  Number of edges  Average degree  Excess nodes 

23.12.2016  190,550  712,014  7.4733  7412 
24.01.2017  190,550  716,478  7.5201  5459 
23.02.2017  190,550  712,196  7.4752  1083 
24.03.2017  190,550  713,030  7.4839  340 
4.2 Implementation
We used Python programming language and developed two variants of the implementation. In the first variant, we used Pandas library to collect and process the dataset, and then, we wrote codes for the similarity measures, i.e., cosine similarity, common neighbors, Jaccard similarity and Adamic–Adar index.

Python 3.5.2
Python is a highlevel generalpurpose programming language, focused on improving developer productivity and code readability. The syntax of the Python kernel is minimal, and at the same time, the standard library includes a large amount of useful functions.

IPython 5.1.0
IPython is an interactive computational environment for the Python programming language, in Jupyter Notebook 4.3.1 shell. It is an opensource program and one of the popular tools for data scientists.
4.2.1 Python Libraries

Pandas
Pandas library makes Python a powerful tool for data analysis. It is an opensource, easytouse, BSDlicensed. The package allows to build summary tables, performs a grouping, provides easy access to tabular data and has robust IO tools for loading and saving data from flat files (CSV and delimited). The library is well suited for different kinds of data: forms of statistical or observational datasets, arbitrary matrix data with row and column labels, and it might be heterogeneous or homogeneously typed, tabular data like Excel spreadsheet. Pandas package provides high performance because many lowlevel algorithms are tweaked in Cython code. Cython language is a superset of Python that allows the compiler to generate very efficient C codes from Python.

Matplotlib
Matplotlib is a plotting library for the Python, which produces publication quality figures in a variety of formats. It allows to draw graphics, plots, histograms, power spectra, bar charts, error charts and scatterplots on the resulting data sets.

Scikitlearn
Scikitlearn is a free, BSDlicensed machine learning library for Python. It features various classification, regression and clustering algorithms including support vector machines and random forests. The library is built on SciPy (fundamental library for scientific computing and other useful tools) and NumPy (base ndimensional array package)

Networkx
NetworkX is a free, BSDlicensed library for the creation, manipulation and study of the structure, dynamics and functions of graphs and complex networks in Python.
5 Evaluations
 Precision (7) is the ratio of a number of events that are correctly predicted to a number of all predicted events.$$\begin{aligned} {\rm Precision} = \frac{{\rm tp}}{{\rm tp} + {\rm fp}} \end{aligned}$$(7)
 Recall (8) is the ratio of a number of the events that are correctly predicted to a number of all correct events.$$\begin{aligned} {\rm Recall} = \frac{{\rm tp}}{{\rm tp} + {\rm fn}} \end{aligned}$$(8)
 Accuracy (9) is the ratio of correctly predicted events. Perhaps, it is the most intuitive performance measure, but it is only good for symmetric datasets where false positives and false negatives are roughly the same.$$ {{\rm Accuracy} = \frac{{\rm tp} + {\rm tn}}{N}} $$(9)$$\begin{aligned}&{\text { where }}\,N \, (10)\,{\hbox{ is sum of rates:}} \\&\quad N = {\rm fn} + {\rm fp} + {\rm tn} + {\rm tp} \end{aligned}$$(10)
 \(F_{1}\,{\rm score}\) (11) is harmonic average of precision and recall.$$\begin{aligned} F_{1}\,{\rm score} = 2 \times \frac{{\rm Precision} \times {\rm Recall}}{{\rm Precision} + {\rm Recall}} \end{aligned}$$(11)

true positives (tp) are the categories that we expected to see and received at the output of the classifier;

false negatives (fn) are the categories, which we expected to see, but the classifier did not define them, also known type II error;

false positives (fp) are the categories, which should not be output, but the classifier returned them erroneously at the output, also known type I error ;

true negatives (tn) are the categories, which should not be output, and they are also correctly absent at the output of the classifier.
5.1 Evaluation of Existing Methods
To have a base line, we started from the most commonly used measures of similarity [13] which were introduced in the previous section, cosine (1) similarity, common neighbors (2), Jaccard (3) similarity and Adamic–Adar (4).
5.1.1 Evaluation of Existing Methods on Facebook
To make evaluations, we used Facebook dataset^{6} which contains 4039 nodes and 88,234 edges. To predict connections, 30% of the edges were deleted and after deleting, the number of edges became 61,764. One of the questions that we need to answer is how many edges we should predict? The first approach we tried is that the similarity index (cosine, Jaccard and Adamic–Adar) should be greater than zero which led to very poor results. Then, we added a threshold value which decides if a future link should be predicted or not.
Evaluation measures for Facebook dataset
Common neighbor  Cosine  Jaccard  Adamic–Adar  

Precision  0.009341  0.006901  0.009341  0.010108 
Recall  0.040130  0.346467  0.040130  0.037982 
Accuracy  0.966173  0.746648  0.966173  0.969709 
F1 score  0.015155  0.013532  0.015155  0.015967 
To examine popular similarity measures to predict links, we used data from Vkontakte (VK) social network. First, we processed Facebook dataset. The average number of users’ friends is 23 in Facebook dataset and 246 in VK dataset. To efficiently process the big data, we have reduced from 10,728 nodes (2,332,068 edges) to 4000 user so that it contains 4000 nodes and 1,122,226 edges. To test the common practice, we deleted 30% edges and then try to predict the deleted ones. After deleting 30% of the edges, the number of edges was 784,499.
Evaluating measures table for VK (baseline)
Common neighbor  Cosine  Jaccard  Adamic–Adar  

Precision  0.002936  0.019121  0.002936  0.014459 
Recall  0.008744  0.171432  0.008744  0.005208 
Accuracy  0.837793  0.959589  0.837793  0.822696 
F1 score  0.004395  0.0005134  0.004395  0.007658 
As shown in the table, the quality of predictions for VK network is lower than for Facebook dataset. One of the reasons is that VK network is very sparse and the relationships are concentrated in small regions. The results of the VK dataset show that Adamic–Adar similarity index is the best evaluating measure according to F1score and cosine for others.
5.2 Evaluation of Existing Methods on Different Datasets
Datasets
Dataset  Nodes  Edges  Avg. degree 

Facebook1  324  2218  13.6914 
Facebook1_del  320  1553  9.7063 
Last.fm  1892  12,717  13.4429 
Last.fm_del  1801  8902  9.8856 
GrQc  5242  14,496  5.5307 
GrQc_del  4729  10,147  4.2914 
HepTh  9877  25,998  5.2644 
HepTh_del  9020  18,199  4.0353 
CondMat  23133  93,497  8.0834 
CondMat_del  21982  65,448  5.9547 
Tables 6, 7, 8, 9, 10, 11, 12, 13, 14 and 15 illustrate performance of predictions.
5.3 A Small Dataset from Facebook
Illustration of prediction rates for the small dataset from Facebook
Cosine  

(a) Cosine rates  
tp  102 
fn  22 
tn  8337 
fp  60,167 
Jaccard  

(b) Jaccard rates  
tp  89 
fn  35 
tn  56,306 
fp  9504 
Common  

(c) Common neighbor rates  
tp  89 
fn  35 
tn  56,306 
fp  9504 
Adamic  

(d) Adamic–Adar rates  
tp  59 
fn  65 
tn  58,243 
fp  74,874 
Evaluating similarity measures for the small dataset from Facebook
Precision  Recall  Accuracy  F1score  

Cosine  0.001692  0.822581  0.122967  0.003378 
Jaccard  0.009278  0.717742  0.855325  0.018318 
Common  0.009278  0.717742  0.855325  0.018318 
Adamic  0.007819  0.475806  0.885322  0.015385 
5.4 LastFM
Illustration of prediction rates of LastFM dataset
Cosine  

(a) Cosine rates  
tp  406 
fn  168 
tn  384,640 
fp  1,424,806 
Jaccard  

(b) Jaccard rates  
tp  176 
fn  398 
tn  1,569,014 
fp  124,157 
Common  

(c) Common neighbor rates  
tp  176 
fn  398 
tn  1,569,014 
fp  124,157 
Adamic  

(d) Adamic–Adar rates  
tp  136 
fn  438 
tn  1,588,450 
fp  104,017 
Evaluating similarity measures for LastFM dataset
Precision  Recall  Accuracy  F1score  

Cosine  0.000285  0.707317  0.212730  0.000570 
Jaccard  0.001416  0.306620  0.926462  0.002818 
Common  0.001416  0.306620  0.926462  0.002818 
Adamic  0.001306  0.236934  0.938303  0.002597 
5.5 GrQc
General relativity and quantum cosmology (GRQC) collaboration network is from the eprint arXiv and covers scientific collaborations among authors with papers submitted to General Relativity and Quantum Cosmology category.
Illustration of similarity rates for GrQc dataset
Cosine  

(a) Cosine rates  
tp  472 
fn  479 
tn  3,018,531 
fp  5,461,423 
Jaccard  

(b) Jaccard rates  
tp  321 
fn  630 
tn  7,853,685 
fp  29,385 
Common  

(c) Common neighbor rates  
tp  321 
fn  630 
tn  7,853,685 
fp  29,385 
Adamic  

(d) Adamic–Adar rates  
tp  147 
fn  804 
tn  7,861,336 
fp  21,129 
Evaluating measures table for GrQc dataset
Precision  Recall  Accuracy  F1score  

Cosine  0.000086  0.496320  0.355977  0.000173 
Jaccard  0.010806  0.337539  0.996193  0.020941 
Common  0.010806  0.337539  0.996193  0.020941 
Adamic  0.006909  0.154574  0.997218  0.013227 
5.6 HepTh
Illustration of various similarity rates for HepTh dataset
Cosine  

(a) Cosine rates  
tp  1073 
fn  869 
tn  10,804,371 
fp  21,680,233 
Jaccard  

(b) Jaccard rates  
tp  681 
fn  1261 
tn  29,960,801 
fp  82,874 
Common  

(c) Common neighbor rates  
tp  681 
fn  1261 
tn  29,960,801 
fp  82,874 
Adamic  

(d) Adamic–Adar rates  
tp  319 
fn  1623 
tn  29,988,844 
fp  53,063 
Evaluating measures table for HepTh dataset
Precision  Recall  Accuracy  F1score  

Cosine  0.000049  0.552523  0.332613  0.000099 
Jaccard  0.008150  0.350669  0.997200  0.015930 
Common  0.008150  0.350669  0.997200  0.015930 
Adamic  0.005976  0.164264  0.998180  0.011532 
5.7 CondMat
Illustration of similarity rates for CondMat dataset
Cosine  

(a) Cosine rates  
tp  3817 
fn  2323 
tn  59,017,890 
fp  190,840,644 
Jaccard  

(b) Jaccard rates  
tp  2820 
fn  3320 
tn  230,958,602 
fp  627,368 
Common  

(c) Common neighbor rates  
tp  2820 
fn  3320 
tn  230,958,602 
fp  627,368 
Adamic  

(d) Adamic–Adar rates  
tp  1610 
fn  4530 
tn  231,080,071 
fp  499,409 
Evaluating measures table for CondMat dataset
Precision  Recall  Accuracy  F1score  

Cosine  0.000020  0.621661  0.236215  0.000040 
Jaccard  0.004475  0.459283  0.997277  0.008863 
Common  0.004475  0.459283  0.997277  0.008863 
Adamic  0.003213  0.262215  0.997824  0.006349 
Based on the performance results that we showed for five different datasets, we can observe how different measures behave on different datasets. Generally, what we want to have is high TP and low FP. In reality, however, when TP grows FP also grows. The same applies to TN and FN. Therefore, at least for these five datasets that we examined here, cosine similarity is not a good metric for link prediction problems since it results in high false positives. Although it has the highest TP rate among the four similarity measures under examination, it is only at the cost of high FP. We carefully suggest that, according to the results, Jaccard similarity is the most successful measure among the four, since it stably shows high TP rate with low FP as well as high TN and low FN. Adamic–Adar similarity sometimes has better performance than Jaccard similarity, but it is not as stable since sometimes it shows poor TP rate.
6 Results
Obtained results implies that it is possible to predict future connections at the price of false positives. Also the increasing size of a network (number of nodes) implies the exponential increase of TN. To overcome such shortcomings, we improvised new approaches.
6.1 Next Steps of Prediction on VK
The next step of our predictions using VK dataset was the assumption that not all user are “useful” and we should exclude them from our dataset. The assumption of “Usefulness” concerns that some profiles are not profiles of an ordinary user, but a company profile or a famous person (like singers), some profiles are not may be companies or popular people and in both cases they do not need to have recommendations otherwise the mean and the median of users in our dataset are about 240 friends also based on Dunbar’s number [2]; therefore, we delete users who have more than 500 users in their friend list.
Evaluating measures table for VK (deleting approach)
Cosine  Jaccard  Adamic–Adar  

Precision  0.0004111  0.0053608  0.0289651 
Recall  0.0428219  0.0403065  0.0016984 
Accuracy  0.0085148  0.9190919  0.8309117 
F1 score  0.0008144  0.0094629  0.0032087 
Table 16 illustrates evaluations of similarity measures: precision, recall, accuracy and F1score. We found that Jaccard similarity has the best values in precision and F1score.
6.1.1 Wall Analysis
6.1.2 Real Data
Evaluating measures table for VK (real data comparisons)
Cosine  Jaccard  Adamic–Adar  

Precision  0.0000038  0.0000297  0.0001549 
Recall  0.0001232  0.0001102  0.0000779 
Accuracy  0.0030727  0.8596805  0.9532686 
F1 score  0.0000073  0.0000468  0.0001037 
6.2 Proposed Measure
 1.
According to the evaluation further we pay attention only to TP, FP and FN rates, without using precision, recall, F1score and accuracy.
 2.
Also we try to limit the number of predictions and select only the highest values of similarity to predict future connections.
 3.
In Sect. 6.1.1, we describe results of the approach using users’ walls to find similarity between users.
6.3 Evaluation of Other Approaches
Tables 18, 19 and 20 illustrate results of the similarity measures when we limited the number of predictions by selecting the top 2% from the best similarity indexes for 1000 randomly selected users.
Illustration of various rates (data from January)
Jaccard  

(a) Jaccard rates  
tp  0 
fn  534 
fp  1006 
Adamic  

(b) Adamic–Adar rates  
tp  0 
fn  534 
fp  1006 
Second  

(c) Second neighbor rates  
tp  0 
fn  4608 
fp  1006 
Second  

(d) Shortest path rates  
tp  0 
fn  4800 
fp  1006 
Illustration of various rates (data from February 2017)
Jaccard  

(a) Jaccard rates  
tp  0 
fn  534 
fp  1252 
Adamic  

(b) Adamic–Adar rates  
tp  0 
fn  534 
fp  1252 
Second  

(c) Second neighbor rates  
tp  0 
fn  4608 
fp  1252 
Shortest  

(d) Shortest path rates  
tp  0 
fn  4800 
fp  1252 
Illustration of various rates (data from March 2017)
Jaccard  

(a) Jaccard rates  
tp  0 
fn  534 
fp  1567 
Adamic  

(b) Adamic–Adar rates  
tp  0 
fn  534 
fp  1567 
Second  

(c) Second neighbor rates  
tp  0 
fn  4608 
fp  1567 
Shortest  

(d) Shortest path rates  
tp  0 
fn  4800 
fp  1567 
6.3.1 Top 25
Illustration of various rates (data from January 2017)
Jaccard  

(a) Jaccard rates  
tp  2 
fn  24,336 
fp  397 
Adamic  

(b) Adamic–Adar rates  
tp  2 
fn  24,336 
fp  397 
Second  

(c) Second Neighbor rates  
tp  2 
fn  211,130 
fp  397 
Shortest  

(d) Shortest path rates  
tp  2 
fn  219,836 
fp  397 
Illustration of various rates (data from February 2017)
Jaccard  

(a) Jaccard rates  
tp  3 
fn  24,335 
fp  459 
Adamic  

(b) Adamic–Adar rates  
tp  3 
fn  24,335 
fp  459 
Second  

(c) Second neighbor rates  
tp  3 
fn  211,129 
fp  459 
Shortest  

(d) Shortest path rates  
tp  3 
fn  219,835 
fp  459 
Illustration of various rates for 25% top predictions (data from March 2017)
Jaccard  

(a) Jaccard rates  
tp  4 
fn  24,334 
fp  535 
Adamic  

(b) Adamic–Adar rates  
tp  4 
fn  24,334 
fp  535 
Second  

(c) Second neighbor rates  
tp  4 
fn  211,128 
fp  535 
Shortest  

(d) Shortest path rates  
tp  4 
fn  219,834 
fp  535 
6.4 Comparison of Second Neighbor Approach
Illustration of various rates (data from the Facebook dataset)
Jaccard  

(a) Jaccard rates  
tp  1776 
fn  105,520 
fp  10,212 
Adamic  

(b) Adamic–Adar rates  
tp  1776 
fn  105,520 
fp  10,212 
Second  

(c) Second neighbor rates  
tp  1791 
fn  545,865 
fp  10,197 
Shortest  

(d) Shortest path rates  
tp  1791 
fn  922,919 
fp  10,197 
7 Conclusion and Future Work
There exist many different types of link prediction approaches, and they can be roughly categorized into learningbased methods and proximitybased methods. In our research, we considered proximitybased methods, namely, cosine similarity, common neighbor, Jaccard similarity, Adamic–Adar index, second neighbor and shortest path. All of them are structural proximitybased methods which does not consider nodal information. We have thoroughly tested the measures on different datasets. Many existing studies make prediction only for randomly deleted edges due to lack of real datasets. We collected our own snapshots of graphs from VK social network within 4 months. To facilitate the collected dataset for prediction, we filtered users that have too many links and also maintained the same users for each snapshot since we predict links but not new users in this study.
The results do not conclude which approach is superior in most of the cases. The prediction method to use for real networks should be chosen with heuristics given the nature of the network. In this study, we showed that there is a great difference with the same measures in VK and Facebook datasets.
In [4], authors discuss on the problem of current link prediction approaches. We have pointed out that evaluating link predictions based on deleted edges is not suitable considering the dynamics of social networks. In addition, links are not only added but also deleted over time. Since this fact also affects the topological similarity measures, it is not effective to predict future links only based on the current snapshot of the network. We plan to work toward dynamics of networks to predict links with less false positives.
Footnotes
 1.
The most popular Chinese microblogging service.
 2.
SimilarWeb Ltd. is a digital market intelligence company that provides web analytics, data analysis and business intelligence services for international corporations. It uses large data processing technologies to collect, measure, analyze and provide insights on behavioral patterns and statistics for the involvement of users from websites and mobile applications.
 3.
“Audience VKontakte” www.liveinternet.ru. Retrieved January 2017.
 4.
“List of VK users” Vk.com. Retrieved January 2017.
 5.
The VK group is a community where people can communicate with each other, exchanging ideas and proposals.
 6.
References
 1.Chaney AJB, Blei DM, EliassiRad T (2015) A probabilistic model for using social networks in personalized item recommendation. In: Proceedings of the 9th ACM conference on recommender systems, RecSys 2015, Vienna, Austria, Sept 16–20, pp 43–50Google Scholar
 2.Dunbar RIM (1992) Neocortex size as a constraint on group size in primates. J Hum Evolut 22(6):469–493CrossRefGoogle Scholar
 3.Hussain R, Nawaz W, Lee JY, Son J, Seo JT (2016) A hybrid trust management framework for vehicular social networks. In: International conference on computational social networks. Springer, pp 214–225Google Scholar
 4.Junuthula RR, Xu KS, Devabhaktuni VK (2016) Evaluating link prediction accuracy in dynamic networks with added and removed edges. In: 2016 IEEE international conferences on big data and cloud computing (BDCloud), social computing and networking (SocialCom), sustainable computing and communications (SustainCom) (BDCloudSocialComSustainCom), pp 377–384Google Scholar
 5.Kooti F, Lerman K, Aiello LM, Grbovic M, Djuric N, Radosavljevic V (2015) Portrait of an online shopper: understanding and predicting consumer behavior. CoRR, abs/1512.04912Google Scholar
 6.Lebedev A, Lee JY, Rivera V, Mazzara M (2017) Link prediction using topk shortest distances, LNCS. SpringerGoogle Scholar
 7.Lee JY, Oh JC (2017) Agent perspective social networks: distributed second degree estimation. Encyclopedia of social network analysis and mining, pp 1–12Google Scholar
 8.Lee JY (2014) Reputation computation in social networks and its applicationsGoogle Scholar
 9.Lee JY, Duan Y, Oh JC, Du W, Blair H, Wang L, Jin X (2011) Automatic reputation computation through document analysis: a social network approach. In: 2011 international conference on advances in social networks analysis and mining (ASONAM), pp 559–560. IEEEGoogle Scholar
 10.Lee JY, Duan Y, Oh JC, Du W, Blair H, Wang L, Jin X (2012) Social network based reputation computation and document classification. J UCS 18(4):532–553Google Scholar
 11.Lee JY, Lopatin K, Hussain R, Nawaz W (2017) Evolution of friendship: a case study of mobiclique. In: Proceedings of the computing frontiers conference. ACM, pp 267–270Google Scholar
 12.Lee JY, Oh JC (2014) Estimating the degrees of neighboring nodes in online social networks. In: International conference on principles and practice of multiagent systems. Springer, pp 42–56Google Scholar
 13.Li Z (Lionel), Fang X, Sheng ORL (2015) A survey of link recommendation for social networks: methods, theoretical foundations, and future research directions. CoRR, abs/1511.01868Google Scholar
 14.Solonets S, Drobny V, Rivera V, Lee JY (2017) Introducing ADegree: anonymisation of social networks through constraint programming, LNCS. SpringerGoogle Scholar
 15.Szczepanski PL, Barcz AS, Michalak TP, Rahwan T (2015) The game–theoretic interaction index on social networks with applications to link prediction and community detection. In: Proceedings of the twentyfourth IJCAI 2015, Buenos Aires, Argentina, July 25–31, pp 638–644Google Scholar
 16.Tigunova A, Lee JY, Nobari S (2015) Location prediction via social contents and behaviors: locationaware behavioral lda. In: 2015 IEEE international conference on data mining workshop (ICDMW). IEEE, pp 1131–1135Google Scholar
 17.Wu L, Ge Y, Liu Q, Chen E, Long B, Huang Z (2016) Modeling users’ preferences and social links in social networking services: a jointevolving perspective. In: Proceedings of the thirtieth AAAI, Feb 12–17, Phoenix, AZ, USA, pp 279–286Google Scholar
 18.Zhang J, Tang J, Li J, Liu Y, Xing C (2015) Who influenced you? predicting retweet via social influence locality. TKDD 9(3):25CrossRefGoogle Scholar
Copyright information
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.