DDNE: Discriminative Distance Metric Learning for Network Embedding
Abstract
Network embedding learns low-dimensional representations of nodes that capture and preserve network structure. Most existing methods learn embeddings under the distributional similarity hypothesis while ignoring the adjacency similarity property, which may cause a distance bias problem in the embedding space. To solve this problem, this paper proposes DDNE, a unified framework that encodes distributional similarity and measures adjacency similarity simultaneously. DDNE trains a siamese neural network that learns a set of non-linear transforms to project node pairs into the same low-dimensional space based on their first-order proximity. Meanwhile, a distance constraint forces the distance between each pair of adjacent nodes below a threshold and the distance between each pair of non-adjacent nodes above the same threshold, which highlights adjacency similarity. We conduct extensive experiments on four real-world datasets in three social network analysis tasks: network reconstruction, attribute prediction and recommendation. The experimental results demonstrate the superior performance of our approach over baselines in generating effective network embedding vectors.
Keywords
Network embedding · Social network · Metric learning
1 Introduction
To preserve the structure of a given network, existing studies encode local proximity and inherent properties to learn network embeddings [1, 6, 8]. Typically, Node2vec, DeepWalk and LINE [2, 7, 9] approximate nodes' local proximity, including first- and second-order proximity, via random walks or neural network models with specific objective functions. The essence is to learn the vector representation of a node by predicting its neighborhood, inspired by the word embedding principle. Under this principle, the vector representations satisfy the distributional similarity property of networks, i.e. nodes with similar neighborhoods are closer in the embedding space.
In practical applications, there is another fundamental network property besides distributional similarity, called adjacency similarity: a pair of adjacent nodes are similar in some aspects. For example, in the link prediction task, node pairs with higher similarity are more likely to be adjacent; in the label propagation task, adjacent nodes are assumed to share common labels. Adjacent nodes should therefore be closer than non-adjacent ones in the network embedding space. However, most previous embedding methods ignore adjacency similarity, which may introduce distance bias in the embedding space [4].
Figure 1 illustrates the distance bias. In Fig. 1(a), nodes \(v_0\) and \(v_1\) share the same neighbors but are not linked; in contrast, in Fig. 1(b) we add a link between them. Under a previous method (taking DeepWalk as an example), the distance between \(v_0\) and \(v_1\) in Fig. 1(a) is smaller than that in Fig. 1(b) in the embedding space (shown in Fig. 1(c)). However, once adjacency similarity is taken into account, the distance between \(v_0\) and \(v_1\) in Fig. 1(a) should be larger than that in Fig. 1(b) (shown in Fig. 1(e)). We call this inaccurate estimation of the distance between two nodes the distance bias problem.
To address the distance bias problem, we propose a novel node embedding method that simultaneously preserves the distributional similarity and adjacency similarity properties of the network. The model consists of two modules: the Node Encoder and the Distance Metric-learner. For a given network, the Node Encoder encodes the first-order proximity of the nodes using a neural network. In the input layer, each node is represented as a sequence of its neighbors, which then passes through multiple non-linear transformations in the hidden layers. Because different neighbors contribute differently to similarity measurement, we adopt an attention mechanism to adaptively assign a weight to each neighbor. The output is the node embedding, and nodes with common neighbors obtain similar encodings. The Distance Metric-learner measures the distance between pairs of embedding vectors and assigns adjacent node pairs a smaller distance to highlight adjacency similarity. For this purpose, we use a well-designed objective function that pulls each node toward its neighbors and pushes non-adjacent nodes farther away. In this way, the network structure is better preserved in the embedding space.
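As a concrete illustration, the two-branch (siamese) encoding and the resulting pair distance can be sketched in a few lines of numpy. The layer sizes, random weights and single hidden layer below are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(neighbor_vec, W1, W2):
    """Shared (siamese) encoder: non-linear transforms that project a
    node's neighbor indicator vector into a d-dimensional embedding."""
    h = np.tanh(neighbor_vec @ W1)   # hidden layer
    return np.tanh(h @ W2)           # embedding layer

n, hidden, d = 6, 8, 4
W1 = rng.normal(size=(n, hidden))    # weights shared by both branches
W2 = rng.normal(size=(hidden, d))

X = np.zeros((n, n))                 # toy adjacency matrix
for a, b in [(0, 1), (0, 2), (1, 2), (3, 4)]:
    X[a, b] = X[b, a] = 1

# Both branches use the SAME weights, so nodes with similar
# neighborhoods receive similar encodings.
u0, u1 = encode(X[0], W1, W2), encode(X[1], W1, W2)
dist = np.linalg.norm(u0 - u1)       # pair distance in embedding space
```

The distance `dist` is what the Distance Metric-learner constrains for adjacent versus non-adjacent pairs.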
To verify the effectiveness of our approach, we conduct experiments on network reconstruction, attribute prediction and recommendation tasks over four real-world datasets, taking five state-of-the-art embedding algorithms as comparative methods. The experimental results show that our approach not only alleviates the distance bias problem but also outperforms the comparative methods in all of the above tasks, especially network reconstruction.
Our main contributions are summarized as follows:
- We analyze the distance bias problem in traditional network embedding methods, which is induced by disregarding the adjacency similarity property.
- We propose a discriminative distance metric learning method to preserve the adjacency similarity property of networks and improve the effectiveness of node representations.
- We evaluate our method on three tasks over four datasets; experimental results show that our approach achieves significant improvements.
2 Proposed Approach
In this section, we present the details of the proposed network embedding method based on a neural network and an attention mechanism. First, we briefly define the problem. Then we discuss the details of the proposed discriminative distance metric learning model, DDNE. Finally, we present some discussion and implementation details of our objective function.
2.1 Preliminaries
Notations. Given a network \(G = (V,E)\), \(V=\{v_1,\dots,v_n\}\) represents the set of nodes and \(E=\{e_{ab}\}_{a,b=1}^{n}\) represents the set of edges. We define the adjacency matrix of G as \(X = [X_{ab}]\), where \(X_{ab} = 1\) if \(v_a\) and \(v_b\) are linked by an edge and \(X_{ab} = 0\) otherwise. Accordingly, for a node pair \(v_a, v_b\), \(X_{ab}\) is also the adjacency relation label of the pair. D is a diagonal matrix in which \(D_{aa}\) is the degree of \(v_a\).
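Under these notations, the adjacency matrix X and degree matrix D of a small network can be built directly; the edge list below is an invented toy example:

```python
import numpy as np

edges = [(0, 1), (0, 2), (1, 2), (2, 3)]   # toy undirected edge list
n = 4                                       # number of nodes

X = np.zeros((n, n), dtype=int)             # adjacency matrix X
for a, b in edges:
    X[a, b] = X[b, a] = 1                   # X_ab = 1 iff v_a, v_b are linked

D = np.diag(X.sum(axis=1))                  # diagonal degree matrix D
```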
Distributional Similarity. In this paper, distributional similarity describes the relationship between a node and its first-order neighborhood. For a node \(v_i\), \(N(v_i)\) denotes the set of nodes directly connected to \(v_i\). The distributional similarity of \(v_i\) is generated by its neighbors \(N(v_i)\), meaning that nodes with similar neighborhoods are closer in the network embedding space.
Adjacency Similarity. In the network embedding space, the learned embedding vectors of two nodes are expected to be closer if the nodes are adjacent. Accordingly, for each node pair with \(X_{ij} = 1\), the adjacency similarity is larger than that of pairs without an adjacency relation.
Network Embedding. Given a network \(G=(V,E)\), network embedding aims to learn a mapping function \(f:v_i\rightarrow u_i\in R^d\), where \(d\ll \vert V\vert \). The objective of our method is to make adjacent node pairs closer in the embedding space than node pairs without an adjacency relation, while node pairs with similar neighborhoods (distributional similarity) also remain close in this space.
2.2 DDNE
In the attention phase, we calculate the weight \(\alpha _i^t\) of neighbor \(v_i^t\) by Eq. (3), which ensures that the weight is larger when the degree of \(v_i^t\) is comparable with that of \(v_i\). The node embedding vector \(u_i\) is then computed by Eq. (4). The advantage of our attention model is that it dynamically learns the weight of each neighbor according to its degree relative to the ego-node's (similar, larger or smaller).
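Since Eqs. (3) and (4) are not reproduced in this excerpt, the following numpy sketch is only an illustrative stand-in for them: it gives a neighbor a larger softmax weight when its degree is close to the ego-node's, then aggregates neighbor embeddings by those weights:

```python
import numpy as np

def attention_weights(ego_degree, neighbor_degrees):
    """Illustrative stand-in for Eq. (3): neighbors whose degree is
    comparable to the ego-node's receive larger weights."""
    gap = -np.abs(np.asarray(neighbor_degrees, float) - ego_degree)
    e = np.exp(gap - gap.max())        # softmax over negative degree gaps
    return e / e.sum()

def aggregate(neighbor_embs, weights):
    """Stand-in for Eq. (4): weighted sum of neighbor embeddings."""
    return (weights[:, None] * neighbor_embs).sum(axis=0)

# Ego-node of degree 5 with neighbors of degree 5, 20 and 2:
w = attention_weights(5, [5, 20, 2])
u = aggregate(np.eye(3), w)            # toy 3-d neighbor embeddings
```

The neighbor with degree 5 (equal to the ego-node's) receives the largest weight, matching the intuition in the text.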
Distance Metric-Learner. Embedding vectors generated by distributional-similarity-based methods may suffer from the distance bias problem: the distance between non-adjacent nodes can be smaller than that between adjacent nodes, which does not conform with reality. To eliminate this problem, we measure adjacency similarity with a distance metric learning method that pulls adjacent node pairs closer together to highlight adjacency similarity. For this purpose, we propose a distance constraint that enforces a distance margin between node pairs with an adjacency relation (positive node pairs) and node pairs without one (negative node pairs). In this way, adjacency similarity is measured and the distance bias in the embedding space is eliminated, as shown in Fig. 3.
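The distance constraint described in the abstract (adjacent pairs inside a threshold, non-adjacent pairs outside it) can be sketched as a hinge-style penalty; the exact objective in the paper may differ, so this is an assumption-laden illustration:

```python
import numpy as np

def distance_constraint_loss(u_a, u_b, adjacent, threshold=1.0):
    """Hinge penalty around one threshold: adjacent pairs are pulled
    inside it, non-adjacent pairs are pushed outside it."""
    d = np.linalg.norm(u_a - u_b)
    if adjacent:                        # positive pair: want d < threshold
        return max(0.0, d - threshold)
    return max(0.0, threshold - d)      # negative pair: want d > threshold

# A close positive pair incurs no loss; a close negative pair does.
pos_loss = distance_constraint_loss(np.zeros(2), np.array([0.5, 0.0]), True)
neg_loss = distance_constraint_loss(np.zeros(2), np.array([0.5, 0.0]), False)
```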
3 Experiment
In this section, we first introduce the datasets and baseline methods used in this work. We then evaluate our proposed method on three network analysis tasks: network reconstruction, attribute prediction and recommendation. Finally, we analyze the quantitative experimental results and investigate parameter sensitivity.
3.1 Datasets and Baseline Methods
Google+^{1} is a social network in which nodes represent users; each user has gender, university title, job title, last name and workplace as attributes.
Sina^{2} is a social network in which users have attributes such as following count, self-introduction, constellation, age and location.
DBLP^{3} is a citation network in which nodes refer to papers and edges represent the citation relationship among papers. Each paper has attributes like title, authors, publication venue and abstract.
Movieslens^{4} is a recommendation network in which nodes refer to users and movies respectively and edges represent viewing records between users and movies. Each user has age, gender and occupation as attribute information.
Statistics of the datasets

Data | Nodes | Edges | Categories
---|---|---|---
Google+ | 3,126 | 22,829 | 7
Sina | 29,418 | 800,174 | 8
DBLP | 244,021 | 4,354,534 | 9
Movieslens | 943 | 100,000 | 4
SDNE [13] is a state-of-the-art topology-only network embedding method, which introduces an auto-encoder to learn node embedding vectors and considers first- and second-order proximity information.
LINE [9] is a popular topology-only network embedding method, which also considers first- and second-order proximity information.
DeepWalk [7] is a topology-only network embedding method, which introduces the Skip-gram algorithm to learn the node embedding vector.
GraphGAN [10] is a topology-only network embedding method, which introduces generative adversarial networks to learn the node embedding vector.
DDNE is our proposed method, which uses a neural network (NN or LSTM) to model distributional similarity and distance metric learning to model adjacency similarity; its variants are \(\mathbf {DDNE}_\mathbf {NN}\) and \(\mathbf {DDNE}_\mathbf {LSTM}\).
Sigmoid: In this method, nodes are represented by their local proximity through a neural network (NN or LSTM), and the network structure is preserved through a sigmoid loss function; its variants are \(\mathbf {S}_\mathbf {NN}\) and \(\mathbf {S}_\mathbf {LSTM}\).
3.2 Distance Bias Analysis
Compared with the baselines, DDNE consistently guarantees that the distance between positive node pairs is smaller than that between negative node pairs across datasets. For example, with the Sigmoid method on the Sina dataset, the distance between positive node pairs is 2.0 while that between negative pairs is 1.5; this distance bias is clearly contrary to intuition. Similarly, LINE and SDNE on DBLP and DeepWalk on Sina also exhibit distance bias.
In contrast, the distances between positive node pairs learned by DDNE are the smallest on all datasets, which means the embedding vectors obtained by DDNE better reflect the network structure.
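This distance bias check can be reproduced by comparing the mean embedding distance of positive (adjacent) pairs against sampled negative (non-adjacent) pairs; the embeddings and pairs below are toy values, not learned vectors:

```python
import numpy as np

def mean_pair_distance(emb, pairs):
    """Average Euclidean distance over a list of node pairs."""
    return float(np.mean([np.linalg.norm(emb[a] - emb[b]) for a, b in pairs]))

emb = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 3.0], [4.0, 0.0]])
pos = [(0, 1)]                  # adjacent pairs
neg = [(0, 2), (0, 3)]          # sampled non-adjacent pairs

# Distance bias occurs when positive pairs are, on average,
# FARTHER apart than negative pairs.
biased = mean_pair_distance(emb, pos) > mean_pair_distance(emb, neg)
```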
3.3 Network Reconstruction
From Fig. 5, we can see that DDNE achieves the best performance when \(\beta = 0.6\), improving accuracy by up to \(6\%\) over the best baseline, SDNE. In addition, our method is less sensitive to the pre-defined threshold \(\beta \), which indicates that DDNE preserves the network structure better than other methods, because there is a clear distance margin between positive and negative node pairs in the embedding space generated by DDNE.
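Network reconstruction with a distance threshold \(\beta \) can be sketched as follows: predict an edge for every pair of nodes closer than \(\beta \) in the embedding space and score the prediction by precision against the true edge set (the data below is a toy example, not one of the paper's datasets):

```python
import numpy as np

def reconstruct_edges(emb, beta):
    """Predict an edge for every pair whose embedding distance is below beta."""
    n = len(emb)
    return {(a, b) for a in range(n) for b in range(a + 1, n)
            if np.linalg.norm(emb[a] - emb[b]) < beta}

def precision(predicted, true_edges):
    """Fraction of predicted edges that are real edges."""
    return len(predicted & true_edges) / len(predicted) if predicted else 0.0

emb = np.array([[0.0, 0.0], [0.3, 0.0], [2.0, 2.0]])  # toy embeddings
pred = reconstruct_edges(emb, beta=0.6)
prec = precision(pred, {(0, 1)})
```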
Precision of occupation prediction (%)

Dataset | \(\alpha \) | SDNE | DeepWalk | LINE | GraphGAN | \(DDNE_{N}\) | \(DDNE_{L}\) | \(S_{N}\) | \(S_{L}\)
---|---|---|---|---|---|---|---|---|---
Google+ | 30% | 52.3 | 46.8 | 40.9 | 52.2 | 54.6 | 54.3 | 51.4 | 51.3
Google+ | 40% | 51.9 | 46.2 | 50.1 | 55.4 | 56.3 | 55.9 | 53.1 | 53.2
Sina | 30% | 62.9 | 61.9 | 64.8 | 65.4 | 65.4 | 63.9 | 61.8 | 61.5
Sina | 40% | 65.3 | 63.1 | 66.2 | 67.1 | 68.3 | 67.5 | 65.2 | 66.0
Movieslens | 30% | 56.4 | 53.9 | 55.6 | 57.3 | 59.2 | 58.7 | 56.1 | 56.2
Movieslens | 40% | 58.5 | 55.2 | 57.8 | 59.9 | 61.3 | 60.5 | 58.2 | 58.9
3.4 Attribute Prediction
We utilize the vectors generated by the various network embedding methods to perform the profile prediction task. Users often conceal or omit their attribute information because personal attributes involve privacy, so a user's essential information cannot always be obtained directly. Attribute prediction addresses this problem, and we treat it as a classification task. In our experiments, the embedding vector of each node (user) is treated as its feature vector, and a linear support vector machine returns the most likely category (value) of the missing attribute. For each dataset, we predict occupation. The training set consists of an \(\alpha \) portion of nodes picked at random from the network, and the remaining users form the test set.
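This evaluation protocol can be sketched with scikit-learn's LinearSVC (assuming scikit-learn is available); the two-cluster "embeddings" below are synthetic stand-ins for real learned vectors:

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Synthetic stand-in: 2-d "embeddings" with two occupation classes.
emb = np.vstack([rng.normal(0, 0.3, (50, 2)),
                 rng.normal(3, 0.3, (50, 2))])
labels = np.array([0] * 50 + [1] * 50)

alpha = 0.3                              # alpha-portion used for training
idx = rng.permutation(len(emb))
cut = int(alpha * len(emb))
train, test = idx[:cut], idx[cut:]

# Linear SVM on embedding vectors, as in the evaluation protocol.
clf = LinearSVC().fit(emb[train], labels[train])
acc = float((clf.predict(emb[test]) == labels[test]).mean())
```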
Occupation Prediction. We conduct an experiment on occupation prediction; the results are shown in Table 2.
One can see that DDNE again outperforms the other embedding methods. Compared to the best baseline, GraphGAN, our method improves accuracy by up to \(2.4\%\). Besides, DDNE performs better than Sigmoid, which demonstrates that our distance metric learning objective function helps preserve the network structure.
3.5 Recommendation
AUC of recommendation

Dataset | SDNE | DeepWalk | LINE | GraphGAN | \(DDNE_{N}\) | \(DDNE_{L}\) | \(S_{N}\) | \(S_{L}\)
---|---|---|---|---|---|---|---|---
Movieslens | 65.9 | 62.3 | 63.6 | 64.8 | 79.8 | 78.9 | 71.2 | 71.9
DBLP | 78.7 | 75.2 | 76.1 | 76.8 | 88.9 | 88.2 | 81.0 | 80.6
From Table 3, we can see that DDNE performs best in both movie and paper recommendation. Compared to SDNE, DDNE improves the AUC by \(13.9\%\) on Movieslens and \(10.2\%\) on DBLP, demonstrating the effectiveness of DDNE in learning good node embedding vectors for recommendation.
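For reference, the AUC used here can be computed directly as the probability that a positive pair is scored above a negative pair; the scores below are invented toy values:

```python
import numpy as np

def auc(pos_scores, neg_scores):
    """Probability that a random positive pair scores higher than a
    random negative pair; ties count half."""
    pos = np.asarray(pos_scores, float)[:, None]
    neg = np.asarray(neg_scores, float)[None, :]
    return float((pos > neg).mean() + 0.5 * (pos == neg).mean())

score = auc([0.9, 0.8, 0.7], [0.6, 0.8, 0.1])
```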
4 Related Work
Network embedding aims to learn a distributed representation vector for each node in a network. Most existing works fall into three categories: matrix factorization based, random walk based and deep learning based methods. Matrix factorization based methods express the input network as an affinity matrix and project it into a low-dimensional space using factorization techniques, including singular value decomposition, which seeks a low-dimensional projection of the input matrix, and spectral (eigen-)decomposition [3], which uses the graph Laplacian to compute low-dimensional representations of the input data. However, matrix factorization based methods rely on decomposing the affinity matrix, which is time-consuming on large real-world networks.
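A minimal spectral-decomposition embedding of the kind described above, using the unnormalized graph Laplacian and numpy's symmetric eigensolver (a sketch of the general technique, not any cited method in particular):

```python
import numpy as np

def spectral_embedding(X, d):
    """Embed nodes with the d eigenvectors of the graph Laplacian
    that have the smallest non-zero eigenvalues."""
    D = np.diag(X.sum(axis=1))
    L = D - X                           # unnormalized graph Laplacian
    vals, vecs = np.linalg.eigh(L)      # eigenvalues in ascending order
    return vecs[:, 1:d + 1]             # skip the constant eigenvector

X = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
emb = spectral_embedding(X, 2)
```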
Random walk based methods sample node sequences from the graph and treat them like sentences. As the first attempt, DeepWalk [7] introduces the word2vec (skip-gram) algorithm to learn node embeddings. Another well-known work is Node2vec [2], a variant of DeepWalk. The main difference between the two is that Node2vec replaces the uniform random walk with a biased random walk, allowing it to select the next node in a more flexible way.
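A uniform random walk of the DeepWalk flavor takes only a few lines of Python; node2vec would instead bias the choice of the next node (the graph below is a toy example):

```python
import random

def random_walk(adj, start, length, seed=0):
    """Uniform random walk as used by DeepWalk: at each step, move to
    a uniformly chosen neighbor of the current node."""
    rng = random.Random(seed)
    walk = [start]
    for _ in range(length - 1):
        neighbors = adj[walk[-1]]
        if not neighbors:               # dead end: stop the walk early
            break
        walk.append(rng.choice(neighbors))
    return walk

adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}  # toy adjacency lists
walk = random_walk(adj, 0, 5)
```

The resulting walks are then fed to skip-gram as if they were sentences.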
The last category is deep learning based methods. Tang et al. propose LINE [9], which optimizes a carefully designed objective function that maximizes edge reconstruction probability. SDNE [13] is a deep network embedding method based on an auto-encoder, which captures highly non-linear network structure and exploits first- and second-order proximity to preserve the network structure. GraphGAN [10] is a framework that unifies generative and discriminative thinking for network embedding. DKN [12] learns knowledge graph embeddings by TransX, uses a CNN framework to combine word embeddings and entity embeddings, and presents an attention-based CTR prediction model. SHINE [11] performs network embedding on signed heterogeneous information networks and is also based on auto-encoders.
5 Conclusion
In this paper, we introduce a discriminative distance metric learning method to solve the distance bias problem. By incorporating the adjacency similarity property, our model preserves the network structure more effectively. Experiments on three network analysis tasks verify the effectiveness of our approach. In future work, we will study the node encoder more deeply. On one hand, we will compare with other deep neural network models, such as CNNs or deep RNNs. On the other hand, we will try to integrate distributional similarity and adjacency similarity simultaneously in the node encoding phase.
Acknowledgement
This work was sponsored by the National Key R&D Program of China (No. 2018YFB1004704) and the National Natural Science Foundation of China (U1736106).
References
- 1. Farnadi, G., Tang, J., Cock, M.D., Moens, M.: User profiling through deep multimodal fusion. In: Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, WSDM 2018, Marina Del Rey, CA, USA, 5–9 February 2018, pp. 171–179 (2018)
- 2. Grover, A., Leskovec, J.: node2vec: scalable feature learning for networks. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016, pp. 855–864 (2016)
- 3. Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. CoRR abs/1609.02907 (2016)
- 4. Feng, R., Yang, Y., Hu, W., Wu, F., Zhang, Y.: Representation learning for scale-free networks. In: Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, Louisiana, USA, 2–7 February 2018 (2018)
- 5. Levy, O., Goldberg, Y.: Neural word embedding as implicit matrix factorization. In: Advances in Neural Information Processing Systems 27, Montreal, Quebec, Canada, 8–13 December 2014, pp. 2177–2185 (2014)
- 6. Li, C., et al.: PPNE: property preserving network embedding. In: Candan, S., Chen, L., Pedersen, T.B., Chang, L., Hua, W. (eds.) DASFAA 2017. LNCS, vol. 10177, pp. 163–179. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-55753-3_11
- 7. Perozzi, B., Al-Rfou, R., Skiena, S.: DeepWalk: online learning of social representations. In: The 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2014, New York, NY, USA, 24–27 August 2014, pp. 701–710 (2014)
- 8. Ribeiro, L.F.R., Saverese, P.H.P., Figueiredo, D.R.: struc2vec: learning node representations from structural identity. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, 13–17 August 2017, pp. 385–394 (2017)
- 9. Tang, J., Qu, M., Wang, M., Zhang, M., Yan, J., Mei, Q.: LINE: large-scale information network embedding. In: Proceedings of the 24th International Conference on World Wide Web, WWW 2015, Florence, Italy, 18–22 May 2015, pp. 1067–1077 (2015)
- 10. Wang, H., et al.: GraphGAN: graph representation learning with generative adversarial nets. In: Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, Louisiana, USA, 2–7 February 2018 (2018)
- 11. Wang, H., Zhang, F., Hou, M., Xie, X., Guo, M., Liu, Q.: SHINE: signed heterogeneous information network embedding for sentiment link prediction. In: Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, WSDM 2018, Marina Del Rey, CA, USA, 5–9 February 2018, pp. 592–600 (2018)
- 12. Wang, H., Zhang, F., Xie, X., Guo, M.: DKN: deep knowledge-aware network for news recommendation. In: Proceedings of the 2018 World Wide Web Conference on World Wide Web, WWW 2018, Lyon, France, 23–27 April 2018, pp. 1835–1844 (2018)
- 13. Wang, D., Cui, P., Zhu, W.: Structural deep network embedding. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016, pp. 1225–1234 (2016)