
1 Introduction

Question and Answer (Q&A) websites are gaining momentum as an effective platform for knowledge sharing. These websites usually have numerous users who contribute continuously. Many researchers have shown interest in recommendation issues on these websites, such as identifying experts. Despite the tremendous research effort on user recommendation, no state-of-the-art algorithm consistently stands out from the others. As recent work increasingly focuses on domain-specific expertise recommendation, research has emerged on multi-domain (or cross-domain) recommendation in the "Stack Exchange (SE) Networks" repository. SE is a network of 98 Q&A subsites, all following the same structure. This consistency enables us to expand our approach from one subsite to all the other subsites on SE. These subsites cover various disciplines, from computer science to even the Ukrainian language. Take "Stack Overflow" (SO) as an example (Fig. 4). It is a software-domain-oriented website where users can post and answer questions, or vote up/down on other users' questions and answers. The author of a question (a.k.a. the requester) can mark an answer as accepted and offer a bounty to the answerer.

So far, there are two popular ways to locate experts: collaborative filtering (CF) and content-based recommendation. The former extracts similar people without understanding the content, while the latter builds user profiles from users' activity history. CF relies merely on ratings (e.g., scores in SE networks) and therefore may not handle sparse Q&A subsite data well, where many questions involve very few users. Usually, users can vote on questions, and the vote counts can serve as ratings for the questions. An earlier work [1] also suggests that the lack of information can be a challenge for recommendation techniques. That work addresses the data sparsity issue by selectively using the ratings of some experts; however, the experts presumed by this approach are exactly the experts we aim to find. As for content-based approaches, a typical one (e.g., [17]) builds user profiles based on users' knowledge scores and user authority in link analysis. The knowledge scores are called reputation in [17] and are derived from users' historical question-answering records. Srba et al. [22] point out that some users may maliciously post low-quality content, and such highly active spammers might be taken as experts by a system. Huna et al. [11] address this problem by calculating question and answer difficulty based on several hints: the numbers of user-owned questions and answers, the time difference between a question being posted and answered, the average answering time, and the maximum score among all the answers provided by the answerer. Although these approaches can compute user reputation, they also incur considerable cost in building user profiles. Matrix factorization is one method that works on sparse data, but matrices can only store two dimensions of data, which is restrictive in many applications where users' attributes can be vital to identifying experts.
Recently, tensor-based approaches have become popular as an alternative to matrix factorization, making it feasible to handle multi-faceted data [27]. For example, Ge et al. [7] decompose a (Users, Topics, Experts) tensor for personalized expert recommendation; Bhargava et al. [3] propose a (User, Location, Activity, Time) tensor decomposition along with correlated matrices to make recommendations based on user preferences.

Fig. 1.
figure 1

Work-flow of our proposed methodology: for a given input query, experts are output based on the detected topic of the query combined with our 4th-order tensor, which contains latent information on topics, questions, voting, and experts.

We aim to recommend experts in multiple areas simultaneously (Fig. 1). In particular, we use the Stack Exchange networks dump, which covers various areas, to build a multi-domain dataset. We apply group lasso [15] over a relationship tree formed from the natural structure of the SE network. The tree is used to guide the decomposition of 4th-order tensor data consisting of questions, topics, voting, and expertise information. We additionally factorize selected matrices to provide additional latent information.

Our contributions in this work are as follows:

  1.

    We take the hierarchical relationship between participants and topics into account and build a model that combines tree-guided tensor decomposition and matrix factorization;

  2.

    We introduce the relationship tree group lasso to alleviate the data sparsity problem;

  3.

    We conduct experiments on real-world data and evaluate the proposed approach against state-of-the-art baselines.

2 Related Works

Expert recommendation has been studied extensively in the past decade. Generally, the skillfulness and resourcefulness of experts can help users make decisions more professionally and solve problems more effectively and efficiently. That is, making appropriate recommendations to users with different requirements can be important.

Expert recommendation techniques apply to many areas, and different fields may require different methodologies to handle different situations. Balog et al. [2] introduce a generative probabilistic framework for finding experts in various enterprise data sources. Daud et al. [4] devise a Temporal-Expert-Topic model to capture both semantic and dynamic expert information and to identify experts for different time periods. Fazel-Zarandi et al. [6] develop an expert recommendation system utilizing social network analysis and multiple data source integration techniques. Wang et al. [23] propose a model, ExpertRank, which takes both the document profiles and the authority of experts into consideration for better performance. Huang et al. [10] take advantage of word embedding technology to rank experts both semantically and numerically. More related works can be found in a survey by Wang et al. [24].

The works mentioned above mostly focus on recommending experts for organizations, enterprises, or institutes. There is also some literature on recommending experts in Q&A systems, which is more closely related to our work. Kao et al. [13] propose to incorporate user subject relevance, user reputation, and category authority into expert finding for Q&A websites. Riahi et al. [21] investigate two topic models, namely the Segmented Topic Model and the Latent Dirichlet Allocation model, to direct new questions on Stack Overflow to related experts. Ge et al. [7] propose a personalized tensor-based method for expert recommendation that considers factors such as geospatial information, topics, and preferences. Liu et al. [18] propose a method to rank user authority by exploiting interactions between users, aiming to avoid the potential bias toward users with considerable social influence. They introduce topical similarity into link analysis to rank user authority for each question: Latent Dirichlet Allocation is applied to extract topics from both the questions and answers of users so that topical similarities between questions and answers can be measured, and related users are then ranked by links. Huna et al. [11] found that Q&A communities often evaluate user reputation based only on the number of user activities, regardless of the effort spent creating high-quality content, which causes inaccurate measurement of user expertise and value. Inspired by former works, they calculate user reputations for asking and answering questions. The reputation results from combining the difficulty score of a question with the utility score of the question or answer: a utility score measures the distance between a score and the maximum score of the post, and the difficulty measures the time a user spends on the question, normalized per topic. Fang et al.
[5] note the wealth of social information a Q&A website can provide, along with the importance of user-generated textual content. Their idea of simultaneously modeling both social links and textual content leads to a framework named "HSNL" (CQA via Heterogeneous Social Network Learning). The framework adopts random walks to exploit social information and build the heterogeneous social network, and a deep recurrent neural network is trained to give a text-based matching score for questions and answers.

Our proposed model builds on tensor decomposition, which has been applied to various fields such as neuroscience, computer vision, and data mining [16]. CANDECOMP/PARAFAC (CP) and Tucker decomposition are two effective ways to solve tensor decomposition problems; we adopt the former in this work. Tensor-decomposition-based recommender systems are also widespread in recent studies. Rendle et al. [19] introduce a tensor factorization based ranking approach for tag recommendation; they further improve the model by introducing pairwise interactions, significantly improving optimization efficiency. Xiong et al. [25] propose a probabilistic tensor decomposition model that treats temporal dynamics as the third dimension of the tensor. Karatzoglou et al. [14] offer a context-aware tensor decomposition model that tightly integrates context information with collaborative filtering. Hidasi et al. [9] investigate an approach that combines implicit feedback with context-aware decomposition. Bhargava et al. [3] present a tensor-decomposition-based approach to model the influence of multi-dimensional data sources. Yao et al. [26] decompose a tensor with contextual regularization to recommend location points of interest.

3 Methodology

CANDECOMP/PARAFAC Tensor Decomposition, or CP Decomposition, was discovered by Kiers and Möcks independently [16]. For a rank-R, order-N tensor \(\mathcal {X}\) (\(R\in \mathbb {N}\)), let \(U_1\in \mathbb {R}^{I_1\times {R}}, U_2\in \mathbb {R}^{I_2\times {R}}, \ldots , U_N\in \mathbb {R}^{I_N\times {R}}\); then we have the decomposition:

$$\begin{aligned} \mathcal {X} \approx \sum _{r=1}^R {U_1}_{i_1r}{U_2}_{i_2r}\cdots {U_N}_{i_Nr} \end{aligned}$$
(1)

Among the multiple methods for computing a tensor decomposition, the most common and effective is alternating least squares (ALS) [16].
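To make the procedure concrete, ALS for a dense order-N tensor can be sketched in a few lines of NumPy. This is an illustrative minimal implementation, not the code used in our experiments, and all function names here are our own:

```python
import numpy as np

def unfold(X, mode):
    """Mode-n unfolding: move axis `mode` to the front, flatten the rest."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def khatri_rao(mats):
    """Column-wise Khatri-Rao product of a list of factor matrices."""
    R = mats[0].shape[1]
    out = mats[0]
    for M in mats[1:]:
        out = np.einsum('ir,jr->ijr', out, M).reshape(-1, R)
    return out

def cp_als(X, R, n_iter=500, seed=0):
    """Rank-R CP decomposition of tensor X by alternating least squares."""
    rng = np.random.default_rng(seed)
    U = [rng.standard_normal((dim, R)) for dim in X.shape]
    for _ in range(n_iter):
        for n in range(X.ndim):
            others = [U[m] for m in range(X.ndim) if m != n]
            # Hadamard product of the Gram matrices of the fixed factors
            V = np.ones((R, R))
            for M in others:
                V *= M.T @ M
            # Least-squares update for factor n
            U[n] = unfold(X, n) @ khatri_rao(others) @ np.linalg.pinv(V)
    return U

def cp_reconstruct(U):
    """Rebuild the full tensor from its CP factor matrices."""
    R = U[0].shape[1]
    X = 0
    for r in range(R):
        comp = U[0][:, r]
        for M in U[1:]:
            comp = np.multiply.outer(comp, M[:, r])
        X = X + comp
    return X
```

Each pass solves the least-squares problem for one factor matrix while holding the others fixed, using the Khatri-Rao product of the remaining factors.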

3.1 Relationship Tree Modelling

Our data is naturally divided into subsites, topics, and posts, as shown in Fig. 3. This decomposition forms a tree, with subsites on top and posts as leaves. As our tensor models the expertise information based on user activities, this tree preserves the relationships between entities. We illustrate the construction of the tree as follows (Fig. 2).

Fig. 2.
figure 2

An example of modeled tree representation of hierarchical relationship
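The hierarchy described above can be encoded as a simple recursive structure, where each node's group of leaf posts is the set used for the group-lasso penalty later on. The node and field names below are illustrative only:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    """A node in the relationship tree: subsite, topic, or post (leaf)."""
    name: str
    children: List['Node'] = field(default_factory=list)

    def leaves(self):
        """All leaf posts under this node (the group G_v of the node)."""
        if not self.children:
            return [self]
        return [leaf for c in self.children for leaf in c.leaves()]

# Illustrative toy tree: one subsite with two topics and three posts
root = Node('subsite', [
    Node('topic1', [Node('post1'), Node('post2')]),
    Node('topic2', [Node('post3')]),
])
```

Calling `leaves()` on any internal node yields exactly the posts its group covers.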

Given the tree \(\mathcal {T}\), we assume that the i-th level of \(\mathcal {T}\) has \(n_i\) nodes, organized as \(\mathcal {T}_i = \{ G_1^i, G_2^i,..., G_{n_i}^i\}\). Each node \(v \in V\) of the tree induces a group \(G_v\) containing all leaves under v. Now we can define a tree-structured regularization as

$$\begin{aligned} Weight(\mathbf {U}_1)=\frac{\lambda _W}{2}\sum _{k=1}^{J}\omega _j^i\Vert \mathbf {U}_{1_k}\Vert _2^2 \quad \text {(} \mathbf {U}_{1_k} \in G_j^i \text {)} \end{aligned}$$
(2)

This is inspired by Moreau-Yosida regularization; here \(\lambda _W\) is the Moreau-Yosida regularization parameter for tree \(\mathcal {T}\), \(\Vert \cdot \Vert \) denotes the Euclidean norm, and \(\mathbf {U}_{1_k}\) is a vector of \(\mathbf {U}_1\), the first factor matrix of the tensor \(\mathcal {X}\), which corresponds to the question posts; a detailed explanation can be found in the following subsection. Additionally, \(\omega _j^i\) is set following Kim's approach [15] and denotes a pre-set weight for the j-th node at level i. \(\omega _j^i\) can be obtained by setting two variables that sum to 1: \(s_j^i\), the weight for selecting independent relevant covariates, and \(g_j^i\), the weight for selecting group-relevant covariates. We have:

$$\begin{aligned} \sum _{i}^{d}\sum _{j}^{n}\omega _j^i\Vert \mathbf {U}_{1_{G_j^i}}\Vert _2 = \lambda \omega _0^j \end{aligned}$$
(3)

where

$$\begin{aligned} \omega _j^i = {\left\{ \begin{array}{ll} s_j^i \cdot \sum \nolimits _{c_p^q \in \text {Child}(v_j^i)}|\omega _p^q| + g_j^i \cdot \Vert \mathbf {U}_{1_{G_j^i}}\Vert _2 &{}v_j^i~\text {is an internal node},\\ |\mathbf {U}_{1_{G_j^i}}| &{} v_j^i~\text {is a leaf node}. \end{array}\right. } \end{aligned}$$
(4)
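To illustrate how the tree-structured penalty of Eq. (2) is evaluated, the following sketch walks a flat list of node groups (each group listing the rows of \(\mathbf {U}_1\) under one tree node) and accumulates the weighted squared group norms. The encoding of groups and weights as parallel lists is our own illustrative assumption:

```python
import numpy as np

def tree_group_penalty(U1, groups, weights, lam_w):
    """Tree-guided group-lasso penalty on the rows of U1 (cf. Eq. 2).

    groups  : list of row-index lists, one per tree node (its leaf posts)
    weights : pre-set per-node weights (the omega_j^i of Eq. 2)
    lam_w   : the Moreau-Yosida regularization parameter lambda_W
    """
    total = 0.0
    for idx, w in zip(groups, weights):
        total += w * np.sum(U1[idx] ** 2)   # squared l2 norm of the group
    return 0.5 * lam_w * total
```

For example, a root covering rows {0, 1, 2} with two child topics {0, 1} and {2} contributes three weighted terms, so overlapping groups along a root-to-leaf path are each counted once.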
Fig. 3.
figure 3

Tree representation of hierarchical entity relationship

Fig. 4.
figure 4

An example of a Stack Overflow post (postId: 34672987), demonstrating a question with its description and comments, along with the score of the question.

3.2 Proposed Model

Our dataset comes naturally categorized by subdomain, which we call "subsites" here. Additionally, within each subsite, every post carries tags, and such information is often an indicator of the post's topics. Accordingly, after gathering those data, we can build a tree to represent this hierarchical information (shown in Fig. 3).

All Stack Exchange subsites share the same structure. That means, in all these subsites, answerers may propose multiple answers while questioners can accept only one answer for each question. Also, both questions and answers can be commented on and voted on, and the difference between vote-ups and vote-downs on each question is calculated as a score. Figure 4 shows an example.

Fig. 5.
figure 5

Proposed decomposition

Instead of a simple score-user matrix based recommendation, we propose a tensor-decomposition-based tree-guided method, built on the basic idea of Tree-Guided Sparse Learning [12].

  1.

    A 4th-order tensor, Question \(\times \) Topic \(\times \) Voting \(\times \) Expert. As shown in Fig. 5, we denote it as \(\mathcal {X} \in \mathbb {R}^{I\times {J}\times {K}\times {L}}\), where I is the number of questions, J is the number of topics, K is the number of votes on questions, and L is the number of expert users; the tensor values are the expertise evaluation scores. Since only a limited number of users participate in each domain, the tensor is very sparse. Additionally, we denote \(\mathbf {U_1} \in \mathbb {R}^{I\times {R}}, \mathbf {U_2} \in \mathbb {R}^{J\times {R}}, \mathbf {U_3} \in \mathbb {R}^{K\times {R}}, \mathbf {U_4} \in \mathbb {R}^{L\times {R}}\) as the factor matrices of tensor \(\mathcal {X}\).

  2.

    A subsite \(\times \) answerer matrix, denoted as \(M \in \mathbb {R}^{X\times {Z}}\), where \(M_{x,z} = 1\) if answerer z appears in subsite x and \(M_{x,z} = 0\) otherwise.

  3.

    A topic \(\times \) answerer matrix, denoted as \(N \in \mathbb {R}^{Y\times {Z}}\); similarly, \(N_{y,z} = 1\) if answerer z appears in topic y and \(N_{y,z} = 0\) otherwise.

  4.

    A hierarchical relationship tree \(\mathcal {T}\) of depth d. Due to the isolation of subsites and their topics, our data show a clearly structured sparsity. Thus, we can utilize tree-guided group lasso in our model. That is, besides the above two supplementary matrices, we also use the tree shown in Fig. 3 to guide the learning.
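The assembly of these inputs from raw activity records can be sketched as follows; the record schema (field names and the presence of a per-record score) is our own illustrative assumption, with the sparse tensor stored as a coordinate dictionary:

```python
import numpy as np
from collections import defaultdict

def build_model_inputs(records, n_subsites, n_topics, n_answerers):
    """Assemble the sparse 4th-order tensor and the two indicator matrices.

    `records` is an iterable of dicts with hypothetical keys:
    question, topic, vote, expert, subsite, score.
    """
    # Sparse (question, topic, vote, expert) -> expertise evaluation score
    tensor = defaultdict(float)
    M = np.zeros((n_subsites, n_answerers))   # subsite x answerer indicator
    N = np.zeros((n_topics, n_answerers))     # topic x answerer indicator
    for r in records:
        key = (r['question'], r['topic'], r['vote'], r['expert'])
        tensor[key] += r['score']
        M[r['subsite'], r['expert']] = 1.0
        N[r['topic'], r['expert']] = 1.0
    return tensor, M, N
```

Storing only observed entries keeps the memory footprint proportional to the number of activity records rather than the full tensor size.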

Table 1. Symbol table

After modeling the data, we apply CANDECOMP/PARAFAC (CP) tensor decomposition to factorize the tensor and solve the tree-structured regression with group lasso (Table 1).

figure a

First, we decompose the 4th-order tensor with regularization by Alternating Least Squares (ALS) as follows:

$$\begin{aligned} \begin{aligned} Tensor(\mathbf {U_1}, \mathbf {U_2}, \mathbf {U_3}, \mathbf {U_4})&=\frac{1}{2} \Vert \mathcal {X}-{\llbracket \mathbf {U_1}, \mathbf {U_2}, \mathbf {U_3}, \mathbf {U_4} \rrbracket }\Vert _F^2 \\&+ \frac{\lambda _{\mathcal {X}}}{2} (\Vert \mathbf {U_1}\Vert _F^2 + \Vert \mathbf {U_2}\Vert _F^2 + \Vert \mathbf {U_3}\Vert _F^2 + \Vert \mathbf {U_4}\Vert _F^2) \end{aligned} \end{aligned}$$
(5)

Then, we decompose the two aforementioned matrices as:

$$\begin{aligned} Networks(\mathbf {S}, \mathbf {A})=\frac{1}{2}\Vert \mathbf {M}_{site} - \mathbf {SA}^T\Vert _F^2 + \frac{\lambda _S}{2}(\Vert \mathbf {S}\Vert _F^2 + \Vert \mathbf {A}\Vert _F^2) \end{aligned}$$
(6)
$$\begin{aligned} Topic(\mathbf {T}, \mathbf {A})=\frac{1}{2}\Vert \mathbf {M}_{topic} - \mathbf {TA}^T\Vert _F^2 + \frac{\lambda _T}{2}(\Vert \mathbf {T}\Vert _F^2 + \Vert \mathbf {A}\Vert _F^2) \end{aligned}$$
(7)

Since each subsite \(S_j\) contains a group of questions \(\mathbf {U}_{1_{j}}\), we expect \(S_j\) to be similar to the average of \(\mathbf {U}_{1_{j}}\), which can be enforced as a regularization:

$$\begin{aligned} Site(\mathbf {S}, \mathbf {U_1})=\frac{\lambda _S}{2}\sum _{j=1}^{U}\Vert \mathbf {S}_j-\frac{1}{|G_j^1|}\sum _{\mathbf {U}_{1_k} \in G_j^1}\mathbf {U}_{1_k}\Vert _2^2 \end{aligned}$$
(8)
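A sketch of this site regularizer, assuming each subsite's question rows of \(\mathbf {U}_1\) are supplied as an index list (an encoding of our own choosing):

```python
import numpy as np

def site_regularizer(S, U1, groups, lam_s):
    """Penalty pulling each subsite factor S_j toward the mean of its
    question factors (cf. Eq. 8). `groups[j]` lists the rows of U1 whose
    questions belong to subsite j."""
    total = 0.0
    for j, idx in enumerate(groups):
        mean_q = U1[idx].mean(axis=0)          # average question factor
        total += np.sum((S[j] - mean_q) ** 2)  # squared distance to S_j
    return 0.5 * lam_s * total
```

The penalty vanishes exactly when every subsite factor equals the mean of its group, which is the behavior Eq. (8) rewards.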

By combining those objectives and regulations, we have the following objective function:

$$\begin{aligned} \begin{aligned} f(\mathbf {U_1}, \mathbf {U_2}, \mathbf {U_3}, \mathbf {U_4}, \mathbf {S}, \mathbf {A}, \mathbf {T})&= Tensor(\mathbf {U_1}, \mathbf {U_2}, \mathbf {U_3}, \mathbf {U_4})\\&+Weight(\mathbf {U}_1) + Networks(\mathbf {S}, \mathbf {A})\\&+ Topic(\mathbf {T}, \mathbf {A}) + Site(\mathbf {S}, \mathbf {U_1}) \end{aligned} \end{aligned}$$
(9)

Equation 5 follows the CANDECOMP/PARAFAC decomposition, accomplished by the ALS algorithm (see Algorithm 1), a popular way to decompose a tensor.

Computational Complexity Analysis. The time complexity of the above decomposition consists of two parts. The first concerns initializing the set of \(\mathbf {A}^{(n)}\)s. We denote the average dimension of our tensor as D, so the size of the tensor can be represented as \(D^{N}\). The initialization traverses the \(\mathbf {A}^{(n)}\)s and has a time complexity of \(\mathcal {O}(NDR)\). Assuming that we use an index flip to implement the matrix transpose, its time complexity is \(\mathcal {O}(1)\). Thus, the N loops take \(\mathcal {O}((NDR)^2 + N^2DR)\) time in total. Combining the two steps, the time complexity of the algorithm is \(\mathcal {O}((NDR)^2)\).

4 Experiments and Evaluation

In this section, we report our experiments to evaluate our proposed approach. We first briefly introduce our dataset and the evaluation metrics, and then present the results analysis and evaluation.

Table 2. Adopted reputation rules

To the best of our knowledge, there is so far no "gold standard" for evaluating our approach to expert recommendation. It is also difficult to judge users' expertise manually, due to the large scale of the data (e.g., our test data contains more than 2 million users and nearly 20 million voting activities on 5 million posts) and the lack of ranking information in the dataset: the reputation scores of users in Stack Exchange systems are computed globally, and thus cannot be used to evaluate an individual's ability in specific domains or topics.

Similar to Huna et al. [11], we calculate the reputation score of each user by topic, according to the rules adopted by Stack Exchange. We simplify the rules by removing bounty-related and edit-related reputation differences; Table 2 summarizes the result. A rank can then be established based on the built-in reputation scores of users, following the approach proposed by Huna et al. [11]. This rank serves as a baseline for comparative performance evaluation. Given the lack of a standard to measure the verifiable expertise of users, we adopt this idea and conduct comparison experiments.
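A sketch of the per-topic reputation aggregation follows. The specific point values are an assumption modeled on the standard Stack Exchange defaults (+5 question upvote, +10 answer upvote, +15 accepted answer, -2 received downvote) and stand in for the simplified rules of Table 2:

```python
from collections import defaultdict

# Hypothetical simplified rule values, modeled on Stack Exchange defaults
RULES = {
    'question_upvote': 5,
    'answer_upvote': 10,
    'answer_accepted': 15,
    'downvote_received': -2,
}

def topic_reputation(events):
    """Aggregate reputation per (user, topic) pair from activity events.

    `events` is an iterable of (user, topic, event_type) triples;
    unknown event types contribute nothing.
    """
    rep = defaultdict(int)
    for user, topic, event in events:
        rep[(user, topic)] += RULES.get(event, 0)
    return rep
```

Scoring per (user, topic) pair, rather than globally, is what makes the resulting rank usable as a domain-specific baseline.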

4.1 Dataset and Experiment Settings

Dataset. As mentioned above, the Stack Exchange Networks include 98 subsites and massive data. We identified 14,220,976 users, 46,575,393 posts, 178,575 tags, and 178,184,014 votes. Computing at such a scale can be challenging for any existing system. Thus, in this work, we conducted experiments on several reasonably selected subsets, which contain a feasible yet still decent volume of data.

Table 3. Selected statistics profiles of experiment dataset

Note that our method is a tree-guided tensor decomposition approach, where the tree models the hierarchical entity relationships, including topic information. To keep the variance of the topics, we generate our testing subsets from several independent subsites, namely "apple", "math", "stats", "askubuntu", "physics", "superuser", "gis", "serverfault", and "unix". Selected statistics profiles can be found in Table 3.

Due to the massive scale of our data source and its high degree of sparseness, random sampling could output posts with an enormous number of unrelated users and topics. Hence, we first sample randomly to select a subset of users, and then enumerate posts, tags, and votes. This ensures the selected posts and votes are all related to the sampled users.

4.2 Results Analysis and Evaluation

Evaluation Metrics

  • Precision@k. Precision@k is one of the standard evaluation metrics in information retrieval tasks and recommender systems. It is defined as the proportion of retrieved items in the top-k set that are relevant. Our framework returns a list of users, so Precision@k can be calculated as follows:

    $$\begin{aligned} P@k=\frac{|\{relevant\_top-k\_users\}\cap \{retrieved\_top-k\_users\}|}{|\{retrieved\_top-k\_users\}|} \end{aligned}$$
  • MRR. The Mean Reciprocal Rank is a statistical measure for evaluating an ordered list of responses; here it is the average of the reciprocal ranks over all tested questions:

    $$\begin{aligned} MRR=\frac{1}{|Q|}\sum _{i=1}^{|Q|}\frac{1}{Rank_i} \end{aligned}$$
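Both metrics are straightforward to compute from ranked output lists; a minimal sketch (function names are illustrative):

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended users that are relevant."""
    top_k = recommended[:k]
    return len(set(top_k) & set(relevant)) / len(top_k)

def mean_reciprocal_rank(ranked_lists, targets):
    """Average of 1/rank of the first relevant item in each ranked list."""
    total = 0.0
    for ranked, target in zip(ranked_lists, targets):
        for rank, item in enumerate(ranked, start=1):
            if item in target:
                total += 1.0 / rank
                break
    return total / len(ranked_lists)
```

For example, if the relevant expert sits at position 2 in one query's list and position 1 in another's, the MRR is (1/2 + 1)/2 = 0.75.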

Compared Methods

  • Baselines. Apart from the reputation value calculated by the Stack Exchange rules in Table 2, two other commonly used baselines are ranking users by their "Best Answer Ratio" and ranking them by the "Number of Answers" they produced.

  • MF-BPR [20]. Rendle et al. introduce the pairwise BPR ranking loss into standard matrix factorization models. It is specifically designed for optimizing ranking problems.

  • Zhang et al. [28]. The Z-Score by Zhang et al. is a well-known reputation measure, although their original work is a PageRank-based system not aimed at measurement. This feature-based score is computed from q, the number of questions a user asked, and a, the number of answers the user posted. That is,

    $$\begin{aligned} \text {Z-Score}=\frac{a-q}{\sqrt{a+q}} \end{aligned}$$
  • ConvNCF [8]. Outer Product-based Neural Collaborative Filtering, a multi-layer neural-network-based collaborative filtering method. It uses an outer product to capture the pairwise correlations between the dimensions of the embedding space.
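Of these baselines, the Z-Score can be computed directly from a user's activity counts; a sketch (the guard for users with no activity is our own addition):

```python
import math

def z_score(n_answers, n_questions):
    """Z-Score reputation measure from answer and question counts."""
    if n_answers + n_questions == 0:
        return 0.0  # no activity: treat as neutral (our assumption)
    return (n_answers - n_questions) / math.sqrt(n_answers + n_questions)
```

Users who mostly answer score positively, users who mostly ask score negatively, and the square-root denominator damps the effect of sheer activity volume.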

Fig. 6.
figure 6

Performance comparison of our approach against others, tested with 250 users and their historical data

Fig. 7.
figure 7

Precision and MRR of tests at various numbers of users

Results Analysis. Figure 6 shows the evaluation results with respect to the Precision and MRR of the different methods, where precision measures the ability to find experts and MRR measures how well the output list of experts is ordered. We observe that our approach generally outperformed the other tested approaches; some of them produce more accurate lists only when the length of the requested list is no more than 3, a setting that is less likely to be practical. Our approach yielded better rankings in most cases, except when very short lists were requested. In real-life applications, a list of approximately 10 or more experts is a sensible request, and there our approach performs substantially better. Interestingly, both precision and MRR decrease as k increases, which differs from our experience with previous work. A further look at the distribution of reputation in our test data shows why: as Fig. 8 illustrates, the distribution of users' reputation is considerably uneven; very few people have high reputation (these are the users we aim to output), and most users in the dataset have a reputation of 1. Additionally, to assess the stability of our approach, we conducted tests with various sizes of input data, ranging from 100 to 300 users. Apart from acceptable fluctuations, the results demonstrate that our approach performs stably in both accuracy and quality (Fig. 7).

Fig. 8.
figure 8

Distribution of reputation of users in our dataset

5 Conclusion

In this paper, we have proposed a framework to identify experts across different collaborative networks. The framework uses tree-guided tensor decomposition to exploit insights from Q&A networks. In particular, we decompose a 4th-order tensor with tree-guided lasso and matrix factorization to exploit the topic information from a collection of Q&A websites in the Stack Exchange Networks and thereby alleviate the data sparsity issue. The 4th-order tensor model of the data preserves as much information as needed, as confirmed by our experiments and evaluation. Due to the lack of a "gold standard", we compared our approach with baselines built on the rank induced by the reputation scores calculated with the Stack Exchange built-in rules on each topic. The comparison results demonstrate the feasibility of our approach. The proposed approach can be applied to broader scenarios, such as finding the most appropriate person for individuals to consult on specific problems, or identifying desired employees for enterprises.