1 Introduction

Social networking has become one of the most popular online activities, with nearly 74% of online adults using social networking sites according to a recent study (Footnote 1). Profile creation in the popular social networks (Facebook, Twitter, Google+, LinkedIn) requires users to give out personal information, with only a few fields (e.g., name) being mandatory and the rest optional (e.g., college, age, relationship status). Many studies [5, 17] have highlighted the privacy implications of giving out this personal information and how users work around them by using proxies, leaving attributes unfilled, or hiding important profile information from public view. A study on Google+ [8] reported that as many as 70% of users do not provide any attributes, while another Google+ study [14] reported that only about 10% of users provide more than six attributes. A study on Twitter [12] reported that only about 20% of users had provided their home cities. These studies confirm the norm of leaving attributes unfilled or making them publicly invisible.

This paper focuses on privacy leakage of personal information even when the user's and/or friends' attributes are not completely filled in or publicly visible. Research into attribute inference has concentrated on utilising the information that the user has publicly disclosed, albeit partially, along with information from linked users (followers/followings) who may have made their information publicly available. We concentrate on an ego network, which is a subnetwork inside a social network consisting of the ego user (whose attributes are to be inferred), his linked users and the links between them.

Model: Multiple research works have utilised the attributes disclosed by the ego and by the users linked to the ego to infer the ego user's attributes and to identify the circles to which users belong. Attribute inference work on Facebook [16] proposed that friends share all attributes with the ego if they belong to a community (or circle). Circle detection works [10, 21] show this assumption to be flawed by empirically confirming that friends belonging to a circle share only one or a few attributes with the ego. Attribute-Circle Dependency [11] captures this concept: one circle is formed by friends sharing the same college (i.e., classmates) and another by friends sharing the same employer (i.e., colleagues). Hence, knowledge of the circle can help in attribute inference. Since circle information is publicly unavailable, [11] proposed to co-profile the circle members and then the ego attributes. However, they neglected the fact that the ego user is strongly connected to some users and weakly connected to others. In fact, [3] pointed out that users tend to attach preferentially and spend time creating and maintaining the relationships in which they are interested. Research into tie-strength measurement points to the often neglected low cost of link formation, which can lead to networks with heterogeneous relationship strengths (e.g., acquaintances and best friends mixed together) [20]. Works [15, 19] point to another dimension of link creation, that of users feeling obliged to link with other users. To address this, in our current work we utilize tie-strength as a proxy to measure the influence of each linked user in ego user attribute inference.

In our approach to attribute inference from social neighbours, we combine the Influence (I) of linked users and the Attribute-Circle dependency into a cost function, which is iteratively updated until the cost cannot be minimised further, to arrive at the ego user's attribute values. With studies reporting that a majority of users leave their attributes unfilled or publicly invisible [8, 12], we assume that all attributes of the ego user have to be inferred.

Attribute Inference in Directed vs Undirected Networks: An ego network in directed social networks consists of linked users who are directly connected to the ego user through a follower or following link. Directed social networks provide the ability to follow any user without the need for the followed user to either provide explicit consent to be followed or reciprocate the friendship. This ability, though beneficial, has a risk attached to it due to attribute inference capabilities from attributes of the followers and/or followings.

Attribute inference works so far have not differentiated between attribute inference in directed and undirected networks [11, 16]. Though [8] uses Google+, a directed social network, it converts the directed network into an undirected network by keeping only bi-directional links (the friends subnetwork). (In this paper, we use the standard terminology of social network graph research: a friend node is a node that follows the ego user and that the ego user follows back.) This is the general norm due to the inherent belief that only mutual friendships reveal the true reason for connecting over a social network [8]. We show through experiments that this belief is not beneficial in the case of attribute inference. In a social network where reciprocity of linking is not essential (a directed social network), if attribute inference is still feasible and performs better than on the converted undirected network, the magnitude of the privacy risk is amplified. Hence, in this work, we concentrate on attribute inference in a directed network and compare the results obtained on the directed social network with those obtained on the undirected network derived from it. To the best of our knowledge, this is the first work to give a detailed account of using the followers subnetwork (followers and links only), the following subnetwork (following users and links only), or the all_links subnetwork (followers and following links) along with the friends subnetwork for attribute inference in a directed social network.

Experiments and Results: Similar to work that utilises a partial social graph for attribute inference [4], a partial ego network (a partial social graph) in the form of the follower subnetwork or the following subnetwork is shown to be capable of attribute inference in a directed social network. Experimental results show the proposed method to be better than the Co-profiling approach [11], the previous best method of attribute inference, which itself outperformed the works [13, 16, 22, 23].

We present results for attribute inference when attributes are inferred together and when attributes are inferred separately. It is important to note that the works so far [11, 13, 16, 22, 23] have concentrated on inferring each attribute separately. Though the inference accuracy of attributes inferred separately is greater than that of attributes inferred together, the latter approach can provide gains in run-time, as all attributes are inferred in a single run.

The rest of the paper is organised as follows. We define the problem in Sect. 2 and present the proposed method in Sect. 3. We then show the experimental results in Sect. 4 and conclude the paper in Sect. 5.

2 Problem Abstraction

In our work to infer attributes of a user in a directed social network, we concentrate on the user’s ego network. We study the problem of inferring the attributes of the ego user \({v}_{0}\) in the followers, following, friends and all_links subnetworks, each of which is constructed from the followers, followings, friends or all linked_users of the ego user, respectively. These can be considered as four datasets constructed for every ego network.

Figure 1 shows a sample ego network, which is a subnetwork inside a social network consisting of the ego node \({v}_{0}\), the nodes connected to the ego node (e.g., \({v}_{1}\), \({v}_{2}\), \({v}_{3}\)), the directed links between the ego and the nodes (e.g., \({({v}_{0},{v}_{1})}\), \({({v}_{2},{v}_{0})}\)) and the links between the nodes (e.g., \({({v}_{1},{v}_{2})}\), \({({v}_{6},{v}_{3})}\)). The vertices (or nodes) V and edges (or links) E forming the social network can be represented as a directed graph \(G= (V,E)\), where each edge in E is a directed link starting from vertex \({v_i}\) and ending at vertex \({v_j}\), represented as the pair \({(v_i,v_j)}\). The nodes other than the ego user are denoted by \({V}^{'}\), and the links between the nodes belonging to \({V}^{'}\) are denoted by \({E}^{'}\), with \({E}^{'} \subset {E}\) and \({V}^{'} \cup \{{v}_{0}\} = V\). We call the users belonging to \({V}^{'}\) linked_users; they are linked to the ego user \({v}_{0}\) through a follower or following link. Since the problem setting addresses attribute inference in directed social networks, the directionality of a link matters: \(({v_i},{v_j}) \ne ({v_j},{v_i})\).

Fig. 1. A sample ego network
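The four subnetworks described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `ego_subnetworks`, the convention that an edge `(u, v)` means "u follows v", and the choice to take the induced subgraph on the selected nodes (keeping only reciprocal edges for the friends subnetwork) are all assumptions made for the example.

```python
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]  # assumed convention: (u, v) means "u follows v"


def ego_subnetworks(edges: Iterable[Edge], ego: str) -> Dict[str, Set[Edge]]:
    """Split a directed ego network into followers, following, friends, all_links."""
    edges = set(edges)
    followers = {u for (u, v) in edges if v == ego}  # users who follow the ego
    following = {v for (u, v) in edges if u == ego}  # users the ego follows
    groups = {
        "followers": followers,
        "following": following,
        "friends": followers & following,    # reciprocated links only
        "all_links": followers | following,
    }
    result: Dict[str, Set[Edge]] = {}
    for name, nodes in groups.items():
        keep = nodes | {ego}
        # Assumption: keep every edge whose two endpoints are both retained.
        sub = {(u, v) for (u, v) in edges if u in keep and v in keep}
        if name == "friends":
            # Undirected view: retain an edge only if its reverse also exists.
            sub = {(u, v) for (u, v) in sub if (v, u) in sub}
        result[name] = sub
    return result
```

Under these assumptions, the friends subnetwork corresponds to the undirected network obtained by keeping only two-way links, while the other three keep the direction of every link.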

Each node represents a social network user, who creates a profile with various attributes describing the user. In this work we infer the categorical attributes college, employment and location. The proposed method can be extended to non-categorical attributes (skills, bio) by clustering them into categorical attributes before applying attribute inference. \({A}_{p}\) denotes the attribute p to be inferred and \(a_{i,p}\) denotes its value for user \(v_i\). We assume that each user has a single value for the attribute in question (e.g., the user has worked only at Google). This can be extended to multivalued attributes by considering the top-n results of attribute inference for the user instead of the top value alone.

In the ego user \({v}_{0}\)’s ego network, some users have filled in all the attributes and some have filled in only a few, while the majority leave all attributes unfilled or publicly invisible. Users who have filled in all the attributes are called Labelled users (denoted as L), users who have filled in only a few attributes are called Partially Labelled users (denoted as P), and users who have no attributes filled in are called Unlabelled users (denoted as U). As described in Sect. 1, we assume \(v_0 \in U\).
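As a small illustration of the three user categories, the sketch below classifies a profile by how many of the three attributes it discloses. The dictionary-based profile representation and the names `ATTRIBUTES` and `label_status` are hypothetical and chosen only for this example.

```python
from typing import Dict, Optional

# The three categorical attributes inferred in this work.
ATTRIBUTES = ("college", "employment", "location")


def label_status(profile: Dict[str, Optional[str]]) -> str:
    """Return 'L' (all attributes filled), 'P' (some filled) or 'U' (none filled)."""
    filled = sum(1 for attr in ATTRIBUTES if profile.get(attr))
    if filled == len(ATTRIBUTES):
        return "L"
    if filled == 0:
        return "U"
    return "P"
```

A missing key, `None`, or an empty string is treated as an unfilled or publicly invisible attribute.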

3 Proposed Solution for Attribute Inference

3.1 Concepts

Given an ego user and the network structure formed by the ego’s links with the nodes in his ego network, the aim is to infer the ego user’s attributes from the known attributes of the linked_users. In an ego network, the circles to which linked_users belong are unknown, and so are some or all of their attributes. A circle can be considered a user grouping suggested by the social network provider or explicitly created by the ego user. Such groupings are called Circles in Google+ and Lists in Facebook and Twitter. Works on circle detection [10, 21] empirically confirm that users belonging to a circle share one or a few attributes. On the understanding that circles and attributes depend on each other and can therefore be co-profiled, Li et al. [11] proposed the co-profiling approach, which bettered all the existing methods for attribute inference [13, 16, 22]. Hence, our starting point was their work. Two concepts that formed the basis of their work, which we continue to use, are Attribute-Circle dependency (Concept 1) and Circle-Connection dependency (Concept 2).

In social networks, some users may be strongly connected (through preferentially attaching) to the ego user while some form casual links (obligatory links). This forms the basis of research into finding differing tie-strengths between users [7, 20]. We use the concept of tie-strength to find the Influence (I) of each linked_user in attribute inference unlike [11] which gives each linked_user and his attributes equal importance.

Thus, the concepts that form the basis of our model are Attribute-Circle dependency (Concept 1) and the Influence of linked_users (Concept 3). Even though Circle-Connection dependency (Concept 2) is part of the model, its inclusion in the cost function is shown to have no effect on attribute inference. Concept 3, the Influence (I) of each linked_user, is measured through the tie-strength between the ego and the linked_user. Tie-strength captures the idea that users are strongly connected to some users and weakly connected to others. Research into tie-strength measurement points to the often neglected low cost of link formation, which can lead to networks with heterogeneous relationship strengths (e.g., acquaintances and best friends mixed together) [20]. We propose to normalize tie-strength into Influence (I) and include it in the cost function used for ego user attribute inference.

3.2 Notation

\({f_i}\) represents the attribute vector of a given user \({v_i}\), with each dimension of the attribute vector representing a candidate value of an attribute. For the candidate value set <Harvard, UNSW, Facebook, Google, Sydney, New York>, the attribute vector \(<0,1,0,1,1,1>\) uses 1 to represent the presence of an attribute value and 0 its absence. The candidate attribute values used in inferring the ego user’s attributes are obtained from the known attribute values of the L and P users. The value of the \({y}^{th}\) dimension of \({f_i}\), denoted as \(f_{i,y}\), is a non-negative real number indicating the likelihood of this attribute value being the attribute value of \({v_i}\). The L and P users \(\left\{ v_i \in L,P\right\} \) have \(f_{i,y}=1\) for an observed attribute value. For an unlabelled user \(\left\{ v_i \in U\right\} \), \({f_i}\) is initially unknown and is determined through the proposed algorithm.
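The encoding of known attribute values into attribute vectors can be sketched as below. This is an illustrative sketch under assumptions: profiles are plain dictionaries, the candidate dimensions are ordered by first appearance, and the helper names `candidate_values` and `attribute_vector` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

ATTRIBUTES = ("college", "employment", "location")
Profile = Dict[str, Optional[str]]


def candidate_values(profiles: Dict[str, Profile]) -> List[Tuple[str, str]]:
    """Collect the (attribute, value) pairs disclosed by L and P users.

    Each pair becomes one dimension of the attribute vector.
    """
    seen: List[Tuple[str, str]] = []
    for profile in profiles.values():
        for attr in ATTRIBUTES:
            value = profile.get(attr)
            if value and (attr, value) not in seen:
                seen.append((attr, value))
    return seen


def attribute_vector(profile: Profile, candidates: List[Tuple[str, str]]) -> List[float]:
    """Return f_i: 1.0 on dimensions whose value the user disclosed, else 0.0."""
    return [1.0 if profile.get(attr) == value else 0.0 for attr, value in candidates]
```

Pairing each value with its attribute name keeps a college and a location with the same text apart, which is a design choice of this sketch and not something the text specifies.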

The circle to which a user \(v_i\) belongs is called its circle assignment and is denoted by \({x_i}\); it takes a value between 1 and K, with K denoting the number of circles. A circle \(C_t\) is given as \(\left\{ v_i \in V^{'}|x_i=t\right\} \). In this work we model each linked_user as belonging to exactly one circle in a given ego network and do not consider overlapping circles.

The attribute value that is associated with a circle (the common attribute value) is denoted by an association vector \(w_t\) for each circle \(C_t\). \(w_t\) is a binary vector with \(w_{t,y} \in \{0,1\}\) for a given dimension y, indicating whether that dimension is associated with the circle \(C_t\). Using the example above, \(<1,0,0,0,0,0>\) indicates that the users in \(C_t\) share one attribute value with the ego, namely that they all study at Harvard, while their other attributes may differ from the ego’s.

The Influence (I) of a linked_user indicates how strongly that linked_user should weigh, relative to the others, in inferring the ego user’s attributes. Within the circle to which a linked_user belongs, the Circle-level Influence (\({I_i}_{\equiv C_t}\)) gives the influence of the linked_user \(v_i\), while the Global-level Influence (\({I_i}_{\equiv L_i}\)) gives the overall influence of the linked_user irrespective of the circle to which he belongs.
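The text does not spell out how tie-strength is normalized into the two influence values, so the following is only one plausible sketch. It assumes a raw tie-strength score per linked_user is already available and divides by the maximum score within the circle (circle level) or across the whole ego network (global level). The function names and the max-normalization itself are assumptions.

```python
from typing import Dict, Hashable, Tuple


def normalise(strengths: Dict[str, float]) -> Dict[str, float]:
    """Scale scores into [0, 1] by dividing by the largest score (assumed scheme)."""
    top = max(strengths.values(), default=0.0)
    if top == 0:
        return {user: 0.0 for user in strengths}
    return {user: s / top for user, s in strengths.items()}


def influences(
    tie: Dict[str, float], circle_of: Dict[str, Hashable]
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Return (circle-level influence, global-level influence) per linked_user."""
    global_inf = normalise(tie)
    circle_inf: Dict[str, float] = {}
    for t in set(circle_of.values()):
        members = {u: s for u, s in tie.items() if circle_of[u] == t}
        circle_inf.update(normalise(members))  # normalise within circle t only
    return circle_inf, global_inf
```

With this scheme a weakly tied user can still have a high circle-level influence if everyone else in the same circle is tied even more weakly.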

3.3 Model

As suggested by Attribute-Circle dependency and Circle Influence, if two users \({v_i}\) and \({v_j}\) belong to the same circle \({C_t}\), their attribute vectors should be close on the dimension of the associate attribute value \({w_t}\) of the circle. If \({v_i}\) and \({v_j}\) share the same relationship t with the ego user \({v_0}\), then their influence within the circle \(C_t\) should also be close. Thus, minimizing the squared distance measure, we arrive at \(\sum \limits _{e_{ij} \in E^{'}, v_i, v_j \in C_t} (w_t \cdot (f_i \cdot {{I_i}_{\equiv C_t}} - f_j \cdot {{I_j}_{\equiv C_t}}))^2 + \sum \limits _{v_i\in {C_t}} (w_t \cdot (f_0 \cdot 1 - f_i \cdot {{I_i}_{\equiv C_t}}))^2\), where \(f_i\) and \(f_j\) denote the attribute vectors, \({I_i}_{\equiv C_t}\) and \({I_j}_{\equiv C_t}\) denote the tie-strength normalized into Influence (I) of \(v_i\) and \(v_j\) respectively, and \(w_t\) represents the associate attribute vector of circle \(C_t\).

The L and P users provide the explicit knowledge needed to establish the associate attribute value of a circle. As such, the associate attribute value \(w_t\) of a circle \(C_t\) should be the value shared by many L and P users, weighted by their Global Influence (\({I_i}_{\equiv L_i}\)), which amounts to minimizing \(\sum \limits _{v_i\in L,P \cap C_t} {I_i}_{\equiv L_i}\cdot (w_t\cdot f_i - 1)^2\). Circle-Connection dependency is given by \(\sum \limits _{e_{ij}\in E^{'}, x_i \ne x_j} 1\), which minimizes the inter-circle connections between linked_users. This term is also included in the cost function. Thus, we arrive at the cost function in Eq. 1.

$$\begin{aligned} \begin{aligned}&\sum \limits _{t=1}^K \{ \sum \limits _{e_{ij} \in E^{'}, v_i, v_j \in C_t} (w_t\cdot (f_i\cdot {{I_i}_{\equiv C_t}} - f_j\cdot {{I_j}_{\equiv C_t}}))^2 \\&+ \sum \limits _{v_i\in {C_t}} (w_t\cdot (f_0\cdot 1 - f_i\cdot {{I_i}_{\equiv C_t}}))^2 \} \\&+ \sum \limits _{t=1}^K \sum \limits _{v_i\in L,P \cap C_t} {I_i}_{\equiv L_i}\cdot (w_t\cdot f_i - 1)^2 + \sum \limits _{e_{ij}\in E^{'}, x_i \ne x_j} 1 \end{aligned} \end{aligned}$$
(1)
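To make the four terms of Eq. 1 concrete, the sketch below evaluates the cost for a given configuration. It is a direct, unoptimized reading of the equation and not the authors' code. The data layout is assumed: `x` maps each linked_user to its circle, `w` maps each circle to its association vector, `f` and `f0` hold attribute vectors, `I_c` and `I_g` hold one circle-level and one global-level influence per user, and `known` is the set of L and P users.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple


def dot(a: List[float], b: List[float]) -> float:
    return sum(p * q for p, q in zip(a, b))


def cost(
    edges: Iterable[Tuple[str, str]],   # links between linked_users (E')
    x: Dict[str, Hashable],             # circle assignment x_i
    w: Dict[Hashable, List[float]],     # association vector w_t per circle
    f: Dict[str, List[float]],          # attribute vectors f_i
    f0: List[float],                    # ego attribute vector f_0
    I_c: Dict[str, float],              # circle-level influence
    I_g: Dict[str, float],              # global-level influence
    known: Set[str],                    # L and P users
) -> float:
    total = 0.0
    for i, j in edges:
        if x[i] == x[j]:
            wt = w[x[i]]
            # Term 1: connected users in the same circle should agree on w_t.
            total += (dot(wt, f[i]) * I_c[i] - dot(wt, f[j]) * I_c[j]) ** 2
        else:
            # Term 4: each inter-circle link costs 1.
            total += 1.0
    for i, t in x.items():
        wt = w[t]
        # Term 2: the ego should agree with each circle member on w_t.
        total += (dot(wt, f0) - dot(wt, f[i]) * I_c[i]) ** 2
        if i in known:
            # Term 3: L and P users should hold their circle's associate value.
            total += I_g[i] * (dot(wt, f[i]) - 1.0) ** 2
    return total
```

Because \(w_t\) is a vector and the influences are scalars, \(w_t\cdot (f_i\cdot I_i - f_j\cdot I_j)\) is computed here as the difference of two scaled dot products.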

3.4 Algorithm

To infer the attribute values from the known information (ego network structure, tie-strength), the partially known information (attribute values of some nodes of the ego network) and the unknown information (circles), the unknown values have to be initialized in a first step, after which the algorithm can be applied. Starting from this initialization and the known variables, we minimize the cost function given by Eq. 1 by iteratively updating the unknown variables until convergence, similar to the co-ordinate descent method. The unknown variables are the circles \(C_t\), the association vector \(w_t\) of each circle, the attribute vector \(f_0\) of the ego user and the attribute vectors \(f_i \in U\) of the Unlabelled users. In each step of the algorithm, one variable is updated while the others are kept constant.
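The alternating update scheme can be outlined with a generic loop. This is a schematic sketch only: the stopping rule (stop when the cost improves by no more than `tol`), the iteration cap `max_iter` and the function name are assumptions, and the individual update steps are passed in as callables instead of being implemented here.

```python
from typing import Callable, Sequence, TypeVar

State = TypeVar("State")


def alternate_until_converged(
    state: State,
    steps: Sequence[Callable[[State], State]],
    cost: Callable[[State], float],
    tol: float = 1e-6,      # assumed convergence threshold
    max_iter: int = 100,    # assumed safety cap on the number of sweeps
) -> State:
    """Apply each update step in turn until the cost stops decreasing."""
    previous = cost(state)
    for _ in range(max_iter):
        for step in steps:
            # Each step updates one block of variables, holding the rest fixed.
            state = step(state)
        current = cost(state)
        if previous - current <= tol:
            break
        previous = current
    return state
```

In the setting of this section, `steps` would hold the attribute vector update, the circle assignment update and the associate attribute update, and `cost` would evaluate Eq. 1.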

Initialization: The attribute vectors \(f_i\) of the L and P users are known, while the attribute vectors of the Unlabelled users and of the ego user \(v_0\) are unknown. We initialise the Unlabelled users’ attribute vectors to 0.5 on all dimensions (\(f_i = 0.5,\, \forall f_i \in U\)). Since the number of circles and the circle membership are unknown in our dataset, we initialise both through the community detection algorithm [1], known as the Louvain method. We chose [1] because it works on both directed networks (all_links, follower, following subnetworks) and undirected networks (friends subnetwork), as required in our setting, unlike the community detection algorithm [2] used in [11], which works only on undirected networks.

Step 1 - Attribute Vector: To update the attribute vectors \(f_i\) of the Unlabelled users (\(f_i \in U\)) and \(f_0\) of the ego user, we keep the association vector \(w_t\) and the circle assignment \(x_i\) constant. This reduces Eq. 1 to the quadratic function given by Eq. 2. Taking the first-order partial derivatives of this quadratic function with respect to \(f_i\) and \(f_0\), while keeping all the other variables constant, we arrive at Eqs. 3 and 4. \(f_0\) and \(f_i\) are iteratively updated until convergence using the co-ordinate descent method, similar to the attribute inference work of [11].

Only \(f_{i,y}\) of \(f_i\) and \(f_{0,y}\) of \(f_0\) are updated considering \(w_t\), the association vector of Circle \(C_t\), has only one non-zero dimension \(w_{t,y}\). It is important to note that attributes are propagated only within the circle based on their attribute vector \(f_i\) and Influence (I).

$$\begin{aligned} \begin{aligned}&\sum \limits _{t=1}^K \{ \sum \limits _{e_{ij} \in E^{'}, v_i, v_j \in C_t} (w_t\cdot (f_i\cdot {{I_i}_{\equiv C_t}} - f_j\cdot {{I_j}_{\equiv C_t}}))^2 \\&+ \sum \limits _{v_i\in {C_t}} (w_t\cdot (f_0\cdot 1 - f_i\cdot {{I_i}_{\equiv C_t}}))^2 \} \\ \end{aligned} \end{aligned}$$
(2)
$$\begin{aligned} \begin{aligned}&f_{i,y} = \frac{f_{0,y} + \sum \nolimits _{e_{{ij}} \in E^{'}, v_j \in C_t} f_{j,y} \cdot {{I_j}_{\equiv C_t}}}{1 + \sum \nolimits _{e_{{ij}} \in E^{'}, v_j \in C_t} 1}, v_i \in U \cap C_t, w_{t,y} = 1 \\ \end{aligned} \end{aligned}$$
(3)
$$\begin{aligned} \begin{aligned}&f_{0,y} = \frac{\sum \nolimits _{t=1, w_{t,y}=1}^{K} \sum \nolimits _{v_j \in C_t} f_{j,y} \cdot {{I_j}_{\equiv C_t}}}{\sum \nolimits _{t=1, w_{t,y}=1}^{K} \sum \nolimits _{v_j \in C_t} 1},\; w_{t,y} = 1, \forall t=1, \ldots , K \end{aligned} \end{aligned}$$
(4)
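The two update rules can be sketched as small functions. This follows Eqs. 3 and 4 as printed and is not the authors' code. It assumes one circle-level influence value per user, reads \(e_{ij}\) literally as links starting at \(v_i\), and returns 0.0 for the ego when no circle is associated with dimension y (a case the equations leave open). The function and argument names are hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Tuple


def update_fi(
    i: str, y: int,
    f: Dict[str, List[float]], f0: List[float],
    x: Dict[str, Hashable], I_c: Dict[str, float],
    edges: Iterable[Tuple[str, str]],
) -> float:
    """Eq. 3: new f_{i,y} for an Unlabelled user i in circle C_t with w_{t,y} = 1."""
    t = x[i]
    # Neighbours reached by a link e_ij that lie in the same circle as i.
    neighbours = [j for (a, j) in edges if a == i and x[j] == t]
    numerator = f0[y] + sum(f[j][y] * I_c[j] for j in neighbours)
    return numerator / (1 + len(neighbours))


def update_f0(
    y: int,
    f: Dict[str, List[float]], x: Dict[str, Hashable],
    w: Dict[Hashable, List[float]], I_c: Dict[str, float],
) -> float:
    """Eq. 4: new f_{0,y}, averaged over members of circles with w_{t,y} = 1."""
    members = [j for j, t in x.items() if w[t][y] == 1]
    if not members:
        return 0.0  # assumption: no circle carries this dimension
    return sum(f[j][y] * I_c[j] for j in members) / len(members)
```

Both rules are influence-weighted averages, so a linked_user with a low circle-level influence contributes proportionally less to the propagated value.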

Step 2 - Circle Assignment: The circle assignment \(x_i\) of each linked_user is updated iteratively, keeping the other two variables, the attribute vector \(f_i\) and the associate attribute \(w_t\) of each circle \(C_t\), constant. Intuitively, in every iteration a given linked_user \(v_i\) is assigned to the circle \(x_i\) that reduces the objective function given by Eq. 1 the most, and remains in its current circle if no other assignment reduces it. Equation 5 finds, for an Unlabelled user \(v_i \in U\), the circle \(C_t\) in which the user has many connections with Influence \({I_i}_{\equiv C_t}\) similar to that in the circle with which the user is currently associated. Equation 6 finds the circle for a Labelled or Partially Labelled user \(v_i \in \left\{ L,P \right\} \), taking into account the prior knowledge of \(f_i\) and the Influence \({I_i}_{\equiv L_i}\). Intuitively, if \(v_i\)’s attribute vector \(f_i\) contains the associate attribute value \(w_t\) of a circle, then the user should belong to that circle. The Influence \({I_i}_{\equiv L_i}\) in Eq. 6 ensures that a user \(v_i \in \left\{ L,P \right\} \) is not assigned to a circle whose associate attribute value \(w_t\) is not held by \(f_i\). Hence, the value of the Influence \({I_i}_{\equiv L_i}\) should be large.

$$\begin{aligned} \begin{aligned}&x_i = \arg \max \limits _{t=1,...,K} [\sum \limits _{e_{ij} \in E^{'}, v_{j} \in C_t} (1- (w_t \cdot (f_i\cdot {{I_i}_{\equiv C_t}} - f_j\cdot {{I_j}_{\equiv C_t}}))^{2}) \\&- (w_t\cdot (f_0\cdot 1 - f_i\cdot {{I_i}_{\equiv C_t}}))^2],\, v_i \in U \end{aligned} \end{aligned}$$
(5)
$$\begin{aligned} \begin{aligned}&x_i = \arg \max \limits _{t=1,...,K} [\sum \limits _{e_{ij} \in E^{'}, v_{j} \in C_t} (1- (w_t \cdot (f_i\cdot {{I_i}_{\equiv C_t}} - f_j\cdot {{I_j}_{\equiv C_t}}))^{2}) \\&- (w_t\cdot (f_0\cdot 1 - f_i\cdot {{I_i}_{\equiv C_t}}))^2 - ({I_i}_{\equiv L_i}\cdot (w_t\cdot f_i - 1)^2)],\, v_i \in L,P \end{aligned} \end{aligned}$$
(6)
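The circle assignment step can be sketched as a scoring function over candidate circles. It mirrors Eqs. 5 and 6 as printed, under assumptions: one circle-level influence value per user is used for every candidate circle, \(e_{ij}\) is read as links starting at \(v_i\), and ties are broken in favour of the current circle so that a user stays put when nothing improves. All names are hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Sequence, Set, Tuple


def dot(a: List[float], b: List[float]) -> float:
    return sum(p * q for p, q in zip(a, b))


def assign_circle(
    i: str, circles: Sequence[Hashable],
    edges: Iterable[Tuple[str, str]],
    x: Dict[str, Hashable], w: Dict[Hashable, List[float]],
    f: Dict[str, List[float]], f0: List[float],
    I_c: Dict[str, float], I_g: Dict[str, float], known: Set[str],
) -> Hashable:
    """Return the circle maximizing Eq. 5 (i unlabelled) or Eq. 6 (i in L or P)."""
    edges = list(edges)

    def score(t: Hashable) -> float:
        wt = w[t]
        mine = dot(wt, f[i]) * I_c[i]
        s = 0.0
        for a, j in edges:
            if a == i and x[j] == t:
                # Reward links into circle t whose endpoints agree on w_t.
                s += 1.0 - (mine - dot(wt, f[j]) * I_c[j]) ** 2
        s -= (dot(wt, f0) - mine) ** 2
        if i in known:
            # Eq. 6 only: penalize circles whose associate value i does not hold.
            s -= I_g[i] * (dot(wt, f[i]) - 1.0) ** 2
        return s

    # Current circle first: max() keeps the first best candidate on ties.
    candidates = [x[i]] + [t for t in circles if t != x[i]]
    return max(candidates, key=score)
```

The extra penalty for L and P users grows with the global influence, which is why a large value keeps a labelled user out of circles whose associate attribute value it does not share.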

Step 3 - Associate Attribute Value of a Circle: Keeping the circle assignment \(x_i\) and the attribute vector \(f_i\) of each user \(v_i\) fixed, we find the associate attribute value \(w_t\) of each circle, under the assumption that each circle has exactly one attribute value (dimension) set to 1. Since the number of dimensions is finite, we select the attribute value that minimizes the objective function given by Eq. 1.
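Because each circle is assumed to have exactly one associated dimension, Step 3 can be illustrated as an exhaustive search over dimensions. The sketch below evaluates only the terms of Eq. 1 that depend on \(w_t\) for a single circle; the inter-circle term is unaffected by \(w_t\) and is omitted. The function name, the single influence value per user and the tie-breaking toward the lowest dimension index are assumptions.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple


def best_association(
    t: Hashable, dims: int,
    edges: Iterable[Tuple[str, str]],
    x: Dict[str, Hashable],
    f: Dict[str, List[float]], f0: List[float],
    I_c: Dict[str, float], I_g: Dict[str, float], known: Set[str],
) -> List[int]:
    """Return the one-hot w_t that minimizes circle t's share of Eq. 1."""
    edges = list(edges)
    members = [i for i, c in x.items() if c == t]

    def circle_cost(y: int) -> float:
        c = 0.0
        for i, j in edges:
            if x[i] == t and x[j] == t:
                c += (f[i][y] * I_c[i] - f[j][y] * I_c[j]) ** 2
        for i in members:
            c += (f0[y] - f[i][y] * I_c[i]) ** 2
            if i in known:
                c += I_g[i] * (f[i][y] - 1.0) ** 2
        return c

    y_best = min(range(dims), key=circle_cost)
    return [1 if y == y_best else 0 for y in range(dims)]
```

With a one-hot \(w_t\), each dot product in Eq. 1 reduces to reading a single coordinate, which is why the inner cost uses `f[i][y]` directly.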

4 Experiments

4.1 Dataset

We used the Google+ social network to evaluate the attribute inference performance of the proposed method. We chose Google+ as it is mid-way between its more popular counterparts, Facebook and Twitter, inheriting the best of both worlds [14]. A total of 154 ego users and their direct follower and following links were crawled using the Google+ API, resulting in a total of 39860 Google+ users who had made their profiles public between April 2015 and July 2015 and between Oct 2015 and Nov 2015. Since we infer the ego user’s attributes college, employment and current location, we crawled these attributes for all users for whom they were publicly available. The dataset consists of 18815 colleges, 37009 employers and 9426 current locations as attribute values, and a total of 539777 links among the 39860 Google+ users.

In a real-world setting, the number of linked_users who provide their attributes varies. Hence, as part of the experiments, we inferred ego user attributes with different percentages (10%, 20%, 30% and 100%) of Labelled and Partially Labelled users. Previous works on attribute inference [11, 16] have shown that the attributes of as few as 20% of linked_users are sufficient to infer ego user attributes with significant accuracy. Hence, we retain 20% as the standard percentage of known linked_users against which attribute inference is tested.
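One way to set up the different percentages is to keep the attributes of a random fraction of the attribute-disclosing users and hide the rest. The text does not describe the sampling procedure, so this sketch is an assumption throughout: the percentage is taken over the users who disclose at least one attribute, sampling is uniform, and the function name and seed handling are hypothetical.

```python
import random
from typing import Iterable, Set


def reveal_fraction(known_users: Iterable[str], fraction: float, seed: int = 0) -> Set[str]:
    """Pick the subset of L and P users whose attributes stay visible."""
    rng = random.Random(seed)          # fixed seed makes a run repeatable
    users = sorted(known_users)        # sort so the result depends only on the seed
    k = round(fraction * len(users))
    return set(rng.sample(users, k))
```

Running the inference with several seeds per percentage would then give the spread of accuracy across runs.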

Directed vs Undirected Social Networks – Works on attribute inference so far have concentrated on using undirected networks [4, 11, 16] and generalizing the results to both directed and undirected networks. Though a few works use directed networks, they convert them into undirected networks by retaining only two-way links and two-way linked nodes in the dataset [8]. We do not convert the directed Google+ data into an undirected dataset, that is, we do not restrict ourselves to links (\({v_i,v_j}\)) for which both edges (\({v_i,v_j}\)) and (\({v_j,v_i}\)) exist in the ego network. This allows us to measure the attribute inference ability of an ego follower subnetwork or an ego following subnetwork. However, keeping all the links may leave many spurious links or spammers in the ego network. We are aware that there is a wealth of research on identifying and removing spammers from social networks [6, 9, 18]. Spamming activity is targeted, and this targeted spamming puts the ego user at risk of having his attributes exposed easily. These studies also note that taking down spammer accounts is only a temporary measure and that a complete spam block is not a realistic expectation for any social network. Hence, we chose to retain all the links for attribute inference.

It is important to note that we could not find any previous works using follower, following, or all_links subnetworks for attribute inference. Hence, to facilitate comparing our proposed method with previous works, we convert the Google+ dataset into an undirected network by retaining two-way links and two-way linked nodes to obtain the friend subnetwork. First, we present the results on converted undirected network or friend subnetwork. We then present the results of attribute inference on follower, following and all_links subnetworks by keeping the directedness of the links intact i.e., on directed network.

Baseline – We compare our proposed method with the co-profiling approach for attribute inference by Li et al. [11], which has been shown to be better than the previous methods of attribute inference [13, 16, 22, 23]. The co-profiling approach (hereafter referred to as CP) infers both circles and attributes together, as (i) both are dependent on each other and (ii) both need to be inferred because they are missing or not publicly available in a social network. CP proposes that users within a circle share one or a few attributes. The known attributes of some users are propagated within circles to determine the attributes of unlabelled friends and of the ego user. We implemented the CP algorithm and, as part of the experiments, tested it with the optimal parameter values described in the paper.

Attribute Inference Strategy – Works so far have resorted to inferring ego user attributes independently, i.e., inferring one attribute for each run of the experiments. Except for CP which can take multiple attributes as input for inference, works by [13, 16, 22, 23] infer attributes independently. Since the proposed method can infer attributes independently and together, as part of the experiments we infer attributes using both approaches. We report the results for the proposed method as well as the baseline method of CP for both attribute inference approaches.

4.2 Experiment Results

We compare the proposed method with CP on friend subnetwork dataset and present the results in this section.

Table 1. Attribute inference accuracy with 20% Labelled and Partially Labelled users
Fig. 2. Attribute inference accuracy of Co-Profiling (CP) and the proposed method with attributes inferred together

Fig. 3. Attribute inference accuracy of Co-Profiling (CP) and the proposed method with attributes inferred independently

Table 1 provides the results of attribute inference with 20% of Labelled and Partially Labelled linked_users. We ran the attribute inference multiple times to obtain the mean and variance of the inference accuracy. Figures 2 and 3 provide the attribute inference results of the proposed method and CP. Figure 2 gives the inference accuracy of attributes inferred together, while Fig. 3 gives the inference accuracy of attributes inferred independently. We make the following observations.

Observation 1: The attribute inference accuracy of the proposed method is better than that of CP across 10%, 20%, 30% and 100% of Labelled and Partially Labelled linked_users, as seen in Figs. 2 and 3. The proposed method utilizes the influence of users in place of the static parameters used in CP. The results show that the proposed method can take advantage of the available mutual friend information to infer the Influence (I) and use it in attribute propagation and in determining each circle’s associate attribute, thus yielding better accuracy.

Observation 2: The accuracy of CP increases as the share of Labelled users grows from 10% to 20% and then stagnates, with no considerable increase in accuracy for 30% and 100%. Labelled users, who have filled in all three attributes, make up around 25% of our dataset, whereas previous work on a directed social network [8] reports the share of Labelled users to be much lower than 20%. Co-Profiling fails to utilize the additional attribute information available in the form of Partially Labelled linked_users in the ego network, which explains the stagnation of accuracy beyond 20%.

Observation 3: The variances of different runs are small as seen in Table 1. It is even more significant in the context of inference of all three (college, employment, location) attributes together as given by Table 1, rather than each attribute inferred independently. Small variance indicates the results to be reliable.

Observation 4: The accuracy of inference is much higher when the attributes are separately inferred (Fig. 3), one for each run, as against inferring all attributes at once (Fig. 2). Though inferring each attribute separately consumes more time, the gain in terms of accuracy increase warrants the attributes to be inferred separately rather than together. We find the proposed method to provide better accuracy than CP in both the cases.

5 Conclusion and Outlook

We have addressed the issue of privacy leakage in social networks even when many user attributes are unfilled or hidden. Specifically, we have studied the problem of attribute inference in a directed social network. We concentrate on the ego network of a user and examine the inference capabilities of the friend, follower, following and all_links subnetworks of the ego user. We have shown the impact of Influence (I) and of utilising Partially Labelled users in attribute inference, with inference accuracy better than that of the previous attribute inference method.

Inferring attributes independently (one attribute for each run of the algorithm) has been the norm followed by multiple works. When attributes were inferred independently, the accuracy in general increased or remained similar compared to when attributes were inferred together. Though inferring attributes independently provides better accuracy, inferring attributes together provides an alternative with gains in run-time.

As our future work, we would like to test the inference capabilities on other social networks and on the impact of inherent make-up of nodes. We would like to incorporate overlapping and hierarchical circles where a user may belong to multiple circles. Finally we would also like to test the proposed method on ego networks of larger sizes.