Towards intelligent complex networks: the space and prediction of information walks
Abstract
In this paper we study the problem of walk-specific information spread in directed complex social networks. Classical models usually analyze the “explosive” spread of information on social networks (e.g., Twitter): a broadcast or epidemiological model focusing on the dynamics of a given source node “infecting” multiple targets. Less studied, but of equal importance, is the case of single-track information flow, wherein the focus is on the node-by-node trajectory of information transfer (in which the next node is not necessarily a newly visited one). An important and motivating example is the sequence of physicians visited by a given patient over a presumed course of treatment or health event. This is the so-called referral sequence, which manifests as a path in a network of physicians. In this case the patient (and her health record) is a source of “information” from one physician to the next. With this motivation in mind we build a Bayesian Personalized Ranking (BPR) model to predict the next node on a walk of a given network navigator using network science features. The problem is related to but different from the well-investigated link prediction problem. We present experiments on a dataset of several million nodes derived from several years of U.S. patient referral records, showing that the application of network science measures in the BPR framework boosts hit-rate and mean percentile rank for the task of next-node prediction. We then move beyond the simple information walk to consider the derived network space of all information walks within a period, in which a node represents an information walk and two information walks are connected if they have nodes in common from the original (social) network. To evaluate the utility of such a network of information walks, we simulate outliers of information walks and distinguish them from the other normal information walks, using five distance metrics for the derived feature vectors between two information walks. The experimental results of such a proof-of-concept application show the utility of the derived information walk network for the outlier monitoring of information flow on an intelligent network.
Keywords
Information walk prediction; Network measures; Bayesian personalized ranking; Patient referral network; Outlier detection
Introduction
It is common to represent interactions between people using a (social) network, which, regardless of the defining notion of connection, could produce a mathematical structure reflecting the possible paths for the flow of information between the actors. The person-to-person communication in such a network turns into a path (Wikipedia), or more accurately a walk (Wikipedia), since it is possible (and in many contexts even likely) for the “walker” (e.g., news) to revisit some person (node)^{1}. Indeed, multiple “visits” can provide a kind of reinforcement of the information of interest that might be relevant to its learning or absorption. This node-by-node (e.g., person-after-person) information spread model, a “single-track” model, is a kind of epidemiological model but different from the classical diffusion/broadcasting models that are often used in the analysis of social media. A single-track model of spreading will not produce the kinds of exponential growth (of “infected nodes”) in each round.
Single-track information spread is appropriate to our particular interest: the problem of next visit prediction of a walker in a network. Our original motivation arises from research on physician collaboration networks built by referrals (An et al. 2018a), where two nodes/physicians are directly connected with a weighted edge if they have been visited by the same patients within a given time period. Patients “walk” this network in the course of a presumptive treatment event. A predictive application based on the features of such “referral sequences” (a given “visit” is enabled by the first physician making a referral for the patient to the second physician) may provide a better understanding of the process of collaboration among health professionals. Furthermore, precise prediction of the next visited physician may help with the efficient allocation of medical resources for a patient’s treatment. Other examples of single-track information walks in different contexts include a traveler visiting preferred places, consumers traversing stores in a shopping mall, or the work history of an employee. In all these cases, the records of transitions define the corresponding network, where the existence and timing of past visits to some nodes will influence the future possibility of a visit (transition) between a pair of nodes. Indeed, a walker may be the first to ever traverse from one node to another, which means that these nodes were not connected to each other in the historical records. Therefore, a more accurate framing is the problem of visit prediction for a walker in a state space. Since we assume a context of information sharing and metadata in specific domains, we name these information walks and the problem in our research information walk prediction. In the above instances the entire walk up to the last node may directly affect the selection of the next visited node, so that this problem is generally not a memoryless Markov chain.
In previous work we analyzed a network of physicians (see An et al. (2018a, c)) based on U.S. patient referral records, which paves the way for research on information walks. Herein, exploiting both metrics proposed in our analysis (An et al. 2018a, c) and classical network science measures, we propose a numerical score to model the preference/attractiveness between the last observed node on an information walk and any possible candidate node in the network. This score takes multiple feature vectors from the targeted information walk as well as several groups of involved nodes. Based on the preference score, we apply a general Bayesian Personalized Ranking (BPR) framework to represent the goal of next-node prediction in an objective function so that the problem could be solved by machine learning. We use a large U.S. patient referral network as an important real-world setting for applying our methodology. Several network science measures (e.g., node centrality) in the national physician network facilitate the prediction for a pair of nodes including those not directly linked in the past. The study and comparison of several models demonstrate that features of an information walk improve the BPR framework, which outperforms Long Short-Term Memory (LSTM) and other metrics reflecting standards used in link prediction research. The main reason for such an improvement is the inclusion of network science measures with other metadata features.
This paper moves beyond our published paper (An et al. 2018b) in the Complex Networks 2018 conference through its introduction and use of the relationships among multiple ongoing information walks. We also investigate the space of information walks with a network science model, in which each node represents an information walk and an edge connects two nodes of information walks if they share at least one common node (e.g., the same physician) in the originating network. We find several significant patterns in the new network of information walks and verify them via a statistical test.
A key contribution is our identification of criteria to label an information walk with different structural patterns in the network of information walks as an outlier. We use a simulation-based test of information walk outliers in the network of information walks in order to (1) demonstrate the efficacy of the model for the information walks network; (2) complement the proposed BPR-IW model of walk prediction, since the users of an intelligent network platform may not have time to focus on every walk and check the prediction of its future direction, while an overall outlier detection function could filter some “abnormal” or “new” walks and remind users to check. In related work (Savage et al. 2014; Ranshous et al. 2015; Eswaran and Faloutsos 2018; Takahashi et al. 2011), researchers have targeted different parts in a graph to build a specific outlier detection algorithm, including nodes, subgraphs, and separate point-to-point edges (e.g., TCP-IP communication, connections between new accounts in social networks). Herein we are the first to implement outlier detection for a whole information walk, which differs from prior work due to the existence of the same single “walker” or information flow along the sequence of visited nodes.
We simulate the outlier information walks with random replacement of their nodes, explore the measures of an information walk in a network of information walks, and design five distance metrics (based on the walk features) within a general outlier detection framework to distinguish the simulated (outlier) information walks from those actually observed. Moreover, since an outlier information walk may be an abnormal or creative (e.g., new treatment procedure) case, the initial results suggest a way to contribute to a more intelligent network via outlier detection for ongoing information walks, which complements our proposed BPR walk prediction model.
“Related works” section surveys related works about information diffusion and outlier detection in a network. In “Proposed models” section, we introduce the BPR framework with a preference score for the task of information walk prediction, and detail the construction of a network of information walks and the model of outlier detection for information walks. “Evaluation of walk prediction” section provides the details of our walk prediction experiment. “Patterns in network of information walks” section analyzes the network of information walks and reports key patterns within it. “Simulation test of outlier detection” section shows the result of a simulation test of information walk outlier detection. Finally, “Conclusions” section summarizes our findings and suggests directions of further research.
Related works
The focus of our work is related to but different from the well-investigated problem of link prediction, in which given a dynamic network observed at a time point, (possibly new) links in the future are predicted (e.g., Martínez et al. (2017)). Another related problem is finding missing links (Nakagawa and Shaw 2004). Traditional link prediction models (e.g., Adamic and Adar (2003) and Liben-Nowell and Kleinberg (2007)) usually only rely on a form of node similarity derived from network topology and generally ignore the whole (information) walk. Many past works target the problem of multi-track spreading or broadcasting in directed acyclic graphs (DAGs), while our proposed information walk model allows the existence of a loop. Representative works include the Independent Cascade (IC) model (Kimura and Saito (2006) and Bourigault et al. (2014)), the Linear Threshold (LT) model (He et al. 2012), and probabilistic methods (Gomez-Rodriguez et al. 2011; Myers and Leskovec 2010). In addition, past works do not consider observed information walks as a part of their key inputs. In contrast, we incorporate information walks using summary measures of network features in the corresponding network. Diffusion models are clearly different, and they have been introduced to the research of epidemics (Raj et al. 2012).
In other domains, several applications (most notably online shopping or search) try to predict a visit to a next “item”. The general BPR model (Rendle et al. 2009) has previously been introduced to online shopping (Rendle et al. 2010) to serve users with personalized goods recommendations. A common problem has been to predict the next place of work of a given employee in a labor pool using LSTM (Li et al. 2017) or a “gravity law” (James et al. 2018) based approach. Choi et al. (2016) applied deep learning to estimate the next medication code in a course of treatment by combining codes of medical treatment and physician visiting records to obtain a comprehensive feature representation.
The idea of state transition (e.g., Leibon and Rockmore (2013)) has been tested for an accurate recommendation. Recently, a Transition-based Factorization Machine model (TFM) (see He et al. (2017) and Pasricha and McAuley (2018)) was used to predict the next state in an abstract space of items for users. In contrast to the TFM model, our proposed preference score model considers network science measures and shows the benefits of incorporating them with other metadata features (more details can be found in “Evaluation of walk prediction” section).
Outlier detection (i.e., anomaly detection) is a thoroughly investigated problem in the field of applied machine learning. A survey paper (Ranshous et al. 2015) reviewed diverse methods of outlier detection in a graph from multiple perspectives, including nodes, edges, subgraphs, and changes due to an event. Another survey (Gupta et al. 2014) applied outlier detection methods to temporal data. Recently, researchers targeted the outlier “bridge edges” in a stream of graphs (Eswaran et al. 2018; Aggarwal et al. 2011) or a stream of separate point-to-point edges (Eswaran and Faloutsos 2018; Takahashi et al. 2011). Our proposed outlier detection method deals with the new target of information walks, which considers both overall structural and temporal patterns. Past work on scan statistics (Park et al. 2009) for the detection of a single abnormal edge has focused on predicting local patterns at every step. In contrast, we target the whole information walk and use time series features derived from a network of information walks to inform outlier detection.
There is an extensive statistical literature (Barnett and Lewis 1974; Hido et al. 2011; Hodge and Austin 2004) on outlier detection, including that for longitudinal data. In statistical methodology and applications, outlier detection is often characterized by the study of residuals or another measure of deviation of the estimated or fitted values of a model from the observed (i.e., true) values. For example, if the data are assumed to follow a normal distribution law, one might compute the number of residual standard deviations away from the mean in order to rank the observed residuals from most to least indicative of an outlier. In nonparametric statistical models, a statistic for measuring the distance of an observation from the value most expected in the absence of outliers is specified from the outset, as opposed to being implied by the modeling assumptions. We propose new distance metrics to be maximized by a proximity-based outlier detection algorithm. As this is the first paper to introduce an information walk network, we assess our outlier detection methods for the purpose of demonstrating the utility of an information walk network rather than seeking to find the optimal outlier detection method among several options (e.g., algorithmic and statistical models). This latter task may be undertaken once the concept of an information walk network has been proven.
Proposed models
In this section, we begin with a preference model for information walk prediction, then describe how to build a network of all information walks, and a proximity-based unsupervised framework for information walk outlier detection.
Given an observed information walk in a directed network, the first task is to predict the next visited node. To do so we build a numerical preference/attraction score for the observed part of an information walk (including the last node visited and an overall feature comprising all past visited nodes) and any possible next-visited candidate. Therefore, when predicting which node would be more likely to be visited by a walker, we can compute and sort the preference/attraction scores over all candidate nodes. We then pick out a small number which have a comparatively large score. As a result, this prediction framework allows for the convenient detection of possible choices from the returned list (see Fig. 2 for an illustration of the identification process). “Results” section evaluates the performance of the prediction in terms of a returned candidate list. The definition of a preference score is a key component of the algorithm.
To formalize the problem, let P denote the set of all chronological node sequences (i.e., information walks). For an information walk i∈P, p_{i} represents the feature vector of the observed sequence of nodes at a time point T, c_{i} refers to the last node on information walk i before time point T, and f_{i} is the first node on information walk i after T (i.e., the actual next visited node). Let J represent the set of possible candidates, which could cover a wide range of nodes, even the whole network except c_{i}, or just a subset of nodes in the network after filtering to speed up the computation if the network is large. Let X(p_{i},c_{i},j) denote the preference/attraction score between the last observed node c_{i}, the overall walk feature p_{i} and a candidate j∈J for the next node. We aim to derive an objective function and train the preference-related parameters to make X(p_{i},c_{i},f_{i})>X(p_{i},c_{i},j) for as many candidates j∈J (and j≠c_{i}) as possible. The more candidates satisfy this inequality, the more accurately the model predicts the next node on an information walk (i.e., the future direction in a network space).
Diverse groups of network science features, either exogenous (metadata) or endogenous (topological) about the observed walk, may boost the accuracy of information walk prediction. Our published papers (An et al. 2018a, c) offer groups of such features useful for building our new preference score model. Table 3 shows a detailed list of features used in prior walk-prediction analyses.
Preference/Attraction score
Table 1 Dimension of the model parameters/features in Eq. 1
Feature  Dimension  Note  Parameter  Dimension  Note

p  M×1  Information walk  V  M×N  Walk-node interaction
β  N×1  Last node  S  M×H  Walk-node interaction
f,γ  H×1  Ground truth / candidate  U  N×H  Node interaction
d  L×1  Profile similarity  w  1×L  Profile weight
Equation 1 takes multiple factors into consideration when predicting the next visited node on an information walk. S,V represent the interaction between the initial part of the walk and the candidate/ground truth node, respectively, while U describes the extent of matching between the candidate and the last node on the walk which might influence the decision of the future direction. Network science provides the widely applicable features p,β,f,γ, since they can be computed from the topological structure of a network, regardless of the type of metadata in the network. As the profile distance d relies on the context (e.g., physician specialty), we distinguish it from the other features.
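Eq. 1 itself is not reproduced in this text, so the following Python sketch is only an illustration of how a score with the listed dimensions could be assembled. The bilinear form (walk with last node via V, walk with candidate via S, last node with candidate via U, plus the weighted profile similarity w·d) is an assumption inferred from the dimension table, not the authors' confirmed formula; the helper names `bilinear`, `preference_score` and `top_k` are hypothetical.

```python
# Hypothetical bilinear preference score. The exact form of Eq. 1 is an
# ASSUMPTION inferred from the listed dimensions (V: MxN, S: MxH, U: NxH).

def bilinear(x, A, y):
    """Return x^T A y for a vector x, a nested-list matrix A and a vector y."""
    return sum(x[r] * A[r][c] * y[c]
               for r in range(len(x)) for c in range(len(y)))

def preference_score(p, beta, gamma, d, V, S, U, w):
    """Score one candidate for a walk.

    p: M walk features, beta: N last-node features, gamma: H candidate
    features, d: L profile similarities between last node and candidate,
    V, S, U: interaction matrices, w: L profile weights.
    """
    walk_last = bilinear(p, V, beta)      # walk x last node (assumed term)
    walk_cand = bilinear(p, S, gamma)     # walk x candidate (assumed term)
    last_cand = bilinear(beta, U, gamma)  # last node x candidate (assumed term)
    profile = sum(wi * di for wi, di in zip(w, d))  # w d(c, j)
    return walk_last + walk_cand + last_cand + profile

def top_k(candidates, score_fn, k):
    """Return the k candidates with the largest scores, best first."""
    return sorted(candidates, key=score_fn, reverse=True)[:k]
```

Sorting all candidates by such a score and keeping the first K entries yields the returned list that the prediction is evaluated on.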
Learning BPR-IW model
where σ represents the sigmoid function σ(x)=(1+ exp(−x))^{−1}. Using the sigmoid function, the gap between two preference scores for two candidate nodes is mapped into the interval (0,1) so that the loss function is defined even if the gap diverges to infinity when computing the optimal model parameters. The argument of σ, \(\hat{X}(p_{i}, c_{i}, f_{i}) - \hat{X}(p_{i}, c_{i}, j)\), describes the gap in the preference scores between the ground truth of the current walk, f_{i}, and another possible candidate, j. P_{train} refers to the training set of information walks. In the objective function (2), Θ is a general set of parameters to be learned in the training process, such as V,S,U,w introduced by Eq. 1. We can use several random matrices/vectors drawn from a multivariate Gaussian distribution as initial values. The values of the model parameters will be optimized in the iterative training process. As the last term, λ_{Θ} regularizes the objective function to avoid overfitting.
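The description above matches the generic BPR criterion of Rendle et al. (2009): a sum of log-sigmoid score gaps over training walks and candidates, minus a regularization term. A minimal Python sketch of that generic criterion follows. It assumes a squared L2 penalty and a flat parameter list, neither of which is stated in this text, and the names `log_sigmoid` and `bpr_objective` are hypothetical.

```python
import math

def log_sigmoid(x):
    """Numerically stable ln(sigma(x)) with sigma(x) = 1 / (1 + exp(-x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def bpr_objective(walks, score, params, lam):
    """Generic BPR criterion, to be maximised.

    walks:  iterable of (walk, truth, candidates) triples from P_train.
    score:  callable score(walk, node) standing in for the preference score.
    params: flat list of parameter values standing in for Theta.
    lam:    regularisation strength (lambda_Theta); the squared L2 penalty
            is an ASSUMPTION of this sketch.
    """
    total = 0.0
    for walk, truth, candidates in walks:
        for j in candidates:
            if j == truth:
                continue  # the ground truth is not compared with itself
            gap = score(walk, truth) - score(walk, j)
            total += log_sigmoid(gap)
    return total - lam * sum(t * t for t in params)
```

In practice the parameters would be updated by stochastic gradient ascent on this quantity; the sketch only evaluates it.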
Table 2 Size of training, test and candidate sets at different time points in 2011, which are derived from the TDI dataset
Date of observation T  P_{train}  P_{test}  Candidate nodes J

03/01  17.6K  18.7K  16.6K 
05/01  51.8K  19.5K  17.3K 
07/01  83.7K  16.6K  14.9K 
09/01  113.1K  15.8K  14.3K 
11/01  142.4K  16.8K  15.1K 
Table 3 Features about information walks and related nodes including applicable network measures, new metrics defined by our past analysis (An et al. 2018a), and a few from the metadata of medical treatment records, such as Relative Value Units (RVU) of medical service
Group  Measures

Information walk p  Number of nodes on it, time range, pairs of mutually connected nodes, sum of RVU for all visits, number of visited hospitals, average node PageRank values
Next node/candidates (j and f)  Clustering coefficient, PageRank, H-index, number of initiated cross hospital referral region referrals
Last node c  Beyond the features in the group of next node/candidate: time gap with last occurrence, RVU, a binary flag of multiple occurrences on the walk, a binary flag of working in the same hospital as the previous physician (node)
Metadata for profile similarity d(c,j)  Indicators of the same specialty/residency hospital/hospital referral region, number of referrals in history.
Network of information walks
An information walk not only transmits information across nodes, but also connects the neighboring nodes in the space of all information walks. Here we define the network of information walks to model the space of all information walks, in which a node represents an information walk and two nodes are connected when they share at least one node in the originating social network.
The network of information walks is expected to be very dense. In the originating social network, node degree refers to the number of directly linked neighbors. In the network of information walks, the corresponding quantity is the number of nodes (information walks) that are connected by the original node. Because all of these walks share that node, they are pairwise connected, so this number sets a lower bound on the size of a clique; other nodes in the originating social network may extend the clique.
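The construction can be sketched in a few lines of Python. This is a minimal illustration, assuming walks are given as sequences of node identifiers keyed by a sortable walk id; the function name `build_walk_network` is hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def build_walk_network(walks):
    """Connect two walks when they share at least one original node.

    walks: dict mapping walk id -> sequence of visited node ids.
    Returns (edges, walks_by_node), where edges is a set of (id_a, id_b)
    pairs with id_a < id_b and walks_by_node maps each original node to
    the set of walks passing through it.
    """
    walks_by_node = defaultdict(set)
    for walk_id, nodes in walks.items():
        for node in set(nodes):  # a revisited node counts once per walk
            walks_by_node[node].add(walk_id)

    edges = set()
    for members in walks_by_node.values():
        # All walks through the same original node form a clique.
        for a, b in combinations(sorted(members), 2):
            edges.add((a, b))
    return edges, walks_by_node
```

Each original node visited by n walks contributes n(n−1)/2 candidate edges, which is one way to see why the derived network tends to be dense.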
Outlier detection for information walks
An information walk expands node by node. Thus the evolution of an information walk could be presented by a series of cumulative feature vectors at each timestamp when the walker visits a new node. The full list of applied features will be introduced in “Simulation test of outlier detection” section. We present the general algorithmic outlier detection framework in Algorithm 1, which requires a distance metric function between any pair of information walks.
Algorithm 1 is an unsupervised proximity-based outlier detection framework. The key idea is to compute an “outlier-score” for each information walk (IW) and to pick the K information walks with the largest scores. The data preparation step refers to Line 1 in Algorithm 1, where we compute the time series features for every information walk. In Lines 2-10, we compute the pairwise distance between two information walks with some metric function (introduced later in this subsection) and treat the distance to the M-th nearest neighbor as the outlier score. Finally, in Lines 11-12, we sort the outlier scores to get the Top-K candidates of outliers. With a time complexity of O(n^{2}), Algorithm 1 adapts more easily to diverse kinds of proximity measures than statistical outlier detection methods, which rely on assumed probability distributions of the residuals to model the degree to which an IW is an outlier. A drawback is that the algorithm might be sensitive to the choice of M when defining the outlier score, making it necessary to tune the parameter M for each experiment.
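Since the listing of Algorithm 1 is not reproduced here, the following Python sketch only mirrors the steps described in the paragraph above: pairwise distances, the distance to the M-th nearest neighbor as outlier score, and a Top-K selection. The function name `top_k_outliers` and the data layout are assumptions.

```python
def top_k_outliers(features, distance, m, k):
    """Proximity-based outlier ranking.

    features: dict mapping walk id -> feature series of that walk.
    distance: callable taking two feature series and returning a number.
    m:        which nearest neighbour defines the outlier score (1-based).
    k:        how many top-scoring walks to return.
    Returns (top_k_ids, scores).
    """
    ids = list(features)
    if not 1 <= m <= len(ids) - 1:
        raise ValueError("m must be between 1 and the number of other walks")

    scores = {}
    for a in ids:
        # O(n) distances per walk, O(n^2) overall.
        dists = sorted(distance(features[a], features[b])
                       for b in ids if b != a)
        scores[a] = dists[m - 1]  # distance to the m-th nearest neighbour

    ranked = sorted(ids, key=lambda walk_id: scores[walk_id], reverse=True)
    return ranked[:k], scores
```

Any of the distance metrics introduced in this subsection can be passed as `distance`, which is the flexibility the framework is meant to provide.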

Sliding substring matching (SSM). Let LA denote the longer array (of length l) and SA the shorter array (of length s). To match SA, enumerate all s-length consecutive subarrays of LA and take the minimum Manhattan Distance between a subarray of LA and SA.

Edit distance/Dynamic Time Warping (ED/DTW). Equation 5 describes the state transition equation for the dynamic programming model, in which d(i,j) is the distance between the first i units in LA and the first j units in SA. The initial settings are d(i,0)=i×λ for i∈[1,l] and d(0,j)=j×λ for j∈[1,s]. λ is the penalty factor that represents the cost of skipping a unit in an array. After the process of dynamic programming in Eq. 5, the value of d(l,s) is our desired distance.

Interpolation. Treats LA and SA as discrete samples from a function of time on the interval [0, 1], in which the first unit of LA and of SA is placed at zero and the last unit at one. The remaining (non-extreme) units are placed at equal intervals. For example, if LA=[0.1,0.2,0.3,0.4,0.5], the corresponding (time, value) pairs would be (0,0.1),(0.25,0.2),(0.5,0.3),(0.75,0.4),(1.0,0.5). To align SA and LA we apply simple linear interpolation to the points of LA to get new points that have the same time-index as SA. Finally, we compute the pairwise Manhattan Distance.

Longest common subsequence (LCS). The LCS method originally aims to find the longest subsequence common to two strings. In contrast to substrings, subsequences are not required to occupy consecutive positions within the original sequence. Two numerical units are treated as equal if their absolute difference is less than the threshold.

Sliding substring averaging (SSA). Starting from the first node in LA, set a sliding window of length l−s+1 and extract the average of those units in LA covered by the sliding window. The sliding window moves right one unit each iteration to generate s values from LA, so that the distance between the derived values and SA can be computed.
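Four of the five metrics can be sketched directly from the descriptions above. In this Python sketch LA is the longer array and SA the shorter one; the function names are hypothetical, the use of the Manhattan distance in SSA is an assumption, and `lcs_length` returns a match length only, because the conversion to a distance is not specified here. ED/DTW is omitted because Eq. 5 is not reproduced in this text.

```python
def manhattan(a, b):
    """Manhattan distance between two equally long arrays."""
    return sum(abs(x - y) for x, y in zip(a, b))

def ssm(la, sa):
    """Sliding substring matching: best-matching s-length window of LA."""
    s = len(sa)
    return min(manhattan(la[i:i + s], sa) for i in range(len(la) - s + 1))

def interpolate(la, t):
    """Linearly interpolate LA, sampled at equal steps on [0, 1], at time t."""
    pos = t * (len(la) - 1)
    lo = min(int(pos), len(la) - 2)  # keep lo + 1 inside the array
    frac = pos - lo
    return la[lo] * (1 - frac) + la[lo + 1] * frac

def interpolation_distance(la, sa):
    """Resample LA at the time indices of SA (both need length >= 2)."""
    times = [j / (len(sa) - 1) for j in range(len(sa))]
    return manhattan([interpolate(la, t) for t in times], sa)

def lcs_length(la, sa, threshold):
    """Longest common subsequence; units match if they differ by < threshold."""
    prev = [0] * (len(sa) + 1)
    for x in la:
        cur = [0]
        for j, y in enumerate(sa, start=1):
            if abs(x - y) < threshold:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def ssa(la, sa):
    """Sliding substring averaging with a window of length l - s + 1."""
    w = len(la) - len(sa) + 1
    averaged = [sum(la[i:i + w]) / w for i in range(len(sa))]
    return manhattan(averaged, sa)  # Manhattan distance is an ASSUMPTION
```

All four functions accept arrays of different lengths, which is what allows two information walks with different numbers of visits to be compared.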
Evaluation of walk prediction
Dataset
The data for our analyses are the U.S. Medicare beneficiary insurance claims for a subgroup of patients over 2007–2011. These data, obtained via a data use agreement by TDI^{2}, contain the patient-physician visiting records for patients who suffered from cardiovascular disease. With this information we are able to link physicians according to patient ID. A referral is defined as the event in which two physicians are visited by the same patients within a short time interval. We then derive a sequence of referrals (An et al. 2018a) for the same patient, which corresponds to an information walk in the professional network of physicians. Because a referral sequence may have loops (e.g., A to B to A for two physicians), some nodes may be revisited. Therefore, the referral sequence corresponds to an information walk on the physician network rather than a “path” without repeated nodes.
For the set of information walks P in a year, given an observation time point T we build the training set P_{train} to store the walks ending before T. The test set P_{test} includes the walks that are ongoing at T. Figure 2 illustrates two examples. Since information walk A terminates before time point T, it is in the training set P_{train}. At the time point T, a node on walk B is still passing information to the next node, so walk B belongs to the test set P_{test}. In A and B, the observed red nodes contribute to the overall information walk feature p. For a walk in P_{train}, all nodes but the last one belong to the observed part, while the last node serves as the ground truth f. The candidate set J contains the ground truth f of all walks in P_{test}; thus it randomly samples a subset of nodes in the whole network.
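The split at an observation time point T can be sketched as follows. This is a minimal Python illustration, assuming each walk is a time-ordered list of (timestamp, node) visits and that a visit exactly at T counts as occurring after T; both the data layout and the name `split_walks` are assumptions.

```python
def split_walks(walks, t):
    """Split walks into training and test sets at observation time t.

    walks: dict mapping walk id -> list of (timestamp, node) in time order.
    Returns (train, test); each maps walk id -> (observed_nodes, ground_truth).
    """
    train, test = {}, {}
    for walk_id, visits in walks.items():
        before = [node for ts, node in visits if ts < t]
        after = [node for ts, node in visits if ts >= t]
        if not before:
            continue  # the walk has not started at T, nothing is observed
        if after:
            # Ongoing at T: the first node after T is the ground truth f.
            test[walk_id] = (before, after[0])
        elif len(before) >= 2:
            # Ended before T: the last node serves as the ground truth f.
            train[walk_id] = (before[:-1], before[-1])
    return train, test
```

The candidate set J would then be assembled from the ground-truth nodes of the test walks together with sampled nodes, which this sketch leaves out.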
The U.S. physician collaboration network derived from the TDI dataset provided 4.66M information (referral) walks in 2011. The training and test sets are defined as information walks with at least six visits. Table 2 presents the size of the training and test sets at several time points T, as well as the candidate node set J. The size of P_{train} increases from March to November, since it contains all information walks ending before T.
Table 3 groups the related measures into those of an information walk p, the feature vector γ of a candidate (and that of the ground truth f), and the feature vector β of the last node c on an information walk. d(c_{i},j) refers to profile similarity between two physicians. Each group contains several representatives of the full list explained in our past works (An et al. 2018a, b). We picked the above measures as they boosted predictive performance in other applications (e.g., the result of medical treatment along an information walk (An et al. 2018a)). To mitigate concerns about reverse-causality and to avoid the possible problem of predicting a variable with input features in the future, when we extract features of an information walk in some year (e.g., 2010), we use node centrality measures derived from the network in the previous year (e.g., 2009).
Baseline methods
In addition to our proposed BPR-IW, the models/metrics below also generate a preference score X between a candidate node j and the last node c, so they could sort their available candidate nodes for a top-K subset as the prediction result.
Most popular (MP). X(c,j)=e(c,j). It takes the historical edge weight between c and a directly connected neighbor, which could be the number of referrals between two physicians. However, the range of candidates is limited to those neighbors.
The performance of traditional link prediction methods is used as a benchmark against which to compare the new methods. Such methods include Common neighbors (CN) (Lorrain and White 1971), Preferential attachment index (PA) (Liben-Nowell and Kleinberg 2007), Adamic-Adar index (Adamic and Adar 2003) and Jaccard index (Jaccard 1901). Notably, these similarity metrics do not incorporate the other nodes on the observed part of an information walk, and are only applicable to the neighbors that interacted with node c before. In contrast, our BPR-IW model extends the range of possibly predicted candidates to nodes without a direct edge to, or common connected nodes with, the last node c.
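The four similarity indices follow their standard textbook definitions, which can be sketched in Python as below. The sketch treats the network as undirected and unweighted, which is a simplifying assumption since the referral network is directed and weighted, and the function name `link_scores` is hypothetical.

```python
import math

def link_scores(neighbors, c, j):
    """Classical link prediction scores between the last node c and candidate j.

    neighbors: dict mapping node -> set of neighbouring nodes
               (undirected view of the network, an ASSUMPTION of this sketch).
    """
    nc = neighbors.get(c, set())
    nj = neighbors.get(j, set())
    common = nc & nj
    union = nc | nj
    return {
        "CN": len(common),                       # common neighbours
        "PA": len(nc) * len(nj),                 # preferential attachment
        "Jaccard": len(common) / len(union) if union else 0.0,
        # Adamic-Adar: rarer common neighbours contribute more.
        "AA": sum(1.0 / math.log(len(neighbors[z]))
                  for z in common if len(neighbors[z]) > 1),
    }
```

CN, Jaccard and Adamic-Adar are zero whenever c and j share no neighbor, which illustrates why these baselines cannot rank candidates outside the neighborhood of c.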
Markov Chain (MC) (Rendle et al. 2010). X(c,j)=Prob(c.next=j | c, c.prev). The two-gram version incorporates the second-to-last node c.prev so as to compute the frequency of state transition.
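A frequency-based estimate of this transition probability can be sketched in Python as follows. The sketch assumes plain counting without smoothing, and the names `fit_two_gram` and `transition_prob` are hypothetical.

```python
from collections import Counter, defaultdict

def fit_two_gram(walks):
    """Count how often each (c.prev, c) context is followed by each next node.

    walks: iterable of node sequences from the training set.
    """
    counts = defaultdict(Counter)
    for nodes in walks:
        for prev, cur, nxt in zip(nodes, nodes[1:], nodes[2:]):
            counts[(prev, cur)][nxt] += 1
    return counts

def transition_prob(counts, prev, cur, nxt):
    """Relative frequency of moving to nxt after the context (prev, cur)."""
    seen = counts.get((prev, cur))
    if not seen:
        return 0.0  # context never observed in the training walks
    return seen[nxt] / sum(seen.values())
```

A context that never occurs in the training walks yields a zero score for every candidate, so this baseline, like MP, can only rank previously observed transitions.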
Long short-term memory (LSTM). Given the corresponding node sequence of an information walk, we treat the features of all nodes (in Table 3) as the time series inputs into an LSTM model (Hochreiter and Schmidhuber 1997). We aim to explore whether the LSTM model could learn the hidden patterns based on the past node-to-node transitions to yield an output tensor that is very close to the ground truth f. However, the hit-rate of LSTM is lower than 0.01 under all parameter settings in our experiment. Another paper (Choi et al. 2016) reported a similar level of failure of LSTM when predicting the next medical visit.
Transition-based Factorization Machine (TFM). The hit-rate (defined in Eq. 7) of TFM on the TDI referral data is less than 0.01 under all experimental settings, including an overall \(\overrightarrow {y}\) with our proposed network measures and a comparatively plain \(\overrightarrow {y}\) with three IDs only (walker, current and next node). The majority of the nodes in the network of physicians have a small node degree (< 4). Therefore, in such a cold-start environment TFM may not perform as well as it does on a dense dataset (Pasricha and McAuley 2018) consisting of frequent users and a subset of nodes. Meanwhile, when most of the applied network measures are not categorical, TFM does not make full use of its advantage in dealing with features in one-hot encoding. TFM enumerates all possible pairs of feature interactions, but some of them may not boost the prediction. As a highlight of TFM, it is better for the latent transition vector \(\overrightarrow {v}^{'}\) to depend on the past track (i.e., observed walk).
BPR-noIW. X(c,j)=wd(c,j). As a comparative method to BPR-IW, this model only takes the term of physician profile similarity in Eq. 1, in order to show the power of the other network science measures about an information walk and the related nodes.
As our main purpose is to prove the significance of network science measures for information walk prediction, we leave further development of Factorization Machine (Rendle 2010) based preference models to future work.
Results
Figures 3 and 4 show the HR and MPR at several time points in 2011 for BPR-IW and other baselines, under the setting of K=20 in the returned list. In terms of HR, BPR-IW beats the others and BPR-noIW performs the second best. The other baseline methods obtain similar hit-rate values, between 0.3 and 0.4. In addition, BPR-IW and BPR-noIW get the smallest MPR, which suggests the ground truth f would be located near the top of the returned list. For most of the models, the different observation time points do not result in obvious gaps in HR or MPR.
Figures 5 and 6 show the impact of K on HR and MPR on the same day of observation. Note that in Fig. 6, MPR is 1.0 for all models if the only returned candidate (K=1) hits the ground truth. When K increases from 1 to 20, most of the models predict the next node better, as the HR increases. For our proposed BPRIW model with K=20, the HR is over 0.7 on the test set P_{test} of 10K+ information walks. For MPR, non-BPR models are almost stable as K increases, but BPRIW and BPRnoIW display an MPR that decreases from 0.4 to 0.2. As a result, it may be more desirable to choose a slightly larger K for BPR-related models so that the walk prediction system can present more candidates to users, including the key ground-truth node f. We compare the models across K values because K is relevant to user experience and needs to be accounted for in the design of a real application, much like the number of pages returned in response to a search query.
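The two evaluation measures discussed above can be sketched as follows. This is a minimal illustration under assumed definitions, since Eq. 7 is not reproduced in this section: HR is taken as the fraction of test walks whose true next node appears in the top-K list, and MPR as the mean of rank/K over those hits (rank starting at 1), which is consistent with the remark that MPR equals 1.0 when K=1 and the single candidate is a hit. The function name `hit_rate_and_mpr` and its arguments are hypothetical.

```python
from typing import Hashable, Sequence, Tuple


def hit_rate_and_mpr(
    ranked_lists: Sequence[Sequence[Hashable]],
    truths: Sequence[Hashable],
    k: int,
) -> Tuple[float, float]:
    """Return (HR, MPR) at list size k over a set of test walks.

    Assumed definitions (not taken from the paper's Eq. 7):
      HR  = share of walks whose true next node is in the top-k list;
      MPR = mean of rank / k over the hits, with rank starting at 1.
    """
    hits = 0
    percentile_sum = 0.0
    for ranked, truth in zip(ranked_lists, truths):
        top_k = list(ranked[:k])
        if truth in top_k:
            hits += 1
            percentile_sum += (top_k.index(truth) + 1) / k
    hr = hits / len(truths) if truths else 0.0
    # MPR is undefined when nothing was hit; report NaN in that case.
    mpr = percentile_sum / hits if hits else float("nan")
    return hr, mpr


if __name__ == "__main__":
    ranked = [["a", "b", "c"], ["x", "y", "z"], ["p", "q", "r"]]
    truth = ["b", "z", "m"]
    for k in (1, 2, 3):
        print(k, hit_rate_and_mpr(ranked, truth, k))
```

Under these assumptions a larger K can only keep or raise HR, while MPR depends on where the hits fall within the longer list.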
Figures 7 and 8 show the HR and MPR on 07/01 from 2008 to 2011, respectively. All models appear to perform very stably on the same day across those years, which suggests that the network structure over 2008-2011 may be steady as well.
The BPRIW recall of five groups of information walks with several K on 07/01/2011
(a) Length (i.e., number of nodes)
K   0 to 20%  20 to 40%  40 to 60%  60 to 80%  80 to 100%
1   0.208     0.202      0.201      0.196      0.174
5   0.462     0.448      0.459      0.436      0.430
10  0.605     0.588      0.601      0.565      0.564
20  0.720     0.710      0.727      0.692      0.690
(b) Time range (i.e., number of days between the first and the last physician visit)
K   0 to 20%  20 to 40%  40 to 60%  60 to 80%  80 to 100%
1   0.191     0.207      0.194      0.216      0.215
5   0.445     0.449      0.454      0.459      0.474
10  0.595     0.590      0.584      0.598      0.613
20  0.727     0.718      0.698      0.714      0.711
Our initial experiments illustrate that features derived from network science and time series analysis for the nodes on an information walk greatly boost HR at the cost of only a slightly larger MPR. We believe that presenting the ground-truth node to users is more important than its relative rank within the list. By that criterion, BPRIW performs best in our experimental settings. The classical link-structure-based metrics do not predict as well as BPRIW, since they do not consider the feature p of the whole observed information walk. In addition, they can only find candidates among the connected or other nearby nodes, according to the historical network. The BPR framework does not predict the next node directly with a state transition probability. However, the relative ranking it outputs is sufficient for users who do not need the quantitative reasons behind the prediction. From the perspective of network research, we strongly recommend the application of network measures and the derived information walk features in related projects. In addition, metadata also provides important features, since data-specific features (e.g., physician profile similarity) appear to contribute to successful prediction in the BPRnoIW model.
Patterns in network of information walks
Citing the notion of path-homotopy from algebraic topology, we define a pair of homotopic information walks as two information walks that share common starting and ending nodes in the physician collaboration network. Because two common nodes are guaranteed, homotopic information walks are more closely connected in the network of information walks than a pair of non-homotopic walks. Table 5 shows the comparison, in which all the measures are found to be significantly different by a two-sample t-test.
Table 5
Comparison of three edge weights in the network of information walks, between the edges connecting homotopic walks and the others connecting two non-homotopic walks
                     Jaccard index  Number of visiting records  Sum of RVU
Homotopic pairs      0.552          24.48                       45.00
Non-homotopic pairs  0.234          10.28                       19.65
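As an illustration of the comparison above, the following sketch tests whether two walks are homotopic in the sense used here (same starting and ending node) and computes a Jaccard index between them. Treating each walk as the set of physicians it visits is an assumption made for this example, and the helper names `is_homotopic` and `jaccard_index` are hypothetical.

```python
def is_homotopic(walk_a, walk_b):
    """True if both walks share the same starting and ending node."""
    return walk_a[0] == walk_b[0] and walk_a[-1] == walk_b[-1]


def jaccard_index(walk_a, walk_b):
    """Jaccard index of the node sets of two walks (assumed edge weight)."""
    a, b = set(walk_a), set(walk_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


if __name__ == "__main__":
    w1 = ["A", "X", "B"]
    w2 = ["A", "Y", "B"]
    w3 = ["A", "X", "C"]
    print(is_homotopic(w1, w2), jaccard_index(w1, w2))
    print(is_homotopic(w1, w3), jaccard_index(w1, w3))
```

Because a homotopic pair always shares its two end nodes, its Jaccard index has a guaranteed nonzero numerator, which is in line with the higher average weight reported for homotopic pairs.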

“Lifting” refers to a shortcut of a longer information walk. Assuming a longer information walk contains three consecutive nodes A→X→B, another shorter walk contains A→B, and the rest of the nodes are the same, we treat the two walks as a pair of lifting walks. In the first quarter of 2011, there are 76K pairs of homotopic walks, and the shorter base walks have an average PageRank value of 1.07×10^{−5} while the longer extended walks have an average PageRank value of 1.20×10^{−5}. Meanwhile, when putting the middle node X between A and B in the originating physician collaboration network, we find a significant difference in the resulting PageRank centrality of the nodes. The order is X < A < B.
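The lifting relation described above can be checked with a short sketch. It assumes that walks are plain node sequences and that the extra node X must be an interior node of the longer walk; the function name `is_lifting_pair` is hypothetical.

```python
def is_lifting_pair(longer, shorter):
    """True if deleting one interior node X from `longer` (A -> X -> B)
    yields exactly `shorter` (A -> B, all other nodes unchanged)."""
    longer, shorter = list(longer), list(shorter)
    if len(longer) != len(shorter) + 1:
        return False
    # X must sit between two other nodes, so skip the first and last index.
    for i in range(1, len(longer) - 1):
        if longer[:i] + longer[i + 1:] == shorter:
            return True
    return False


if __name__ == "__main__":
    print(is_lifting_pair(["P", "A", "X", "B", "Q"], ["P", "A", "B", "Q"]))
    print(is_lifting_pair(["A", "X", "B"], ["A", "C"]))
```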

Information walk composition exists among three groups of information walks. The first group ends with two nodes A→B, the second one starts with two nodes B→C, and the third contains the three nodes A→B→C in the middle of the corresponding physician (node) sequence. Those three groups of information walks have significantly different PageRank values in the network of information walks, which are: the first group 1×10^{−5}, the second group 9.8×10^{−6}, the third 1.09×10^{−5}.
Simulation test of outlier detection
Since the information walks in our physician collaboration network do not have a gold standard comparator, we evaluate the framework of outlier detection and five distance metrics on a mixed set of the originally observed information walks and simulated outliers. We use the training set at a time interval of observation defined by Fig. 2 to get the neighbor (i.e., directly connected) list of every past node (physician). We then sample from all the information walks beginning within one month of the focal observation to form the mixed set. Taking the observation date 2010-03-01 as an example, from the IWs beginning in April 2010 we randomly pick 2500 IWs as the normal cases and another 2500 IWs from which to generate outliers. To simulate an outlier, we keep the original starting and ending nodes of an IW but randomly replace all the middle nodes with others from the set of nodes located on a pool of IWs. The analysis period begins in the month following the observation period to provide the pool of IWs for node replacement. In this way, for a general test without a specific definition of an outlier information walk, the replacement operation at least alters the track of the whole information walk to some degree, but retains the basic source and target nodes.
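The simulation procedure above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names, the uniform sampling of replacement nodes and the fixed seed are all assumptions made for the example.

```python
import random


def simulate_outlier(walk, node_pool, rng):
    """Keep the first and last node; replace every middle node with a
    node drawn uniformly from `node_pool` (uniform choice is assumed).
    A draw may coincide with the original node by chance."""
    if len(walk) <= 2:
        return list(walk)  # nothing to replace
    middle = [rng.choice(node_pool) for _ in walk[1:-1]]
    return [walk[0], *middle, walk[-1]]


def build_mixed_set(walks, node_pool, n_normal, n_outlier, seed=0):
    """Sample disjoint walks: the first part stays normal, the rest are
    turned into simulated outliers."""
    rng = random.Random(seed)
    sampled = rng.sample(list(walks), n_normal + n_outlier)
    normal = [list(w) for w in sampled[:n_normal]]
    outliers = [simulate_outlier(w, node_pool, rng) for w in sampled[n_normal:]]
    return normal, outliers


if __name__ == "__main__":
    walks = [["A", "B", "C", "D"], ["E", "F", "G"],
             ["H", "I", "J", "K", "L"], ["M", "N", "O"]]
    pool = ["P1", "P2", "P3"]  # nodes located on the pool of IWs
    normal, outliers = build_mixed_set(walks, pool, 2, 2, seed=1)
    print(normal)
    print(outliers)
```

In the setting described above, `n_normal` and `n_outlier` would both be 2500 and `node_pool` would hold the nodes on the IWs of the analysis period.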
(1) Number of connected nodes in the network of information walks. Denote the set of nodes (walks) with the ongoing IW as the walk-subnetwork.
(2) Number of physicians which are the neighbor of at least one physician on the ongoing information walk.
(3) Number of physicians which are the neighbor of at least one physician on a walk in the walk-subnetwork.
(4) Average number of covered physicians: the value of measure (3) over that of measure (1).
(5) Average Jaccard index weight of those edges within the walk-subnetwork.
(6) Network strength of the walk-subnetwork, in terms of the weight of the number of common physicians.
(7) Variance of the edge weights in the walk-subnetwork centralization, using the number of common physicians as weights.
(8) Transitivity of the walk-subnetwork using the binary undirected edge.
(9) Survival rate of edges in the walk-subnetwork if the current IW (i.e., a node) is removed. Denote the remaining edges and their connected nodes as the remaining walk-subnetwork.
(10) Edge density in the remaining walk-subnetwork.
(11) Size of the largest connected component in the remaining walk-subnetwork.
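A few of the measures listed above, namely (1), (5), (9), (10) and (11), can be sketched in plain Python. The sketch rests on assumptions that are not stated explicitly in the text: two walks are linked when they share at least one physician, the walk-subnetwork consists of the ongoing walk plus every walk linked to it, and the remaining walk-subnetwork contains only nodes that still have an edge. The function name and the returned keys are hypothetical.

```python
from itertools import combinations


def walk_subnetwork_features(current, others):
    """Compute a few walk-subnetwork measures for one ongoing walk."""
    cur = set(current)
    linked = [set(w) for w in others if cur & set(w)]
    walks = [cur] + linked  # index 0 is the ongoing walk

    # Edges among all walks in the sub-network (i < j by construction).
    edges = [(i, j) for i, j in combinations(range(len(walks)), 2)
             if walks[i] & walks[j]]
    # Edges that survive once the ongoing walk (index 0) is removed.
    remaining = [(i, j) for i, j in edges if i != 0]

    jaccard = [len(walks[i] & walks[j]) / len(walks[i] | walks[j])
               for i, j in edges]

    # Adjacency of the remaining sub-network, used for the component search.
    adjacency = {}
    for i, j in remaining:
        adjacency.setdefault(i, set()).add(j)
        adjacency.setdefault(j, set()).add(i)

    largest, seen = 0, set()
    for start in adjacency:
        if start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:  # depth-first search over one component
            node = stack.pop()
            size += 1
            for nxt in adjacency[node] - seen:
                seen.add(nxt)
                stack.append(nxt)
        largest = max(largest, size)

    n = len(adjacency)
    return {
        "connected_walks": len(linked),                                      # (1)
        "mean_jaccard": sum(jaccard) / len(jaccard) if jaccard else 0.0,     # (5)
        "edge_survival_rate": len(remaining) / len(edges) if edges else 0.0,  # (9)
        "remaining_density": len(remaining) / (n * (n - 1) / 2) if n > 1 else 0.0,  # (10)
        "largest_component": largest,                                        # (11)
    }


if __name__ == "__main__":
    feats = walk_subnetwork_features(
        ["A", "B"], [["A", "C"], ["B", "C"], ["B", "D"], ["X", "Y"]])
    print(feats)
```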
Conclusions
In this paper, we exploit the sequence of referrals in a physician collaboration network to solve the problem of next-node prediction on single-track information walks from a network science perspective, explore the network of multiple information walks, and implement a simulation test of information walk outlier detection to support the general idea of an information walk network.
We consider both newly derived information walk features and classical node centrality features to build a BPRIW model of preference/attraction. The network-based measures yield a flexible BPRIW model that identifies more possible candidate nodes than the traditional static link prediction method, because in BPRIW it is not necessary for the last observed node to be directly connected with a candidate. BPRIW works well on the TDI referral dataset according to a sensitivity analysis which tests both hit-rate and mean percentile rank across multiple factors, such as the time point (within and cross-year) of observation and the number of nodes in the returned list. BPRIW could be conveniently applied to other datasets, where network science measures are likely to model the structures and relationships among a set of items and nodes successfully.
The network of information walks has several significant patterns (e.g., a high clustering coefficient) and provides several features for the simulation test of outlier detection, in which the Edit Distance/Dynamic Time Warping based metric performs best among all metrics in a general proximity-based unsupervised framework. Anticipated future work includes the prediction of real outliers defined by domain experts and the subsequent deployment of such an intelligent information walk prediction and detection system.
Acknowledgements
The authors thank Jonathan S. Skinner at Geisel School of Medicine of Dartmouth College for help in obtaining the data.
Funding
Research for the paper was supported by NIH grants U01 AG046830 and P01 AG019783.
Availability of data and materials
The datasets analyzed for this paper are made available by The Dartmouth Institute for Health Policy and Clinical Practice (TDI). The use and analysis of these data followed the stipulations of a Data Use Agreement and the privacy restrictions of Protected Health Information (PHI).
Authors’ contributions
All authors discussed the features and models, and edited and approved the final manuscript. CA implemented the models.
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
Adamic, LA, Adar E (2003) Friends and neighbors on the web. Soc Netw 25(3):211–230.
Aggarwal, CC, Zhao Y, Philip SY (2011) Outlier detection in graph streams. In: IEEE 27th International Conference on Data Engineering, 399–409. IEEE, New Jersey.
An, C, O’Malley AJ, Rockmore DN (2018a) Referral paths in the US physician network. Appl Netw Sci 3(1):20.
An, C, O’Malley AJ, Rockmore DN (2018b) Walk prediction in directed networks. In: International Conference on Complex Networks and Their Applications, 15–27. Springer, Cham.
An, C, O’Malley AJ, Rockmore DN, Stock CD (2018c) Analysis of the US patient referral network. Stat Med 37(5):847–866.
Barnett, V, Lewis T (1974) Outliers in Statistical Data. Wiley, New Jersey.
Bourigault, S, Lagnier C, Lamprier S, Denoyer L, Gallinari P (2014) Learning social network embeddings for predicting information diffusion. In: Proceedings of the 7th ACM International Conference on Web Search and Data Mining, 393–402. ACM, New York.
Choi, E, Bahadori MT, Searles E, Coffey C, Thompson M, Bost J, Tejedor-Sojo J, Sun J (2016) Multi-layer representation learning for medical concepts. In: Proceedings of the 22nd ACM SIGKDD Conference, 1495–1504. ACM, New York.
Efron, B (1992) Bootstrap methods: another look at the jackknife. In: Breakthroughs in Statistics, 569–593. Springer, Germany.
Eswaran, D, Faloutsos C (2018) Sedanspot: Detecting anomalies in edge streams. In: 2018 IEEE ICDM, 953–958. IEEE, New Jersey.
Eswaran, D, Faloutsos C, Guha S, Mishra N (2018) Spotlight: Detecting anomalies in streaming graphs. In: Proceedings of the 24th ACM SIGKDD, 1378–1386. ACM, New York.
Gomez-Rodriguez, M, Balduzzi D, Schölkopf B (2011) Uncovering the temporal dynamics of diffusion networks. In: Proceedings of the 28th International Conference on Machine Learning, 561–568. Omnipress, Wisconsin.
Gupta, M, Gao J, Aggarwal CC, Han J (2014) Outlier detection for temporal data: A survey. IEEE Trans Knowl Data Eng 26(9):2250–2267.
He, R, Kang WC, McAuley J (2017) Translation-based recommendation. In: Proceedings of the 11th RecSys Conference, 161–169. ACM, New York.
He, X, Song G, Chen W, Jiang Q (2012) Influence blocking maximization in social networks under the competitive linear threshold model. In: Proceedings of the 2012 SIAM International Conference on Data Mining, 463–474. SIAM, Pennsylvania.
Hido, S, Tsuboi Y, Kashima H, Sugiyama M, Kanamori T (2011) Statistical outlier detection using direct density ratio estimation. Knowl Inf Syst 26(2):309–336.
Hochreiter, S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780.
Hodge, V, Austin J (2004) A survey of outlier detection methodologies. Artif Intell Rev 22(2):85–126.
Jaccard, P (1901) Étude comparative de la distribution florale dans une portion des alpes et des jura. Bull Soc Vaudoise Sci Nat 37:547–579.
James, C, Pappalardo L, Sirbu A, Simini F (2018) Prediction of next career moves from scientific profiles. arXiv preprint arXiv:1802.04830.
Kimura, M, Saito K (2006) Tractable models for information diffusion in social networks. In: European Conference on Principles of Data Mining and Knowledge Discovery, 259–271. Springer, Cham.
Leibon, G, Rockmore DN (2013) Orienteering in knowledge spaces: The hyperbolic geometry of Wikipedia mathematics. PLoS ONE 8(7):67508.
Liben-Nowell, D, Kleinberg J (2007) The link-prediction problem for social networks. J Am Soc Inf Sci Technol 58(7):1019–1031.
Li, L, Jing H, Tong H, Yang J, He Q, Chen BC (2017) Nemo: Next career move prediction with contextual embedding. In: Proceedings of the 26th International Conference on World Wide Web, 505–513, Australia.
Lorrain, F, White HC (1971) Structural equivalence of individuals in social networks. J Math Sociol 1(1):49–80.
Martínez, V, Berzal F, Cubero JC (2017) A survey of link prediction in complex networks. ACM Comput Surv (CSUR) 49(4):69.
Myers, S, Leskovec J (2010) On the convexity of latent social network inference. In: Advances in Neural Information Processing Systems, 1741–1749. Neural Information Processing Systems, San Diego.
Nakagawa, Y, Shaw R (2004) Social capital: A missing link to disaster recovery. Int J Mass Emergencies Disasters 22(1):5–34.
Park, Y, Priebe C, Marchette D, Youssef A (2009) Anomaly detection using scan statistics on time series hypergraphs. In: Link Analysis, Counterterrorism and Security (LACTS) Conference, 9. SIAM, Pennsylvania.
Pasricha, R, McAuley J (2018) Translation-based factorization machines for sequential recommendation. In: Proceedings of the 12th RecSys Conference, 63–71. ACM, New York.
Raj, A, Kuceyeski A, Weiner M (2012) A network diffusion model of disease progression in dementia. Neuron 73(6):1204–1215.
Ranshous, S, Shen S, Koutra D, Harenberg S, Faloutsos C, Samatova NF (2015) Anomaly detection in dynamic networks: a survey. Wiley Interdiscip Rev Comput Stat 7(3):223–247.
Rendle, S (2010) Factorization machines. In: 2010 IEEE ICDM, 995–1000. IEEE, New Jersey.
Rendle, S, Freudenthaler C, Gantner Z, Schmidt-Thieme L (2009) BPR: Bayesian personalized ranking from implicit feedback. In: Proceedings of The Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, 452–461. AUAI Press, New Jersey.
Rendle, S, Freudenthaler C, Schmidt-Thieme L (2010) Factorizing personalized Markov chains for next-basket recommendation. In: Proceedings of the 19th World Wide Web Conference, 811–820. ACM, New Jersey.
Savage, D, Zhang X, Yu X, Chou P, Wang Q (2014) Anomaly detection in online social networks. Soc Netw 39:62–70.
Takahashi, T, Tomioka R, Yamanishi K (2011) Discovering emerging topics in social streams via link anomaly detection. In: 2011 IEEE ICDM, 1230–1235. IEEE, New Jersey.
Theil, H (1992) A rank-invariant method of linear and polynomial regression analysis. In: Henri Theil’s Contributions to Economics and Econometrics, 345–381. Springer, Germany.
Wikipedia. Path (graph theory). https://en.wikipedia.org/wiki/Path_(graph_theory). Accessed Jan 2019.
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.