Encyclopedia of Social Network Analysis and Mining

Living Edition
| Editors: Reda Alhajj, Jon Rokne

Paths in Complex Networks

• Mareike Bockholt
• Katharina A. Zweig
Living reference work entry
DOI: https://doi.org/10.1007/978-1-4614-7163-9_110183-1

Glossary

Path

A path is a sequence of nodes and edges in a graph such that each node and edge of the path is contained in the graph

Polygonal curve or polygonal chain

A sequence of connected line segments (in geometry, usually in the Euclidean plane). It is also uniquely determined by the sequence of points at which the line segments are connected

Sequence

A sequence is an ordered list of elements in which elements are of arbitrary type and repetitions of elements are allowed

Trajectory

A trajectory describes the position of a moving object through space. A discrete trajectory is usually a sequence of (possibly time-stamped) locations in two- or three-dimensional space, for example, given as GPS coordinates

Definition

Given a simple, undirected, and unweighted graph G = (V, E) with a set of nodes $$V=\left\{{v}_1,\dots, {v}_n\right\}$$ and a set of edges E ⊆ V × V, a path in G is defined as a finite sequence $$P=\left({p}_1,{e}_{p_1},{p}_2,\dots, {p}_{k-1},{e}_{p_{k-1}},{p}_k\right)$$ with $${p}_i\in V$$ for all i ∈ {1, … , k} and $${e}_{p_i}=\left({p}_i,{p}_{i+1}\right)\in E$$ for all i ∈ {1, … , k − 1} and k ∈ ℕ. Two terminologies are in common use. If the nodes and edges are not required to be distinct, P is called either a path or a walk. If the edges of P are distinct, P is called a simple path or a trail. If the nodes and the edges of P are distinct, P is called an elementary path or a path. In this entry, the former terminology is used, i.e., the term path carries no assumption that its nodes or edges are distinct.

Since the considered graphs are simple, a path is uniquely determined by its node sequence and the notation can be simplified to $$P=\left({p}_1,{p}_2,\dots, {p}_k\right)$$, which is used in the following.

Let $$V(P)=\left\{{p}_1,\dots, {p}_k\right\}$$ and $$E(P)=\left\{{e}_{p_1},\dots, {e}_{p_{k-1}}\right\}$$ denote the sets of nodes and edges which are contained in a path P, respectively. The length ∣P∣ = k − 1 of a path P is defined as the number of (not necessarily distinct) edges.
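The three notions of path distinguished above can be told apart mechanically. The following Python sketch is only an illustration: the adjacency-set representation `adj`, the function names, and the small example graph are assumptions made here and are not part of the entry.

```python
def edges_of(path):
    """Undirected edges traversed by a node sequence, in order of traversal."""
    return [frozenset(pair) for pair in zip(path, path[1:])]


def is_walk(adj, path):
    """A path in this entry's terminology: consecutive nodes must be adjacent."""
    return len(path) >= 1 and all(w in adj[v] for v, w in zip(path, path[1:]))


def is_trail(adj, path):
    """A simple path (trail): a walk in which no edge is used twice."""
    edges = edges_of(path)
    return is_walk(adj, path) and len(edges) == len(set(edges))


def is_elementary(adj, path):
    """An elementary path: a walk in which no node (hence no edge) repeats."""
    return is_walk(adj, path) and len(path) == len(set(path))


def length(path):
    """|P| = k - 1, the number of (not necessarily distinct) edges."""
    return len(path) - 1


# Hypothetical example graph: a triangle a-b-c with a pendant node d at c.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}

print(is_walk(adj, ["a", "b", "a", "c"]), is_trail(adj, ["a", "b", "a", "c"]))
print(is_trail(adj, ["c", "a", "b", "c", "d"]), is_elementary(adj, ["c", "a", "b", "c", "d"]))
print(is_elementary(adj, ["a", "b", "c", "d"]), length(["a", "b", "c", "d"]))
```

Because the graph is simple, each edge is stored as an unordered pair of its two distinct end nodes, which is why a `frozenset` suffices.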

A similarity measure on a set of elements X is a function σ : X × X → ℝ which indicates how similar two objects of X are. The more similar two objects are, the higher the value of the similarity function should be.

A distance measure on a set X is a function δ : X × X → ℝ which indicates the dissimilarity or the distance of two objects.

A distance measure δ is said to satisfy non-negativity if for all x , y ∈ X, it holds that δ(x, y) ≥ 0. δ satisfies coincidence, if for all x, y ∈ X, it holds that δ(x, y) = 0 ⇔ x = y. δ satisfies symmetry, if for all x, y ∈ X, it holds that δ(x, y) = δ(y, x). δ satisfies the triangle inequality if for all x, y , z ∈ X, it holds that δ(x, z) ≤ δ(x, y) + δ(y, z).

A distance metric on a set X is a distance measure satisfying non-negativity, coincidence, symmetry, and the triangle inequality.

A metric space (X,  d) is a set of elements X on which there is a distance metric d defined for any pair of elements of X.
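For a finite set X, the four properties can be checked by brute force. The sketch below is an illustration under the assumption that the distance measure is given as a Python function of two arguments; the name `metric_violations` is hypothetical.

```python
from itertools import product


def metric_violations(points, delta):
    """Return the names of the metric properties that delta violates on points."""
    points = list(points)
    violated = set()
    for x, y in product(points, repeat=2):
        if delta(x, y) < 0:
            violated.add("non-negativity")
        if (delta(x, y) == 0) != (x == y):
            violated.add("coincidence")
        if delta(x, y) != delta(y, x):
            violated.add("symmetry")
    for x, y, z in product(points, repeat=3):
        if delta(x, z) > delta(x, y) + delta(y, z):
            violated.add("triangle inequality")
    return violated


# The absolute difference is a metric on numbers; the squared difference is not.
print(metric_violations([0, 1, 2], lambda x, y: abs(x - y)))
print(metric_violations([0, 1, 2], lambda x, y: (x - y) ** 2))
```

The check needs on the order of |X|³ evaluations of δ, so it is only practical for small sets, for example when testing a candidate path distance on a handful of paths.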

Introduction

While network analysis is almost 70 years old, the structure of paths in complex networks is rarely considered. This is particularly surprising as in many real-world networks, entities use the structure of the network to navigate in it by moving from node to node. Examples include Internet users surfing the web – with or without a destination; travelers using a transportation network based on flights (see, e.g., Guimerá and Amaral 2004), on trains (Sen et al. 2003), on roads, or on ships (Ducruet and Notteboom 2012; Kaluza et al. 2010) in order to reach their destination; students using an e-learning platform by clicking through the interlinked documents and resources; players solving a puzzle by traversing its problem space (Jarušek and Pelánek 2011; Newell 1980); or information seekers navigating in an information network (West et al. 2009; West and Leskovec 2012). Not only humans traverse networks on paths; other entities might also use a network’s structure and traverse it on paths (an illustration is given in Fig. 1, which shows a network representing the problem space of a board game and the paths humans have taken in it while solving the game).

Key Points

When humans or other entities traverse a complex network, they usually do not take the shortest path, but they also do not move randomly. The structure of these paths is an important research area which has only just started. First studies have been published exploring which routes humans or other entities take in complex networks. Of particular relevance are methods for summarizing the structure of these paths. Such methods will help reduce (possibly large) data sets of paths to a few representative groups of paths. For developing such methods, an appropriate similarity measure for paths is needed.

Historical Background

Human navigation in spatial environments has been the subject of numerous studies in the cognitive sciences for decades (e.g., McDonald and Pellegrino 1993). These studies focus primarily on human orientation in a (possibly unknown) spatial environment (Moeser 1988) and the mental representations humans have of their environment (Aginsky et al. 1997).

The observation that humans are able to find surprisingly short paths in other environments as well, although they do not know the complete environment, was illustrated by Milgram (1967): in his famous experiment, people were asked to send a letter to a target person via one of their acquaintances. Although the structure of the social network was not known to any of the persons involved, the letters which arrived at their destination were routed over only five intermediate persons on average, which is a remarkably small number (to be fair, over all runs of the experiment, the percentage of letters which actually arrived at the target person ranged only from 15 to 35%). A similar result was found by Sudarshan Iyengar et al. (2012), who looked at players of a word game and found that the paths taken by the participants were on average around 1.7 times longer than the shortest paths. The results of West and Leskovec (2012) support these observations: they analyzed the paths taken by humans while seeking information in the Wikipedia network. Based on almost 30,000 distinct paths for different pairs of articles, their analysis revealed that human wayfinding in information networks is surprisingly efficient although the complete network structure is not known to the players.

Although this has been known for decades, how humans actually find these short paths was not investigated until Kleinberg (2000a) posed the question of how the structure of the network affects the performance of decentralized algorithms (i.e., algorithms for finding paths from a given source to a given target in a network, using only local information of the nodes). Many networks, including social networks and information networks, are examples of small world networks, i.e., networks with a strongly local structure (a high clustering coefficient) and, at the same time, a few long-range connections (which lead to a short average path length) (Watts and Strogatz 1998). Kleinberg generalizes the small world model of Watts and Strogatz and proves that the capability of any decentralized algorithm to find short paths (short compared to the diameter of the network, which is not necessarily the shortest path) depends crucially on the value of one parameter of the model: there is exactly one parameter value for which a decentralized algorithm is able to find a short path with high probability. For other values of the parameter, no decentralized algorithm can find short paths (Kleinberg 2000a, b). This might also explain why, in Milgram’s experiment, the majority of the letters did not reach their target.

Structure of Paths

Not only the length of the found paths is interesting to study but also their structure. In the context of information networks, West and Leskovec (2012) analyze the structure and the properties of the paths taken by human players by considering different qualities of the nodes of the paths, for example, the degree of a node, its distance to the target node, or its lucrative degree (i.e., the number of outgoing links which decrease the distance to the target node). They show that the paths taken by humans share the same structure: the players first aim at reaching a hub node, i.e., a node with a large number of outgoing links. After that, the players narrow down their search again, meaning that the articles get more specific, their degrees decrease, and the textual similarity to the target article increases. Similarly, Sudarshan Iyengar et al. (2012) found that all participants of the word game selected an individual “landmark” node with a low closeness value, through which they navigated in almost all paths.

A further approach is presented by González et al. (2008), who consider human trajectories created by a large set of individual travels, collected from the recorded locations of individuals’ mobile phones over a period of 6 months. Although single travel paths seem to be very individual, González et al. show that merging all trajectories results in a single spatial probability distribution, which means that human travel patterns show a high degree of regularity. They find that most individuals travel mostly short distances, but there are also several individuals who travel longer distances. A similar result is found by Cho et al. (2011), who aim at quantifying the factors which lead to these regularities. They observe that the majority of short-distance travels can be explained by the daily routines of the people, which therefore show a high spatial and temporal periodicity. Long-distance travels do not show this behavior; however, when the social network of the user is taken into account, the long-distance travels can be explained.

Similarity Measures for Paths

In order to summarize several paths, a good similarity measure for paths is needed. However, when developing a similarity measure for an object, the basic question to answer is which features of the object need to be incorporated in the similarity measure. Answering this question is difficult, sometimes rather a matter of taste or experience, and often dependent on the context. In order to formulate a similarity measure in a mathematical way, the objects of interest need to be represented by a structure on which operations can be performed. This is why the formulation of a similarity measure and the modeling of the objects are two closely related steps. For paths in networks, neither the modeling nor the formulation of a similarity measure is obvious. This section therefore provides a classification of how paths in networks can be modeled. For each modeling approach, a small selection of similarity and distance measures is presented as well as its connection to paths in complex networks. An overview of the approaches and similarity measures can be found in Table 1.
Table 1

Similarity and distance measures for paths, depending on how a path is modeled. The references either refer to the authors who introduced the original measure or to authors who applied it in the corresponding context

| Modeling | Measure | References |
|---|---|---|
| Sets | Number of common elements | |
| | Jaccard index | Jaccard (1912) |
| Sequences | Longest common substring distance | Gusfield (1997) |
| | Longest common subsequence distance | Needleman and Wunsch (1970) |
| | Levenshtein distance | Levenshtein (1966) |
| | Further edit distances | Navarro (2001) |
| Sets of points in metric space | Hausdorff distance | Hausdorff (1914) |
| | Sum of minimum distances | Niiniluoto (1987) |
| | (Fair) Surjection distance | Oddie (1986), Eiter and Mannila (1997) |
| | Matching distance | Ramon and Bruynooghe (2001) |
| (Discrete) Curves in metric space | Fréchet distance | Fréchet (1906), Alt and Godau (1995), Eiter and Mannila (1994) |
| | Discrete Fréchet distance | Eiter and Mannila (1994) |
| | LCSS distance | Vlachos et al. (2002) |
| | Euclidean distance | |

A systematic evaluation of similarity measures for paths and their properties has been proposed by Bockholt (2015).

Paths as Sets

The simplest approach for modeling a path is to understand a path as a set of nodes and edges. For the comparison of two paths, well-known similarity and distance measures for sets can be applied, for example, the Jaccard index (Jaccard 1912). The Jaccard index for two sets A and B is defined as
$$\sigma \left( A, B\right)=\frac{\mid A\cap B\mid }{\mid A\cup B\mid }.$$
For paths P and Q, the sets A and B may be replaced by V(P) and V(Q) or by E(P) and E(Q). The measure is easy to compute and symmetric, and the corresponding distance 1 − σ satisfies the triangle inequality. However, it does not satisfy coincidence in the sense that two nonidentical paths can yield a similarity of 1, as can be seen in Figs. 2 and 3.
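A minimal Python sketch of the set-based comparison follows. The helper names and the example paths are assumptions chosen for illustration (they are not the paths of Figs. 2 and 3), and the value 1 for two empty sets is a convention adopted here.

```python
def node_set(path):
    """V(P): the set of nodes contained in the path."""
    return set(path)


def edge_set(path):
    """E(P): the set of undirected edges contained in the path."""
    return {frozenset(edge) for edge in zip(path, path[1:])}


def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # assumed convention for two empty sets
    return len(a & b) / len(a | b)


P = ["a", "b", "c", "d"]
Q = ["d", "c", "b", "a"]  # same nodes and edges, traversed in reverse
R = ["a", "b", "c"]

print(jaccard(node_set(P), node_set(Q)), jaccard(edge_set(P), edge_set(Q)))
print(jaccard(node_set(P), node_set(R)), jaccard(edge_set(P), edge_set(R)))
```

P and Q differ as sequences but obtain a similarity of 1 for both node and edge sets, which is the violation of coincidence described above.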

Furthermore, when modeling paths as sets of nodes or edges, several pieces of information contained in a path are discarded, for example, the order in which the nodes and edges occur in the paths as well as the information whether nodes or edges are contained once or multiple times in the path. This information can be considered when modeling paths as sequences of nodes.

Paths as Sequences

A path P can be modeled as a sequence of nodes, for which many similarity and distance measures have been developed, for example, in the bioinformatics community to quantify the similarity of genetic sequences. Most of the existing similarity measures for sequences can be formulated as an edit distance: for a set of allowed edit operations (such as deletion, insertion, or substitution of elements) and costs associated with each operation, the edit distance between two sequences is the minimum cost necessary for transforming one sequence into the other. Allowing insertion and deletion as edit operations with a cost of 1 for each operation yields the longest common subsequence distance (Needleman and Wunsch 1970). The name is due to the following connection: for a sequence $$A=\left({a}_1,\dots, {a}_k\right)$$, a subsequence of A is defined as any sequence of elements which can be obtained by deleting elements from A. For A and another sequence $$B=\left({b}_1,\dots, {b}_{\ell}\right)$$, let lcs(A, B) denote the length of the longest common subsequence, i.e., the maximal number of elements which occur both in A and B in the same order. The longest common subsequence distance for A and B is then ∣A∣ + ∣B∣ − 2lcs(A, B) (which can be normalized by the length of the longer sequence, which allows the comparison of sequences of different orders of length).
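The longest common subsequence distance can be computed with the standard dynamic program for lcs(A, B). The following Python sketch is an illustration; the function names and the `normalize` flag are assumptions, and ∣A∣ is taken here as the number of elements of the sequence.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def lcs_distance(a, b, normalize=False):
    """|A| + |B| - 2 lcs(A, B), optionally divided by the longer length."""
    dist = len(a) + len(b) - 2 * lcs_length(a, b)
    if normalize:
        longer = max(len(a), len(b))
        return dist / longer if longer else 0.0
    return dist


A = ["a", "b", "c", "d", "e"]
B = ["a", "b", "x", "d", "e"]
print(lcs_length(A, B), lcs_distance(A, B), lcs_distance(A, B, normalize=True))
print(lcs_distance(["a", "b", "c"], ["x", "y"]))  # disjoint sequences
```

For two sequences without common elements, lcs(A, B) = 0 and the distance reduces to ∣A∣ + ∣B∣, whatever the elements are.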

Sequences are not only of interest in the area of computational biology but also in other research fields. Researchers from data mining analyze sequences of events, such as sequences of events in telecommunication data, ordered lists of courses a student has taken during studies, or sequences of stock prices from financial data (see, e.g., Kumar et al. 2010; Das et al. 1997; Mannila and Ronkainen 1997; Moen 2000; Laasonen 2005a). Another example is the analysis of clickstream data in order to investigate and predict the behavior of a user in a web environment (Gündüz and Özsu 2003; Wang and Zaiane 2002).

The analysis of sequences is of interest for the comparison of paths in complex networks. The longest common subsequence distance can directly be applied to paths as sequences of nodes and measures the number of nodes which occur in both paths in the same order. Therefore, the paths shown in Figs. 2 and 3 yield large values for this distance measure, which seems more intuitive than the set-based measures. Note furthermore that it satisfies all properties of a distance metric. However, this measure is designed for comparing sequences of letters where the sequences are long in relation to the size of the alphabet from which the letters are taken. For paths in complex networks, the opposite is usually the case: the alphabet, i.e., the node set of the graph, is usually much larger than the length of the sequences, i.e., the considered paths. This implies that two arbitrary paths in a network usually share only a small fraction of nodes or are even totally disjoint. In these cases, the distance measure is not able to yield meaningful results, since it cannot distinguish between paths which are disjoint and “close” in the graph and paths which are disjoint but “distant” in the graph. For example, if two people drive from the same city to the same other city, but one on a highway and one using only country roads next to the highway, the two paths should be rated as quite similar. However, if one drives from north to south and the other from east to west, the paths should be rated as very different. For disjoint paths, the longest common subsequence distance returns values which depend only on the lengths of the paths.

Therefore, modeling paths as sequences will not always yield satisfying results since the information of where the paths are situated in the context of the network is not captured by this approach. Incorporating the location of the paths within the network leads to the next two modeling possibilities.

Paths as Sets of Points in Metric Space

Since all nodes of a path are also contained in the underlying graph, a graph distance can be computed for every pair of nodes. Since this graph distance d, i.e., the length of the shortest path between the two nodes, is actually a metric, the set of nodes V and d form a metric space (V, d) (if the graph is connected and undirected). This is why a path P can also be modeled as a set of points in a metric space, for which there are also several distance measures. Hausdorff introduced a measure for the distance of two nonempty subsets of a metric space in 1914 (Hausdorff 1914) which can be adapted to paths P, Q by
$${\delta}_h\left( P, Q\right)= \max \left\{ \max\limits_{v\in V(P)}\ d\left( v, Q\right),\max\limits_{w\in V(Q)} d\left( w, P\right)\right\}$$
with d(v, P) = min {d(v, w)| w ∈ V(P)}. Although the measure is a metric in its original definition, it violates coincidence for paths (cf. Fig. 2). Furthermore, it is obviously sensitive to extreme points since the longest shortest distance between the two paths is taken as the measure. Interesting approaches to overcome this problem were, for example, proposed by the philosopher Oddie (1986) and by Eiter and Mannila (1997). Oddie originally developed his surjection measure for comparing theories formulated in logical expressions; Eiter and Mannila rephrased it such that it is able to measure the distance between two subsets of points in a metric space. The idea of this measure is to introduce a surjective mapping from the points of the larger set to the points of the smaller set, which is also a worthwhile idea for paths in networks. For two paths P, Q with ∣P∣ ≥ ∣Q∣, their surjection measure can be adapted to
$$\delta_s (P,Q) = \frac{1}{|P|+1} \min\limits_{\eta \in H} \sum\limits_{(v,w) \in \eta} d(v,w)$$
where H is the set of all surjective functions from the nodes of P to the nodes of Q. They refine this measure to a new measure called minimum link distance in which H is not a set of functions anymore but a set of relations where each relation needs to contain each node of P and each node of Q at least once (see Fig. 4, for a formal description see Eiter and Mannila 1997).

Clearly, these measures consider the distance between the elements of the paths and allow a comparison of totally disjoint paths. They furthermore include all elements of both paths in the computation, which is why extreme points do not have such a big impact. However, since they are set based, these measures ignore the order of the nodes in a path, which is also why they fail to satisfy coincidence. A further main drawback is that they violate the triangle inequality (an interesting approach for fixing this issue is presented by Ramon and Bruynooghe (2001)).
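The adapted Hausdorff distance δ_h and the surjection measure δ_s of this section can be sketched in Python as follows. This is an illustration under several assumptions: the graph is connected, undirected, and given as adjacency sets; paths contain no repeated nodes, so that a path P has ∣P∣ + 1 distinct nodes; and the surjections are enumerated by brute force, which is feasible only for very short paths. All names and the example graph are hypothetical.

```python
from collections import deque
from itertools import product


def hop_distances(adj, source):
    """Graph distance d(source, w) for every node w, by breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def hausdorff(d, P, Q):
    """Largest distance from a node of one path to the nearest node of the other."""
    vp, vq = set(P), set(Q)
    to_q = max(min(d[v][w] for w in vq) for v in vp)
    to_p = max(min(d[w][v] for v in vp) for w in vq)
    return max(to_q, to_p)


def surjection_distance(d, P, Q):
    """Cheapest surjection from the nodes of the longer path onto the shorter one."""
    if len(P) < len(Q):
        P, Q = Q, P
    vp, vq = sorted(set(P)), sorted(set(Q))
    best = None
    for images in product(vq, repeat=len(vp)):
        if set(images) != set(vq):
            continue  # not surjective
        cost = sum(d[v][w] for v, w in zip(vp, images))
        best = cost if best is None else min(best, cost)
    if best is None:
        raise ValueError("no surjection exists between the node sets")
    return best / len(P)  # len(P) equals |P| + 1


# Hypothetical example: a ladder with rails a0-a1-a2 and b0-b1-b2 and three rungs.
adj = {
    "a0": {"a1", "b0"}, "a1": {"a0", "a2", "b1"}, "a2": {"a1", "b2"},
    "b0": {"b1", "a0"}, "b1": {"b0", "b2", "a1"}, "b2": {"b1", "a2"},
}
d = {v: hop_distances(adj, v) for v in adj}
P, Q = ["a0", "a1", "a2"], ["b0", "b1", "b2"]

print(hausdorff(d, P, Q), surjection_distance(d, P, Q))
print(hausdorff(d, P, P[::-1]))  # 0 although the two paths differ
print(hausdorff(d, ["a0"], Q))   # dominated by the far end b2
```

The reversed path obtains distance 0 to the original one, which illustrates that set-based measures ignore the order of the nodes.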

Paths as Polygonal Chains

It seems natural to combine the latter two approaches of modeling paths as sequences and as sets of points in a metric space by modeling paths as sequences of points in a metric space. In computational geometry, a sequence of points in a metric space is called a polygonal chain or polygonal curve. Polygonal curves often result from the discretization or approximation of continuous curves and are, informally speaking, a sequence of points in the metric space where the gaps between two consecutive points are assumed to be straight lines. Researchers from computational geometry proposed several methods and algorithms for measuring how much two polygonal curves resemble each other (Alt and Godau 1995). In cases in which the Hausdorff distance is not appropriate, the Fréchet distance is often used. The Fréchet distance, introduced by Fréchet (1906) for continuous curves, is often illustrated by the picture of a man walking with his dog on a leash: the man walks on one curve, the dog on the other. They both may vary their speed independently and arbitrarily, but they only walk forwards, not backwards. The Fréchet distance between the two curves is then the minimal length of the leash which is sufficient to traverse the two curves from the start to the end under these conditions.

Of interest for the comparison of paths is the discrete Fréchet distance proposed by Eiter and Mannila (1994), which serves as an approximation of the continuous Fréchet distance for continuous as well as for polygonal curves (note that there exist discrete and continuous versions of both the distance measure (yielding the discrete Fréchet distance) and of the curves (yielding polygonal curves)). The illustration for the discrete Fréchet distance then contains two frogs, connected by a leash, one on each curve, which jump from stone to stone. This means that the (in the case of polygonal curves, straight) connection between two consecutive points is not considered in the computation of the distance. This variant is of interest for the comparison of paths since usually there is a metric defined on the nodes of the network and not on the edges.

For the efficient computation of the discrete Fréchet distance, Eiter and Mannila (1994) introduce a coupling between the points of both sequences as shown in Fig. 5. If the points of both sequences are drawn as in the figure, a coupling can be imagined as follows: the first point of the first sequence is connected to the first point of the second sequence; the same holds for the last points of both sequences. The points of the two sequences are connected to each other such that each point in both sequences is connected to at least one point of the other sequence and the connections between the points do not cross. The cost of such a coupling is then the length of the longest connection, and the discrete Fréchet distance of the two sequences is the cost of the cheapest coupling between them. Hence, the discrete Fréchet distance is a measure which takes into account the order of the contained points as well as their distance. Especially interesting seems a variant of the discrete Fréchet distance proposed by Eiter and Mannila (1994) in which the cost of a single coupling is not the length of the longest coupling connection but the sum of all coupling connections. In this way, the distances of all contained points of both sequences are considered.
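The cheapest coupling can be found with a standard dynamic program over pairs of positions, sketched below in Python for paths in a graph. The sketch is an illustration, not a reproduction of the algorithm as published: it assumes a connected, undirected graph given as adjacency sets with hop counts as the metric, and the function names, the `use_sum` flag, and the example graph are hypothetical.

```python
from collections import deque


def hop_distances(adj, source):
    """Graph distance from source to every node, by breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def coupling_cost(d, P, Q, use_sum=False):
    """Cost of the cheapest coupling of P and Q.

    use_sum=False: cost of a coupling is its longest connection
                   (discrete Fréchet distance).
    use_sum=True:  cost of a coupling is the sum of its connections.
    """
    n, m = len(P), len(Q)
    cost = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            here = d[P[i]][Q[j]]
            if i == 0 and j == 0:
                cost[i][j] = here  # the first points are always coupled
                continue
            candidates = []
            if i > 0:
                candidates.append(cost[i - 1][j])      # advance on P only
            if j > 0:
                candidates.append(cost[i][j - 1])      # advance on Q only
            if i > 0 and j > 0:
                candidates.append(cost[i - 1][j - 1])  # advance on both
            best_prev = min(candidates)
            cost[i][j] = best_prev + here if use_sum else max(best_prev, here)
    return cost[n - 1][m - 1]  # the last points are always coupled


# Hypothetical example: a ladder with rails a0-a1-a2 and b0-b1-b2 and three rungs.
adj = {
    "a0": {"a1", "b0"}, "a1": {"a0", "a2", "b1"}, "a2": {"a1", "b2"},
    "b0": {"b1", "a0"}, "b1": {"b0", "b2", "a1"}, "b2": {"b1", "a2"},
}
d = {v: hop_distances(adj, v) for v in adj}
P, Q = ["a0", "a1", "a2"], ["b0", "b1", "b2"]

print(coupling_cost(d, P, Q), coupling_cost(d, P, Q[::-1]))
print(coupling_cost(d, P, Q, use_sum=True), coupling_cost(d, P, Q[::-1], use_sum=True))
```

Reversing Q raises the cost because the first and last points must be coupled, so the order of the nodes matters, in contrast to set-based measures. The table has ∣P∣ + 1 rows and ∣Q∣ + 1 columns, one entry per pair of positions.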

In recent years, it has become very cheap to equip mobile devices with all kinds of sensors able to track the device’s position. This has led to a huge amount of available data containing the tracked movement of individuals (e.g., animals in wildlife (Shamoun-Baranes et al. 2011), mobile phone users (González et al. 2008; Laasonen 2005a, b), taxis (Yuan et al. 2010, 2011), bicycles from bike-sharing systems (Vogel et al. 2014; Sener et al. 2009), or shopping carts with RFID chips which track the customers’ way through the supermarket (Larson et al. 2005)). These trajectories, given as sequences of GPS coordinates or in some other form, are basically sequences of points in a metric space such that results from this research field can be applied. The (discrete) Fréchet distance has also proven useful in the analysis of GPS trajectories, for example, for finding clusters of similar trajectories (Gudmundsson et al. 2012) or for detecting recurring patterns in the trajectories (e.g., Buchin et al. 2008).

However, for the comparison of GPS trajectories, other similarity measures are also used: although developed for sets and sensitive to outliers, the Hausdorff distance is widely used (e.g., by Junejo et al. 2004). Vlachos et al. (2002) develop a similarity measure which is based on the longest common subsequence but is made applicable to trajectories by introducing two parameters. While the classic longest common subsequence lcs of two sequences counts the number of elements which are exactly the same in both sequences (and in the same order), the LCSS distance of Vlachos et al. (2002) counts points as matched if they are “close enough” to each other, i.e., their distance to each other is smaller than the given parameter. This idea can be useful for the comparison of paths in order to overcome the problem described in the section Paths as Sequences: the network usually contains many more nodes than the paths such that most paths cannot be distinguished by the lcs because they yield a similarity of 0. Counting nodes of two paths as matched if they are close enough in the network addresses this problem.
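Transferred to paths in a network, the idea of matching nodes that are close enough can be sketched as follows. The Python code is an illustration and not the LCSS measure as published: only a closeness threshold `eps` on the graph distance is modeled, nodes are matched if their distance is at most `eps` (an assumption, so that `eps = 0` recovers the classic lcs), and the second parameter of the original measure is omitted. The graph and all names are hypothetical.

```python
from collections import deque


def hop_distances(adj, source):
    """Graph distance from source to every node, by breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def lcss_length(d, P, Q, eps):
    """Longest common subsequence where nodes match if d(v, w) <= eps."""
    prev = [0] * (len(Q) + 1)
    for v in P:
        cur = [0]
        for j, w in enumerate(Q, start=1):
            if d[v][w] <= eps:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


# Hypothetical example: a ladder with rails a0-a1-a2 and b0-b1-b2 and three rungs.
adj = {
    "a0": {"a1", "b0"}, "a1": {"a0", "a2", "b1"}, "a2": {"a1", "b2"},
    "b0": {"b1", "a0"}, "b1": {"b0", "b2", "a1"}, "b2": {"b1", "a2"},
}
d = {v: hop_distances(adj, v) for v in adj}
P, Q = ["a0", "a1", "a2"], ["b0", "b1", "b2"]

print(lcss_length(d, P, Q, eps=0))        # disjoint paths: nothing matches
print(lcss_length(d, P, Q, eps=1))        # parallel paths: every node matches
print(lcss_length(d, P, Q[::-1], eps=1))  # opposite direction: one match
```

With `eps = 1`, the two parallel rails are recognized as similar although they share no node, while the reversed rail is not.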

These approaches are also of relevance for the analysis of paths in complex networks: in order to summarize a huge number of paths in a network, an appropriate clustering procedure with a meaningful similarity or distance measure is required. Of relevance is also the work of Lee et al. (2007), who point out that clustering trajectories as a whole might often be inappropriate since this approach misses similar sub-trajectories. They therefore propose a partition-and-group framework in which the trajectories are first partitioned into line segments, which are then grouped.

Although many results from the analysis of polygonal curves and trajectories can be applied to the analysis of paths in complex networks, the crucial difference between these concepts is that paths are embedded in the structure of a complex network. This fact effectively provides more information, which cannot be used by any of the existing approaches. An open question is how to adapt the methods from trajectory analysis to paths in directed or weighted networks or networks with multiple edges. In a directed network, for example, the graph distance is not a distance metric anymore: if there is a path from node v to node w, this does not imply that the shortest path from w to v has the same length or that it exists at all. Thus, adapting distance measures such as the (discrete) Fréchet distance or the LCSS distance to paths in directed graphs requires some consideration.
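The loss of symmetry in directed graphs is easy to reproduce. The following Python sketch uses a hypothetical four-node directed graph, given as successor lists, and returns infinity for unreachable targets; the names are assumptions made for illustration.

```python
import math
from collections import deque


def directed_distance(succ, source, target):
    """Length of a shortest directed path, or math.inf if none exists."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            return dist[u]
        for w in succ.get(u, ()):
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return math.inf


# Hypothetical directed graph: a cycle v -> w -> x -> v and an edge y -> v.
succ = {"v": ["w"], "w": ["x"], "x": ["v"], "y": ["v"]}

print(directed_distance(succ, "v", "w"), directed_distance(succ, "w", "v"))
print(directed_distance(succ, "y", "v"), directed_distance(succ, "v", "y"))
```

The distance from v to w differs from the distance from w to v, and y reaches v while v cannot reach y, so symmetry fails in both ways described above.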

Future Directions

Refining Centrality Indices

Although it has been observed that humans are often able to find surprisingly short paths through a network when only having local knowledge, they still rarely choose shortest paths. At the same time, popular measures for identifying the most central node in a network are based on paths through the network. They all assume, however, that the entities move on shortest paths, which they usually do not. Therefore, refining centrality indices such that they take into account the actually taken paths seems to be a promising approach. Dorn et al. (2012), for example, could already show that centrality indices are less prone to artifacts when using actually taken paths.

Developing Realistic Models of Paths

The same holds not only for centrality indices but for all network models. When predictions about the behavior of users or processes in networks are made, it might not be sufficient to consider only the network structure; the actual usage of the network should be taken into account. The mere existence of an edge does not imply that this edge is actually used. Realistic models of network usage that follows neither shortest nor randomly chosen paths are needed.

Using the Knowledge in the Paths

On the other hand, the actual usage of the network by thousands of entities – which paths are often taken, which rarely or never, maybe also dependent on other factors – contains a huge amount of knowledge. This knowledge may be used in order to infer knowledge about the network itself. This has already been done for GPS trajectories in order to provide the effectively shortest path to car drivers – based on the aggregated knowledge of thousands of taxi drivers (Yuan et al. 2010, 2011), or to extract interesting locations from individual travel sequences (Benkert et al. 2010; Zheng et al. 2009).

Visualizing Paths in Complex Networks

The human eye is a powerful tool for identifying common patterns and structure in data. Therefore, the visualization of large path data sets will be an important task for future research.

Compilation of Data Sets

Since the analysis of actually taken paths in complex networks is a research area that has only just started, researchers face the challenge that only a few data sets of paths in networks are available. Compiling and publishing data sets of complex networks and their actual usage is an important task for the network community – to mine and analyze the paths together with the underlying network structure.

References

1. Aginsky V, Harris C, Rensink R, Beusmans J (1997) Two strategies for learning a route in a driving simulator. J Environ Psychol 17(4):317–331
2. Alt H, Godau M (1995) Computing the Fréchet distance between two polygonal curves. Int J Comput Geom Appl 5:75–91
3. Benkert M, Djordjevic B, Gudmundsson J, Wolle T (2010) Finding popular places. Int J Comput Geom Appl 20(01):19–42
4. Bockholt M (2015) Measures for the similarity of paths in complex networks. Master thesis, TU Kaiserslautern
5. Buchin K, Buchin M, Gudmundsson J, Löffler M, Luo J (2008) Detecting commuting patterns by clustering subtrajectories. In: Algorithms and computation: 19th international symposium, ISAAC 2008, Gold Coast, 15–17 Dec 2008. Proceedings, September, pp 644–655
6. Cho E, Myers SA, Leskovec J (2011) Friendship and mobility: user movement in location-based social networks. In: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD ‘11). ACM, New York, pp 1082–1090. doi: 10.1145/2020408.2020579
7. Das G, Gunopulos D, Mannila H (1997) Finding similar time series. In: Komorowski J, Zytkow J (eds) Principles of data mining and knowledge discovery, Lecture notes in computer science, vol 1263. Springer, Berlin/Heidelberg, pp 88–100
8. Dorn I, Lindenblatt A, Zweig KA (2012) The trilemma of network analysis. In: Proceedings of the 2012 IEEE/ACM international conference on advances in social network analysis and mining, Istanbul
9. Ducruet C, Notteboom T (2012) The worldwide maritime network of container shipping: spatial structure and regional dynamics. Glob Netw 12(3):395–423
10. Eiter T, Mannila H (1994) Computing discrete Fréchet distance. Technical report. Information Systems Department, Technical University of Vienna
11. Eiter T, Mannila H (1997) Distance measures for point sets and their computation. Acta Inform 34(2):109–133
12. Fréchet MM (1906) Sur quelques points du calcul fonctionnel. Rend Circ Mat Palermo (1884–1940) 22(1):1–72
13. González MC, Hidalgo CA, Barabási AL (2008) Understanding individual human mobility patterns. Nature 453(7196):779–782. 0806.1256
14. Gudmundsson J, Thom A, Vahrenhold J (2012) Of motifs and goals: mining trajectory data. In: Proceedings of the 20th international conference on advances in geographic information systems – SIGSPATIAL ’12, ACM, New York, pp 129–138
15. Guimerá R, Amaral LAN (2004) Modeling the world-wide airport network. Eur Phys J B 38(2):381–385
16. Gündüz Ş, Özsu MT (2003) A web page prediction model based on click-stream tree representation of user behavior. In: Proceedings of the ninth ACM SIGKDD international conference on knowledge discovery and data mining, ACM, New York, pp 535–540
17. Gusfield D (1997) Algorithms on strings, trees and sequences: computer science and computational biology. Cambridge University Press, New York
18. Hausdorff F (1914) Grundzüge der Mengenlehre. Veit and Company, Leipzig
19. Jaccard P (1912) The distribution of the flora in the alpine zone. New Phytol 11(2):37–50
20. Jarušek P, Pelánek R (2011) What determines difficulty of transport puzzles? In: Proceedings of Florida Artificial Intelligence Research Society conference. AAAI Press, pp 428–433
21. Junejo IN, Javed O, Shah M (2004) Multi feature path modeling for video surveillance. In: Proceedings of the international conference on pattern recognition, vol 2, pp 716–719
22. Kaluza P, Kölzsch A, Gastner MT, Blasius B (2010) The complex network of global cargo ship movements. J R Soc Interface 7(48):1093–1103. arXiv:1001.2172
23. Kleinberg J (2000a) The small-world phenomenon: an algorithmic perspective. In: Proceedings of the thirty-second annual ACM symposium on theory of computing, STOC 00, ACM, New York, pp 163–170
24. Kleinberg JM (2000b) Navigation in a small world. Nature 406:845
25. Kumar P, Raju BS, Radha Krishna P (2010) A new similarity metric for sequential data. Int J Data Warehous Min 6(4):16–32. doi:10.4018/jdwm.2010100102
26. Laasonen K (2005a) Clustering and prediction of mobile user routes from cellular data. In: Knowledge discovery in databases: PKDD 2005. Lecture notes in computer science, vol 3721. Springer, Berlin/Heidelberg, pp 569–576
27. Laasonen K (2005b) Route prediction from cellular data. In: Workshop on Context-Awareness for Proactive Systems (CAPS), vol 1617
28. Larson JS, Bradlow ET, Fader PS (2005) An exploratory look at supermarket shopping paths. Int J Res Mark 22(4):395–414
29. Lee JG, Han J, Whang KY (2007) Trajectory clustering: a partition-and-group framework. In: Proceedings of the 2007 ACM SIGMOD international conference on management of data, ACM, New York, pp 593–604
30. Levenshtein V (1966) Binary codes capable of correcting deletions, insertions and reversals. Sov Phys Dokl 10(8):707–710. Original in Russian in Dokl Akad Nauk SSSR 163(4):845–848, 1965
31. Mannila H, Ronkainen P (1997) Similarity of event sequences. In: Proceedings of the fourth international workshop on temporal representation and reasoning (TIME ’97), Daytona Beach, pp 136–139. doi:10.1109/TIME.1997.600793
32. McDonald TP, Pellegrino JW (1993) Psychological perspectives on spatial cognition. In: Gärling T, Golledge RG (eds) Behavior and environment – psychological and geographical approaches, advances in psychology, vol 96. Elsevier Science Publishers, North-Holland, pp 47–82
33. Milgram S (1967) The small world problem. Psychol Today 2(1):60–67
34. Moen P (2000) Attribute, event sequence, and event type similarity notions for data mining. PhD thesis, Department of Computer Science, University of Helsinki
35. Moeser SD (1988) Cognitive mapping in a complex building. Environ Behav 20(1):21–49
36. Navarro G (2001) A guided tour to approximate string matching. ACM Comput Surv 33(1):31–88
37. Needleman SB, Wunsch CD (1970) A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol 48(3):443–453
38. Newell A (1980) Reasoning, problem solving and decision processes: the problem space as a fundamental category. In: Nickerson R (ed) Attention and performance VIII. Erlbaum, Hillsdale. (Also available as Technical Report, Carnegie Mellon University, Computer Science, Report No. 2482, 1979)
39. Niiniluoto I (1987) Truthlikeness, vol 185. Springer Science & Business Media, Dordrecht
40. Oddie G (1986) Likeness to truth. D. Reidel, Dordrecht
41. Ramon J, Bruynooghe M (2001) A polynomial time computable metric between point sets. Acta Inform 37(10):765–780
42. Sen P, Dasgupta S, Chatterjee A, Sreeram PA, Mukherjee G, Manna SS (2003) Small-world properties of the Indian railway network. Phys Rev E 67(3):036106
43. Sener IN, Eluru N, Bhat CR (2009) An analysis of bicycle route choice preferences in Texas, US. Transportation 36(5):511–539
44. Shamoun-Baranes J, van Loon EE, Purves RS, Speckmann B, Weiskopf D, Camphuysen C (2011) Analysis and visualization of animal movement. Biol Lett 8(1):6–9
45. Sudarshan Iyengar S, Veni Madhavan C, Zweig KA, Natarajan A (2012) Understanding human navigation using network analysis. Top Cogn Sci 4(1):121–134
46. Vlachos M, Kollios G, Gunopulos D (2002) Discovering similar multidimensional trajectories. In: Proceedings 18th international conference on data engineering, San Jose, pp 673–684. doi:10.1109/ICDE.2002.994784
47. Vogel M, Hamon R, Lozenguez G, Merchez L, Abry P, Barnier J, Borgnat P, Flandrin P, Mallon I, Robardet C (2014) From bicycle sharing system movements to users: a typology of Vélo’v cyclists in Lyon based on large-scale behavioural dataset. J Transp Geogr 41:280–291
48. Wang Q, Zaiane OR (2002) Clustering web sessions by sequence alignment. In: Proceedings. 13th international workshop on database and expert systems applications, Aix-en-Provence, pp 394–398. doi:10.1109/DEXA.2002.1045928
49. Watts DJ, Strogatz SH (1998) Collective dynamics of small-world networks. Nature 393(6684):440–442
50. West R, Leskovec J (2012) Human wayfinding in information networks. In: Proceedings of the 21st international conference on World wide web, ACM, New York, pp 619–628
51. West R, Pineau J, Precup D (2009) Wikispeedia: an online game for inferring semantic distances between concepts. In: Kitano H (ed) Proceedings of the 21st international joint conference on artificial intelligence (IJCAI ’09). Morgan Kaufmann, San Francisco, pp 1598–1603
52. Yuan J, Zheng Y, Zhang C, Xie W (2010) T-Drive: driving directions based on Taxi trajectories. In: Proceedings of the 18th SIGSPATIAL international conference on advances in geographic information systems, pp 99–108
53. Yuan J, Zheng Y, Xie X, Sun G (2011) Driving with knowledge from the physical world. In: Proceedings of the 17th ACM SIGKDD international conference on knowledge discovery and data mining (KDD ’11), vol 5, pp 316–324
54. Zheng Y, Zhang L, Xie X, Ma WY (2009) Mining interesting locations and travel sequences from GPS trajectories. In: Proceedings of the 18th international conference on World Wide Web. ACM, pp 791–800