Encyclopedia of Social Network Analysis and Mining

Living Edition
Editors: Reda Alhajj, Jon Rokne

Detection of Spatiotemporal Outlier Events in Social Networks

Living reference work entry
DOI: https://doi.org/10.1007/978-1-4614-7163-9_324-1



Connected Component of Undirected Graph

A maximal subgraph G’ of an undirected graph G in which any two nodes are connected to each other by a path

Friendship Network

A friendship network is a connected component of an undirected friendship graph. An undirected friendship graph is an undirected graph where a set of nodes N represents a set of users and an edge represents a friendship between two users

Generative Model

A probabilistic model that uses a joint probability distribution for generating observable data given some hidden parameters

Location-Based Social Network

Online social networking service that enables users to “check in” at places from their mobile phones using text messaging or mobile applications. One of the largest location-based social networks is Brightkite. Brightkite also provides an explicit social network that describes friendships among users

Spatiotemporal Outlier Event

An event that occurs at a specific location at a certain time and contributes significantly to a change in that location’s topic

Undirected Graph

A set of nodes N and a set of edges E, denoted as G = (N, E). An edge is a mapping \( (n_i, n_j) \) from one node \( n_i \) to another node \( n_j \), where \( (n_i, n_j) \) and \( (n_j, n_i) \) are identical. An undirected friendship graph is an undirected graph representing friendship among a set of users, where the friendship link between two users is represented by an undirected edge


In this entry, we extract temporal and spatial outlier events from a large online location-based social network. We analyze the check-in patterns and friendship networks of the users to determine how the topic of a location changes over time. The topic of a location at a specific time is determined by the check-in patterns and the number of friendship networks. Dealing with worldwide-scale check-in patterns introduces a new challenge, as exact location information (e.g., a location’s specific function or exact geo-position) is difficult to determine. To address this issue, we use a unique combination of a generative model and the von Mises-Fisher distribution to determine the topic of a location at a specific time, given the check-in patterns and friendship network information of all users who check in. We showcase our proposed approach by performing experiments on a dataset (Leskovec 2012) extracted from Brightkite, which was one of the largest location-based social networking services.


With the advent of the Web 2.0 era, online social networking services have become more and more popular and are now part of people’s daily lives. Geo-social networking, commonly called location-based social networking (LBSN), is a type of online social networking that enables users to share location-embedded information, for example by checking in at their current location, where the location is usually represented by latitude and longitude coordinates. Some services also allow users to express their network of friends. The information about check-in patterns and friendship networks provides a new dimension of knowledge about human mobility, relationships among users, and the natural role, or topic, of locations at a specific time.

We are interested in discovering spatiotemporal outlier events by analyzing the check-in patterns of users from many different parts of the world. We extract features from the check-in data to determine the topic of a location at a specific time. Specifically, these features are the number of unique users who check in, the proportion of the number of unique users who check in to the total number of check-ins, and the number of friendship networks. We call these features the socio-spatial aspects of a location at a specific time. In this work, we use day granularity; henceforth we use the term date instead of time. Our contributions are as follows:
  1. We introduce a generative model to capture the generative process of the topic of a location at a specific date based on the check-in patterns and friendship networks of the users.
  2. The proposed model combines the power of a generative model and the von Mises-Fisher distribution to model the location’s topic as directional data. We name the proposed model STOvmf.
  3. To the best of our knowledge, our study is the first that deals with a worldwide-scale location-based dataset to discover spatiotemporal outlier events by analyzing the changes of topic of a location.


Historical Background

In this section, we first introduce related work on extracting outliers from spatial and/or temporal data. We then present some studies that analyze human mobility patterns, which provide background for our proposed approach.

Liu et al. (2011) proposed the problem of discovering causal relationships among spatiotemporal outliers in traffic data streams. Wu et al. (2008) introduced algorithms to detect the most abnormal discrepancy regions in precipitation data, using four sweep lines to form grids that are treated as regions. Sun and Chawla (2004) proposed a measure called SLOM (spatial local outlier measure) to capture the local behavior of a datum in its spatial neighborhood. Lauw et al. (2010) proposed an approach called STEvent and defined a spatiotemporal event by the co-occurrences among individuals that indicate potential associations among them. Our work extends these studies on spatiotemporal outliers. We propose an approach that combines a generative model and directional statistics to discover spatiotemporal outlier events at places around the globe.

Cho et al. (2011) developed a model of human mobility dynamics to investigate patterns of human mobility. They found many common patterns of human mobility in three large location-based datasets. Yuan et al. (2012) proposed a framework to discover regions of different functions in Beijing using human mobility patterns and the points of interest located in a region. Noulas et al. (2011) studied location-based social networks and found that check-ins by users are usually sporadic.


We first briefly introduce the von Mises-Fisher distribution and then introduce our novel generative model, STOvmf.

The von Mises-Fisher (vMF) Distribution

The von Mises-Fisher (vMF) distribution is a distribution for directional data (Fisher et al. 1987). The probability density function of the vMF distribution for a d-dimensional unit random vector x (a unit vector is a vector whose length is 1) is given by
$$ f\left(x \mid \mu, \kappa\right) = c_d\left(\kappa\right) e^{\kappa \mu^{T} x}, \tag{1} $$
where ||μ|| = 1, κ ≥ 0, d ≥ 2, and the normalizing constant \( c_d(\kappa) \) is defined as
$$ c_d\left(\kappa\right) = \frac{\kappa^{d/2-1}}{\left(2\pi\right)^{d/2} I_{d/2-1}\left(\kappa\right)}, \tag{2} $$
where \( I_r(\cdot) \) represents the modified Bessel function of the first kind of order r. The parameter μ is called the mean direction, and the parameter κ is the concentration parameter. The parameter κ characterizes how strongly the unit random vectors drawn from the probability density function f(x|μ, κ) are concentrated about the mean direction μ (if κ = 0, the distribution is uniform on the unit sphere, and as κ → ∞, the distribution concentrates at a single point, the mean direction μ).
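As an illustration of the density above, the following sketch evaluates the vMF log-density for the three-dimensional case, which matches the length of the <NUC, PUT, NFN> tuple used later. It relies on the closed form of the Bessel function of order 1/2, which gives \( c_3(\kappa) = \kappa / (4\pi \sinh \kappa) \). The function name `vmf_log_density_3d` is hypothetical and not part of the original work.

```python
import math

def vmf_log_density_3d(x, mu, kappa):
    """Log of f(x | mu, kappa) for unit vectors x and mu on the sphere (d = 3)."""
    dot = sum(a * b for a, b in zip(mu, x))
    if kappa == 0.0:
        # Uniform density over the unit sphere, whose surface area is 4*pi.
        return -math.log(4.0 * math.pi)
    # For d = 3, I_{1/2}(kappa) = sqrt(2 / (pi * kappa)) * sinh(kappa), so
    # c_3(kappa) = kappa / (4 * pi * sinh(kappa)). Expressing sinh through
    # exp(-2 * kappa) keeps the result finite for large kappa (e.g., 500).
    log_c3 = (math.log(kappa) - math.log(2.0 * math.pi) - kappa
              - math.log(-math.expm1(-2.0 * kappa)))
    return log_c3 + kappa * dot

mu = (0.0, 0.0, 1.0)
at_mean = vmf_log_density_3d(mu, mu, 500.0)
orthogonal = vmf_log_density_3d((1.0, 0.0, 0.0), mu, 500.0)
print(at_mean > orthogonal)  # the density peaks at the mean direction
```

Working in log space avoids overflow in \( e^{\kappa \mu^T x} \) when κ is large.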

The STOvmf Model

The graphical model of the STOvmf model is shown in Fig. 1a. The plate in Fig. 1a represents replication. The circles represent the variables used in this model. A shaded circle represents a fully observed variable, while an unshaded circle represents an unobserved variable. The directed arrows between circles represent dependency assumptions.
Fig. 1 (a) The graphical model of STOvmf, (b) the generative process of STOvmf

We introduce three features that we use for the proposed model. The first feature is the number of unique users who check in (NUC), which can serve as an indicator that the location is attractive to visitors. The second feature is the proportion of the number of unique users who check in to the total number of check-ins (PUT). A user might check in several times at a location that is part of his/her daily routine, such as home, work, or somewhere in between as he/she commutes (Cho et al. 2011). The second feature is therefore useful for characterizing how familiar a location is to its users. The third feature is the number of friendship networks (NFN). Intuitively, people tend to share their current locations if they feel there is something interesting at those locations, and seminal work in sociology has shown a strong presence of social and geographical homophily (McPherson et al. 2001). We refer to all of these features as the socio-spatial aspect of a location at a specific date and record it as a tuple <NUC, PUT, NFN>. The socio-spatial aspect of a location j, \( loc_j \), at a specific date i, \( date_i \), is represented as \( v_{ij} \).
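The three features can be computed from the check-ins of one location on one date together with the friendship edges. The sketch below is a minimal illustration under two assumptions that the text does not spell out: NFN is counted as the number of connected components of the friendship graph restricted to the users who checked in, and the tuple is scaled to unit length so that it can be treated as directional data. The function name and input layout are hypothetical.

```python
import math

def socio_spatial_aspect(checkin_users, friendships):
    """Return (unit_vector, raw_tuple) for one location on one date.

    checkin_users: non-empty list of user ids, one entry per check-in.
    friendships:   iterable of (user_a, user_b) undirected friendship edges.
    """
    unique_users = set(checkin_users)
    nuc = len(unique_users)
    put = nuc / len(checkin_users)

    # Union-find over the users who checked in. ASSUMPTION: NFN counts the
    # connected components of the friendship graph restricted to these users.
    parent = {u: u for u in unique_users}

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    for a, b in friendships:
        if a in parent and b in parent:
            parent[find(a)] = find(b)
    nfn = len({find(u) for u in unique_users})

    # ASSUMPTION: scale <NUC, PUT, NFN> to unit length for the vMF model.
    norm = math.sqrt(nuc ** 2 + put ** 2 + nfn ** 2)
    return (nuc / norm, put / norm, nfn / norm), (nuc, put, nfn)

unit, raw = socio_spatial_aspect(
    ["a", "b", "a", "c", "d"], [("a", "b"), ("c", "e")])
print(raw)  # (4, 0.8, 3)
```

In the usage example, users a and b form one friendship network while c and d are isolated, so NFN is 3; the edge to user e is ignored because e did not check in.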

Let \( \theta_j \) be the mixture of topics for a location j, generated from a Dirichlet distribution with hyperparameter α (\( \theta_j \sim \mathrm{Dir}(\alpha) \)). Let \( Z_{ij} \) denote the topic assignment, \( \mu_k \), of the socio-spatial aspect \( v_{ij} \) from K different topics μ, where \( v_{ij} \) follows a vMF distribution with parameters \( \mu_k \) and κ (\( v_{ij} \sim \mathrm{vMF}(\mu_k, \kappa) \)). Figure 1b shows the generative process of STOvmf.
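The generative process can be sketched in code for the three-dimensional case. This is an illustrative reading of the description above, not the authors' implementation: it assumes that \( Z_{ij} \) is a categorical draw from \( \theta_j \), and it samples the vMF distribution with the closed-form inverse CDF that is available when d = 3. The function names are hypothetical.

```python
import math
import random

def sample_vmf_3d(mu, kappa, rng):
    """Draw one unit vector from vMF(mu, kappa) on the sphere (d = 3, kappa > 0)."""
    # Inverse-CDF draw of w = mu^T x, whose density on [-1, 1] is
    # proportional to exp(kappa * w).
    u = 1.0 - rng.random()  # in (0, 1], avoids log(0)
    w = 1.0 + math.log(u + (1.0 - u) * math.exp(-2.0 * kappa)) / kappa
    # Uniform direction in the plane orthogonal to mu.
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in range(3)]
        proj = sum(a * b for a, b in zip(g, mu))
        t = [a - proj * b for a, b in zip(g, mu)]
        t_norm = math.sqrt(sum(a * a for a in t))
        if t_norm > 1e-12:
            break
    s = math.sqrt(max(0.0, 1.0 - w * w))
    return [w * m + s * a / t_norm for m, a in zip(mu, t)]

def generate_location(num_dates, alpha, mus, kappa, rng):
    """Generate (theta_j, topic assignments, socio-spatial aspects) for one location."""
    # theta_j ~ Dir(alpha), drawn by normalizing independent Gamma variates.
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    theta = [g / total for g in gammas]
    topics, aspects = [], []
    for _ in range(num_dates):
        # ASSUMPTION: Z_ij is a categorical draw from theta_j.
        k = rng.choices(range(len(mus)), weights=theta)[0]
        topics.append(k)
        # v_ij ~ vMF(mu_k, kappa)
        aspects.append(sample_vmf_3d(mus[k], kappa, rng))
    return theta, topics, aspects

rng = random.Random(0)
mus = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
theta, topics, aspects = generate_location(5, [0.1, 0.1, 0.1], mus, 500.0, rng)
print(topics)
```

With a small α such as 0.1, most of the mass of \( \theta_j \) typically falls on one topic, so a rarely drawn topic stands out as a candidate outlier.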

We define the joint distribution of a location \( loc_j \) as follows:
$$ p\left(\theta, Z, V \mid \kappa, \alpha, \mu\right) = p\left(\theta \mid \alpha\right) \prod_{i=1}^{I} p\left(Z_i \mid \theta\right) p\left(V_i \mid Z_i, \mu, \kappa\right) \tag{3} $$
We introduce the variational parameters γ and ϕ and define the variational distribution, which acts as a surrogate for the posterior, as follows:
$$ q\left(\theta, Z \mid \gamma, \phi\right) = q\left(\theta \mid \gamma\right) \prod_{i=1}^{I} q\left(Z_i \mid \phi_i\right) \tag{4} $$
Thus, the lower bound that needs to be maximized becomes
$$ \begin{aligned}
L\left(\gamma, \phi; \alpha, \mu\right) ={}& E_q\left[\log p\left(\theta \mid \alpha\right)\right] + E_q\left[\log p\left(Z \mid \theta\right)\right] + E_q\left[\log p\left(V \mid Z, \mu, \kappa\right)\right] \\
&- E_q\left[\log q\left(\theta\right)\right] - E_q\left[\log q\left(Z\right)\right] \\
={}& \log \Gamma\left(\sum_{k=1}^{K} \alpha_k\right) - \sum_{k=1}^{K} \log \Gamma\left(\alpha_k\right) + \sum_{k=1}^{K} \left(\alpha_k - 1\right)\left(\Psi\left(\gamma_k\right) - \Psi\left(\sum_{k'=1}^{K} \gamma_{k'}\right)\right) \\
&+ \sum_{i=1}^{I} \sum_{k=1}^{K} \phi_{ik} \left(\Psi\left(\gamma_k\right) - \Psi\left(\sum_{k'=1}^{K} \gamma_{k'}\right)\right) \\
&+ \sum_{i=1}^{I} \sum_{k=1}^{K} \phi_{ik} \log p\left(V_i \mid Z_i = k, \mu, \kappa\right) \\
&- \log \Gamma\left(\sum_{k=1}^{K} \gamma_k\right) + \sum_{k=1}^{K} \log \Gamma\left(\gamma_k\right) - \sum_{k=1}^{K} \left(\gamma_k - 1\right)\left(\Psi\left(\gamma_k\right) - \Psi\left(\sum_{k'=1}^{K} \gamma_{k'}\right)\right) \\
&- \sum_{i=1}^{I} \sum_{k=1}^{K} \phi_{ik} \log \phi_{ik}
\end{aligned} \tag{5} $$
We skip the derivation and show the resulting update equations:
$$ \gamma_k^{\ast} = \alpha_k + \sum_{i=1}^{I} \phi_{ik} \tag{6} $$
$$ \phi_{ik}^{\ast} \propto \exp\left(\left(\Psi\left(\gamma_k\right) - \Psi\left(\sum_{k'=1}^{K} \gamma_{k'}\right)\right) + \log p\left(V_i \mid \mu_k, \kappa\right)\right) \tag{7} $$
$$ \mu_k^{\ast} = \underset{\mu_k}{\arg\max} \sum_{j=1}^{J} \sum_{i=1}^{I} \sum_{k=1}^{K} \phi_{ijk} \log p\left(V_{ij} \mid \mu_k, \kappa\right) \tag{8} $$
The inference and parameter estimation process can then be summarized as follows:
  1. Initialize γ, ϕ, α, and μ.
  2. Compute γ and ϕ using (6) and (7), respectively, for each location.
  3. Update the model parameters μ by maximizing (8), then repeat from step 2 until convergence.
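The procedure summarized above can be sketched as a small variational EM loop. This is a hedged illustration rather than the authors' MATLAB implementation. It assumes a symmetric scalar α, a fixed κ, a fixed number of iterations in place of a convergence test, and the standard closed-form maximizer of (8) for fixed κ, namely the normalized ϕ-weighted sum of the vectors. The helper names `digamma` and `fit_stovmf` are hypothetical; `digamma` is a stdlib-only approximation of Ψ.

```python
import math

def digamma(x):
    """Digamma function via recurrence plus an asymptotic series (x > 0)."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return (result + math.log(x) - 0.5 / x
            - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252)))

def fit_stovmf(locations, num_topics, kappa, alpha, init_mus, num_iters=50):
    """locations: one list of unit vectors V_ij per location.
    init_mus: num_topics initial unit vectors. Returns (gammas, phis, mus)."""
    mus = [list(m) for m in init_mus]
    dim = len(mus[0])
    gammas, phis = [], []
    for _ in range(num_iters):
        gammas, phis = [], []
        for vectors in locations:
            gamma = [alpha + len(vectors) / num_topics] * num_topics
            phi = []
            for _ in range(20):  # fixed-point iterations of (6) and (7)
                phi = []
                for v in vectors:
                    # Update (7): log c_d(kappa) and Psi(sum of gamma) do not
                    # depend on k, so they cancel when phi_i is normalized.
                    scores = [digamma(gamma[k])
                              + kappa * sum(m * x for m, x in zip(mus[k], v))
                              for k in range(num_topics)]
                    top = max(scores)
                    weights = [math.exp(s - top) for s in scores]
                    total = sum(weights)
                    phi.append([w / total for w in weights])
                # Update (6).
                gamma = [alpha + sum(p[k] for p in phi)
                         for k in range(num_topics)]
            gammas.append(gamma)
            phis.append(phi)
        # Update (8): for fixed kappa the maximizer is the normalized
        # phi-weighted sum of all vectors.
        for k in range(num_topics):
            r = [0.0] * dim
            for phi, vectors in zip(phis, locations):
                for p, v in zip(phi, vectors):
                    for d in range(dim):
                        r[d] += p[k] * v[d]
            norm = math.sqrt(sum(c * c for c in r))
            if norm > 0.0:
                mus[k] = [c / norm for c in r]
    return gammas, phis, mus
```

Under these assumptions, an estimate of the topic mixture \( \theta_j \) is the normalized \( \gamma_j \), and a hard topic assignment for each date is the index of the largest entry of \( \phi_i \).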

The following steps summarize our proposed approach to detect spatiotemporal outlier events:
  • Step 1: Compute the socio-spatial aspect <NUC, PUT, NFN> of each location for every recorded date.
  • Step 2: Run the STOvmf model to get the topic assignments for every recorded date of each location.
  • Step 3: Let \( \theta_j \) be the mixture of topics for location j, \( loc_j \), and let \( \theta_{j_{smallest}} \) be the smallest mixture proportion in \( \theta_j \). Let \( date_{i_{smallest}} \) be the date whose topic assignment \( Z_{ij} \) contributes to \( \theta_{j_{smallest}} \). Save \( \theta_{j_{smallest}} \) and \( <loc_j, date_{i_{smallest}}> \) to a database, which we refer to as DBstoe.
  • Step 4: Sort DBstoe in ascending order of the \( \theta_{j_{smallest}} \) values, and output the top-P \( <loc_j, date_{i_{smallest}}> \) pairs as spatiotemporal outlier events.
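Steps 3 and 4 above amount to a simple ranking, sketched below. The input layout (dictionaries keyed by location and date) and the function name are hypothetical. The sketch also assumes that only topics with at least one assigned date are candidates for the smallest proportion, since a topic with no assigned date has no date that contributes to it.

```python
def rank_outlier_events(thetas, assignments, top_p):
    """thetas: {loc: [theta_j1, ..., theta_jK]}.
    assignments: {loc: {date: topic index Z_ij}}.
    Returns the top_p (theta_smallest, loc, date) records, smallest first."""
    db_stoe = []
    for loc, theta in thetas.items():
        used = set(assignments[loc].values())
        if not used:
            continue
        # ASSUMPTION: consider only topics that some date is assigned to.
        k_small = min(used, key=lambda k: theta[k])
        for date, k in assignments[loc].items():
            if k == k_small:
                db_stoe.append((theta[k_small], loc, date))
    db_stoe.sort()  # ascending by the smallest mixture proportion
    return db_stoe[:top_p]

thetas = {"A": [0.9, 0.1], "B": [0.5, 0.5], "C": [0.05, 0.95]}
assignments = {
    "A": {"2009-05-30": 0, "2009-05-31": 1},
    "B": {"2010-01-01": 0, "2010-01-02": 1},
    "C": {"2009-10-26": 0, "2009-10-27": 1},
}
print(rank_outlier_events(thetas, assignments, 2))
# [(0.05, 'C', '2009-10-26'), (0.1, 'A', '2009-05-31')]
```

The location and date labels in the usage example are made up for illustration and are not results from the experiments.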

Experiment and Results

We use the Brightkite dataset, which contains 4,491,074 check-ins at 772,779 different locations around the world by 50,686 registered users between March 2008 and October 2010. We treat the Brightkite friendship networks as undirected friendship graphs. We exclude unpopular locations from our analysis by first filtering out, for each location, the dates on which fewer than three users check in and then including only locations with more than two check-in dates, which results in 697 locations represented by latitude and longitude coordinates. In these experiments, we set K = 4, κ = 500, and α = 0.1. The distributions of the number of check-ins per date per location and the number of friendship networks per date per location are shown in Fig. 2a, b, respectively.
Fig. 2 (a) The frequency vs. number of check-ins (bucketized) for each date in a location, (b) the frequency vs. number of friendship networks for each date in a location
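The filtering of unpopular locations described above can be sketched as follows. The record layout `(user, location, date)` and the function name are hypothetical. The sketch assumes that "users" means distinct users and that the date count is taken after the first filter has been applied.

```python
from collections import defaultdict

def filter_checkins(checkins, min_users=3, min_dates=3):
    """checkins: iterable of (user, location, date) records.
    Returns {location: {date: [user, ...]}} after both filters."""
    by_loc_date = defaultdict(lambda: defaultdict(list))
    for user, loc, date in checkins:
        by_loc_date[loc][date].append(user)
    kept = {}
    for loc, dates in by_loc_date.items():
        # Drop dates on which fewer than min_users distinct users check in.
        busy = {d: users for d, users in dates.items()
                if len(set(users)) >= min_users}
        # ASSUMPTION: "more than two check-in dates" is counted on the
        # dates that survive the first filter.
        if len(busy) >= min_dates:
            kept[loc] = busy
    return kept
```

Each surviving location then has at least three dates from which a socio-spatial aspect can be computed.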

We perform the experiments on a machine with an Intel(R) Core(TM) Duo CPU T6400 @2.00GHz and 1.75 GB of RAM, running 32-bit Microsoft Windows XP Professional Service Pack 3. We use Visual C# .NET for the data processing and MATLAB for the implementation of our proposed model. The model takes 859.8462 s to run and produce the results.

After we run the steps described in the previous section, we translate the latitude and longitude coordinates into physical addresses. This process is known as reverse geocoding. In particular, we use an online service (Morse 2012) that processes the information from two well-known providers: maps.google.com (Google) and Virtual Earth (Microsoft). If the two providers return different physical addresses, we record only the region name of the translated addresses.

To verify the results, we then search for “special events” that occurred at the detected locations and dates. Prominent examples of the detected spatiotemporal outlier events are as follows:
  • Detected Event 1: SXSW 2010 Week, Austin, TX 78701, USA (March 13, 2010, and March 17, 2010). South by Southwest (SXSW) is a set of film, interactive, and music festivals and conferences in Austin, Texas, USA. In 2010, SXSW was held on March 12–21.
  • Detected Event 2: Maker Faire 2009, San Mateo, CA 94401, USA (May 31, 2009). Maker Faire is the world’s largest event to celebrate arts, crafts, engineering, science projects, and the do-it-yourself (DIY) concept. Maker Faire 2009 was held on May 30–31, 2009.
  • Detected Event 3: Papa Roach Concert 2009, Denver, CO, USA (October 26, 2009). Papa Roach is a famous American rock band that has sold more than 15 million album copies worldwide. The 2009 concert took place on October 26, 2009.
  • Detected Event 4: Thanksgiving 2009, San Francisco, CA, USA (November 26, 2009). Thanksgiving Day is a special holiday celebrated in the USA and Canada. In 2009, the National Retail Federation also announced that the special Thanksgiving weekend sales were on November 26. This event is also known as Black Friday.

Interestingly, proportion-wise, alongside the prominent examples above we also find the exact address of the Brightkite office, which was at 270 East Ln, Burlingame, CA 94010, USA. We suspect that this biased result arises because most Brightkite employees used the service during its development.

Key Applications

Research on discovering spatiotemporal outlier events is interesting in its own right and is also important for better understanding how people interact with each other and share their location-related information. This understanding is potentially useful for developing an event recommendation system that lets people learn about interesting events around them that most of their friends also attend. Furthermore, this direction could be extended to develop a friendship recommendation system that fosters friendship networks among users based on the events they attend.

Future Directions

There are several interesting and challenging research directions that we could pursue in the future. It is interesting to investigate how friendship networks evolve over time when users tend to attend the same events: Do spatiotemporal outlier events trigger users to expand their friendship networks? Do people who attend the same events tend to form new friendships? It would also be interesting to perform a longitudinal study of the evolution of spatiotemporal outlier events.



  1. Cho E, Myers SA, Leskovec J (2011) Friendship and mobility: user movement in location-based social networks. In: International conference on knowledge discovery and data mining (SIGKDD). ACM, San Diego
  2. Fisher N, Lewis T, Embleton B (1987) Statistical analysis of spherical data. Cambridge University Press, Cambridge
  3. Lauw HW, Lim EP, Pang H, Tan TT (2010) Spatiotemporal event model for social network discovery. ACM Trans Inf Syst (TOIS) 28(3):1–32
  4. Leskovec J (2012) Snap: network datasets: Brightkite. html. Accessed July 2012
  5. Liu W, Zheng Y, Chawla S, Yuan J, Xie X (2011) Discovering spatio-temporal causal interactions in traffic data streams. In: International conference on knowledge discovery and data mining (SIGKDD). ACM, San Diego
  6. McPherson M, Smith-Lovin L, Cook JM (2001) Birds of a feather: homophily in social networks. Ann Rev Sociol 27(1):415–444
  7. Morse SP (2012) Converting addresses to/from latitude/longitude/altitude in one step. http://stevemorse.org/jcal/latlon.php. Accessed July 2012
  8. Noulas A, Scellato S, Mascolo C, Pontil M (2011) An empirical study of geographic user activity patterns in foursquare. In: International conference on weblogs and social media (ICWSM), Barcelona
  9. Sun P, Chawla S (2004) On local spatial outliers. In: IEEE international conference on data mining (ICDM), Brighton
  10. Wu E, Liu W, Chawla S (2008) Spatio-temporal outlier detection in precipitation data. In: KDD workshop on knowledge discovery from sensor data, Las Vegas
  11. Yuan J, Zheng Y, Xie X (2012) Discovering regions of different functions in a city using human mobility and pois. In: International conference on knowledge discovery and data mining (SIGKDD). ACM, Beijing

Copyright information

© Springer Science+Business Media LLC 2017

Authors and Affiliations

  1. School of Information Technologies, University of Sydney, Sydney, Australia