1 Introduction

With the development of smart mobile devices, location acquisition technologies and social media, a large amount of event data with spatio-temporal information has become available and offers an opportunity to better understand people’s location preferences and mobility patterns in a sightseeing city [3]. In location-based social networking services (LBSNs) such as Foursquare and Facebook Places, check-in sequences of users to points-of-interest (POIs) are observed, where a finite number of venues are listed as POIs in advance. Clearly, there exist infinitely many attractive places in the city, including various geographical points giving beautiful views and street spots with artistic atmosphere. In photo-sharing services such as Flickr, observations of where and when people took photos are obtained. In order to take into account infinitely many attractive places on a continuous spatial domain, we consider extending the definition of POI. In particular, we also refer to the geographical locations of such photos that were taken on sightseeing tours and uploaded to a photo-sharing site as POIs, and aim at precisely investigating people’s experiences in visiting attractive places in the city. Namely, in our definition, it is supposed that any POI offers an attractive place in a sense. Note that a complete list of all POIs cannot be obtained in advance.

Recently, researchers [2, 8, 21] have examined the next POI recommendation problem, that is, the problem of predicting which POI a user is most likely to visit at the next discrete time-step given the current check-in POI, where it is assumed that a finite set of POIs is specified in advance and historical check-in sequences of users in an LBSN are provided. However, these studies were unable to fully capture the continuous structure of space-time, and thus Liu et al. [14] extended them to the case of a continuous time-axis by integrating temporal interval assessment. On the other hand, since online items posted on social media sites such as Facebook and Twitter gain their popularity by the amount of attention received (e.g., the number of Facebook shares and the number of retweets), several studies have been made on modeling the attention dynamics of online items in a continuous time-axis [11, 16, 19, 22]. Zhou et al. [24] presented a point process model in a discretized time-axis and a continuous spatial domain by fusing a time-varying Gaussian mixture model with a non-homogeneous Poisson process, and successfully estimated the spatial distribution of Toronto’s ambulance demand at a specified discrete time-step (i.e., each two-hour interval), where each Gaussian component shows a representative geographical area. In the case of dealing with events of visiting POIs in a sightseeing city, such a component may correspond to a major sightseeing area. However, this study is unable to extract the influence structure among components from the viewpoint of visiting POIs, while such knowledge can become important for tourism marketing.

Fig. 1. An illustration of geographical attention dynamics. Down-pointing triangles indicate the time points or geographical locations for POI visit events. Arrows in (d) indicate triggering relations between events. For example, (d) illustrates that the event in \(C_1\) at time \(t_{1, 2}\) triggered the event in \(C_2\) at time \(t_{2, 1}\). Arrows in (e) represent the main influence relations among latent components (major sightseeing areas).

For a given sightseeing city, we consider the problem of modeling the occurrence process of events for visiting POIs in a continuous time-axis and a continuous spatial domain, which is referred to as that of modeling the geographical attention dynamics, and aim to provide deep insights into the properties of people’s location preferences and mobility patterns on sightseeing tours in the city. What we observe is both a time-sequence of events (see Fig. 1a) and their locations (see Fig. 1b). Given a season, the sightseeing city should have a finite number of major sightseeing areas \(C_1, \dots , C_K\) (see Fig. 1c), where these represent major tourism topics, and are allowed to geographically intersect each other. In the same way as the attention dynamics of online items in social media, we first assume that the occurrences of previous events increase the possibility of future events. In particular, POI-visit events should exhibit a geographically self-exciting nature, where an event that happened in an area \(C_k\) may cause its subsequent events in the same area \(C_k\). Also, it is natural to suppose that the geographical attention dynamics has a geographically mutually-exciting nature, where an event that happened in an area \(C_k\) can trigger the subsequent events in any other area \(C_\ell \) (see Fig. 1d). Moreover, the temporal decay rate of such effect should vary according to area \(C_k\). Thus, based on the data of many people’s POI visit events in the season, it is desirable to identify major sightseeing areas \(C_1, \dots , C_K\) and find the spatio-temporal influence relations among \(C_1, \dots , C_K\) in terms of geographical mutual-excitation (see Fig. 1e).

In this paper, we propose a probabilistic model for discovering the spatio-temporal influence structure among major sightseeing areas from the perspective of geographical attention dynamics in a continuous space-time, and aim at accurately predicting the future POI visit events. To this end, we combine a Hawkes process [13], which is a counting process [1] frequently utilized to capture mutual excitations between events, with a time-varying Gaussian mixture model in a novel way. Also, we incorporate the influence structure depending on time slots into our model since it is known that users’ activities in LBSNs are often influenced by time [10, 20] and such temporal properties may rely on sightseeing cities as well as seasons. We develop an efficient method of inferring the parameters in the proposed model from the observed sequence of POI visit events, and provide an analysis method for the geographical attention dynamics in terms of spatio-temporal influence relations among major sightseeing areas. Using real data of POI visit events in Japanese sightseeing city “Kyoto” obtained from a photo-sharing site, we evaluate the proposed method. First, for predicting the future POI visit events, we show the effectiveness of the proposed model compared to a baseline model and such a conventional spatio-temporal point process model that simply integrates a Hawkes process with a time-varying Gaussian mixture model (see [24]), which is hereafter referred to as HG model. Next, by applying the proposed model, we uncover the spatio-temporal influence relation structure among major sightseeing areas in Kyoto in view of geographical attention dynamics.

The rest of the paper is organized as follows: In Sect. 2, we briefly summarize the related work. In Sect. 3, we introduce the HG model, and present the proposed model as its extension. In Sect. 4, we develop a probabilistic inference method for the proposed model, and present an analysis method for geographical attention dynamics. In Sect. 5, we report the experimental results. We conclude the paper by summarizing the main results in Sect. 6.

2 Related Work

Several studies have been made on predicting POI visit events in the near future. As described in Sect. 1, Chen et al. [2], Feng et al. [8] and Zhang et al. [21] investigated the next POI recommendation problem for a finite number of given POIs in a discretized time-axis. To address the problem in a continuous time-axis, Liu et al. [14] presented a method of exploiting temporal interval assessment. To construct an accurate predictive model of event data with mark information such as POI in a continuous time-axis for a finite number of given marks, Du et al. [5] extended a marked Hawkes process, and proposed such a marked temporal point process that incorporates a recurrent neural network. However, unlike our current approach, these works have a limitation in treating a continuous spatial domain, that is, it is difficult to handle the situation where there may be infinitely many POIs and a complete list of all POIs is unavailable. To model ambulance demand in a discretized time-axis and a continuous spatial domain, Zhou et al. [24] integrated a time-varying Gaussian mixture model with a non-homogeneous Poisson process. Note that this model can simply be extended to the HG model, which is a model for a continuous space-time. However, as already mentioned, the HG model also has a limitation in analyzing the spatio-temporal influence structure among latent components. In this paper, by properly extending the HG model, we propose a probabilistic model for the geographical attention dynamics in a continuous space-time, and aim at discovering the spatio-temporal influence structure among major sightseeing areas in a given season.

There have been many investigations related to modeling continuous-time events generated by users in social media. As described in Sect. 1, Wang et al. [19], Shen et al. [16], Gao et al. [11] and Zhao et al. [22] considered individually modeling the attention dynamics of online items posted on social media sites in order to predict their future popularity. Thus, unlike our current approach, these works are unable to properly analyze the relations among all the online items involved. On the other hand, Gomez-Rodriguez et al. [12] and Daneshmand et al. [4] examined the problem of extracting the social influence network structure among users from the observed information cascades (i.e., the observed sequences of events for sharing the same online items). Multivariate Hawkes processes are often leveraged to model event sequences forming information cascades in social networks [6, 23]. They are also extended to model coevolution dynamics of information diffusion and social network growth [7]. However, these studies exploited who shared which online item, and assume that a finite set of online items and users is given in advance. In this paper, we also focus on a Hawkes process, but unlike the studies mentioned above, we try to infer the spatio-temporal influence relation structure among POIs and predict the future POI visit events without knowing who visited which POI from the viewpoint of privacy protection. We also note that multivariate Hawkes processes cannot simply be applied to our problem since a complete list of all POIs and users is unavailable in advance.

3 Model

For predictive modeling of geographical attention dynamics, we consider modeling the occurrence process of events for visiting POIs in a sightseeing city during a time period \([0, T')\) corresponding to one of its tourist seasons in the setting of a continuous space-time, where \(T'\) \((> 0)\) is assumed to be a few months, and the corresponding continuous spatial domain is denoted by \(\varOmega \subset {\mathbb R}^2\).

3.1 Preliminaries

For any \(t \in (0, T')\), let \(N_t\) be the total number of events during time period [0, t), and for each \(n = 1, \dots , N_t\), we represent the nth event as tuple \((t_n, \varvec{x}_n)\), meaning that location \(\varvec{x}_n = (x_{n,1}, x_{n,2}) \in \varOmega \) was visited and registered as a POI at time \(t_n \in [0, T')\) on a sightseeing tour. We also denote the sequence of events (i.e., the history) up to but not including time t as

$$ \mathcal{H}_t \ = \ \left\{ (t_n, \varvec{x}_n); \, n = 1, \dots , N_t \right\} . $$

Based on the previous work [24], we focus on modeling the event occurrence process as a spatio-temporal point process with intensity function \(\lambda (t) \, f (\varvec{x}\, | \, t)\) for \(\forall t \in (0, T')\) and \(\forall \varvec{x}= (x_1, x_2) \in \varOmega \), where \(\lambda (t)\) is the intensity function of a temporal point process and \(f(\varvec{x}\, | \, t)\) is a time-varying Gaussian mixture for the spatial distribution. Namely, \(\lambda (t) \, f (\varvec{x}\, | \, t) \, dt \, d\varvec{x}\) is the conditional probability of observing an event within a small domain \([t, t + dt)\) \(\times \) \(\left\{ [x_1, x_1 + dx_1) \times [x_2, x_2 + dx_2) \right\} \) given the history \(\mathcal{H}_t\) (see [1, 5, 7]). Note that for \(0 < \forall T \le T'\), the probability density of \(\mathcal{H}_T\) is given by

$$\begin{aligned} p(\mathcal{H}_T) \ = \ \exp \left\{ - \int _0^T \lambda (t) \, dt \right\} \, \prod _{n = 1}^{N_T} \left\{ \lambda (t_n) \, f(\varvec{x}_n \, | \, t_n) \right\} . \end{aligned}$$
(1)

3.2 Spatial Distribution

We begin by defining \(f (\varvec{x}\, | \, t)\) for any \((t, \varvec{x}) \in (0, T') \times \varOmega \). Since people’s travel behaviors should vary by time of day, several studies [10, 20] separated each day into different time slots to improve POI recommendations in LBSNs. We also adopt this idea, and decompose each day into M time slots \(TS_1, \ldots , TS_M\) that are appropriate for the city and the season under consideration. Let \(h : [0, T') \rightarrow \{1, \dots , M \}\) be the time slot function, meaning that each time \(t \in [0, T')\) belongs to time slot \(TS_{h(t)}\).

Previous work [24] fixed the mixture component distributions across all time slots to overcome data sparsity issues, and tried to capture an accurate spatial structure. In the same way as [24], we define the spatial distribution \(f (\varvec{x}\, | \, t)\) by

$$\begin{aligned} f(\varvec{x}\, | \, t, \varTheta ) \ = \ \sum _{k=1}^K \phi ^{}_{h(t),k} \, g(\varvec{x}\, | \, \varvec{\mu }_{k}, \varSigma _{k}), \ \ \ \forall \varvec{x}\in \varOmega , \end{aligned}$$
(2)

where K is the number of components, and \(g(\varvec{x}\, | \, \varvec{\mu }_{k}, \varSigma _{k})\) is the 2-dimensional Gaussian density with mean vector \(\varvec{\mu }_{k}\) and covariance matrix \(\varSigma _{k}\) for \(k = 1, \dots , K\). The mixing coefficients \(\{ \phi _{m,k} \}\) for time slot \(TS_m\) satisfy \(0< \phi _{m,k} < 1\) together with \(\sum _{k=1}^K \phi _{m,k} = 1\) for \(m = 1, \dots , M\). Also, the parameters for \(f (\varvec{x}\, | \, t)\) are aggregated into the parameter set \(\varTheta \, = \, \left\{ \phi _{m, k}, \, \varvec{\mu }_k, \, \varSigma _k; \, m = 1, \dots , M, \, k = 1, \dots , K \right\} . \) We can consider that each Gaussian component \(C_k\) essentially represents a geographical area corresponding to a major tourism topic, and is identified with a major sightseeing area. Thus, we leverage this identification and try to analyze the influence relations among those major sightseeing areas.
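As a minimal sketch of Eq. (2), the following Python snippet evaluates the time-slot-dependent mixture density. The function names `gauss2d` and `mixture_density`, the list-based parameter layout, and the toy values are illustrative assumptions rather than part of the method description; slot and component indices are 0-based here.

```python
import math

def gauss2d(x, mu, sigma):
    """Density of a 2-D Gaussian; sigma is a 2x2 covariance [[a, b], [c, d]]."""
    (a, b), (c, d) = sigma
    det = a * d - b * c
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    # Quadratic form (x - mu)^T sigma^{-1} (x - mu) via the closed-form 2x2 inverse.
    quad = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))

def mixture_density(x, slot, phi, mus, sigmas):
    """f(x | t, Theta) of Eq. (2), where slot = h(t) and phi[slot][k] is the mixing weight."""
    return sum(phi[slot][k] * gauss2d(x, mus[k], sigmas[k]) for k in range(len(mus)))

# Toy parameters (hypothetical): M = 2 time slots, K = 2 components shared across slots.
eye = [[1.0, 0.0], [0.0, 1.0]]
phi = [[0.5, 0.5], [0.9, 0.1]]
mus = [(0.0, 0.0), (3.0, 0.0)]
sigmas = [eye, eye]
density_slot1 = mixture_density((0.0, 0.0), 1, phi, mus, sigmas)
```

Only the mixing weights depend on the time slot; the component means and covariances are shared, which is what keeps the components comparable across slots.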

3.3 Spatio-Temporal Point Process

Next, we consider modeling \(\lambda (t)\) for any \(t \in (0, T')\).

Baseline Model. One of the simplest models for a temporal point process is a Poisson process, where \(\lambda (t)\) is assumed to be independent of history \(\mathcal{H}_t\) and given by

$$\begin{aligned} \lambda (t \, | \, \alpha ) \ = \ \alpha , \ \ \ \forall t \in (0, T'). \end{aligned}$$
(3)

Here, \(\alpha \) is a positive constant. Thus, the spatio-temporal point process model defined by intensity function \(\lambda (t \, | \, \alpha ) \, f(\varvec{x}\, | \, t, \varTheta )\) (see Eqs. (2) and (3)) is regarded as a baseline.

Conventional Model. As described in the previous sections, a Hawkes process is frequently used to model continuous-time events and capture mutually-exciting interactions between events, and has been investigated for various applications (see [13, 22]). Thus, \(\lambda (t)\) can be modeled as a Hawkes process,

$$\begin{aligned} \lambda (t \, | \, \alpha , \beta , \gamma ) \ = \ \alpha + \beta \sum _{ \left( t_n, \, \varvec{x}_n \right) \, \in \, \mathcal{H}_t} \exp \left\{ -\gamma \, (t - t_n) \right\} , \ \ \ \forall t \in (0, T'), \end{aligned}$$
(4)

where \(\alpha \), \(\beta \) and \(\gamma \) are positive constants. Here, the spatio-temporal point process defined by intensity function \(\lambda (t \, | \, \alpha , \beta , \gamma ) \, f(\varvec{x}\, | \, t, \varTheta )\) (see Eqs. (2) and (4)) is referred to as HG model. Note that the HG model can be regarded as a conventional model presented in the previous work [24].

Proposed Model. By incorporating both component dependent temporal influence decay (see Sect. 1) and time-slot varying influence degree (see Sect. 3.2), we extend the HG model, and aim to discover the spatio-temporal influence structure among components from the viewpoint of geographical attention dynamics and to more accurately predict POI visit events in the near future. The proposed model is defined as the spatio-temporal point process with intensity function \(\lambda (t \, | \, Z_t, \alpha , \varvec{\beta }, \varvec{\gamma }) \, f(\varvec{x}\, | \, t, \varTheta )\) (see Eqs. (2) and (5)). Here, \(\lambda (t)\) is modeled as

$$\begin{aligned} \lambda (t \, | \, Z_t, \alpha , \varvec{\beta }, \varvec{\gamma }) \ = \ \alpha + \sum _{ \left( t_n, \, \varvec{x}_n \right) \, \in \, \mathcal{H}_t} \beta ^{}_{h(t_n)} \exp \left\{ -\gamma ^{}_{z(\varvec{x}_n \, | \, t_n)} \, (t - t_n) \right\} , \ \ \ \forall t \in (0, T'), \end{aligned}$$
(5)

where \(z(\varvec{x}_n \, | \, t_n)\) denotes the component ID of location \(\varvec{x}_n\) drawn from Gaussian mixture \(f(\varvec{x}\, | \, t_n, \varTheta )\) at time \(t_n\), i.e., \(z(\varvec{x}_n \, | \, t_n) = k\) if and only if \(\varvec{x}_n \in C_k\) at time \(t_n\), for \(n = 1, \dots , N_t\). \(Z_t\) is defined as

$$ Z_t = \left\{ z(\varvec{x}_n \, | \, t_n); \, n = 1, \dots , N_t \right\} . $$

Also, for the city during the current season, \(\alpha > 0\) expresses its underlying attractiveness, \(\beta _m > 0\) represents the influence degree of time slot \(TS_m\) for \(m = 1, \dots , M\), and \(\gamma _k > 0\) indicates the temporal influence decay rate of component \(C_k\) for \(k = 1, \dots , K\). Parameters \(\varvec{\beta }\) and \(\varvec{\gamma }\) are defined as \(\varvec{\beta }= (\beta _1, \dots , \beta _M)\) and \(\varvec{\gamma }= (\gamma _1, \dots , \gamma _K)\), respectively. Here, based on the additivity for independent Poisson processes (see [6, 13, 15]), for any \(t \in (0, T')\), we introduce a set of latent variables,

$$ Y_t = \left\{ y_n; \, n = 1, \dots , N_t \right\} , $$

such that the nth event \((t_n, \varvec{x}_n)\) was triggered by the \(y_n\)th event \((t_{y_n}, \varvec{x}_{y_n})\), where \(y_n = 0, 1, \dots , n-1\), and \(y_n = 0\) means that the nth event was triggered by the underlying attractiveness, i.e., the background intensity \(\alpha \). Namely, it is known that the point process with intensity function \(\lambda (t_n \, | \, Z_{t_n}, \alpha , \varvec{\beta }, \varvec{\gamma })\) at time \(t_n\) is the superposition of the Poisson processes with intensity functions \(\lambda ( t_{n}; \, y_n \, | \, Z_{t_{n}}, \alpha , \varvec{\beta }, \varvec{\gamma })\), (\(y_n = 0, 1, \dots , n - 1\)), where

$$\begin{aligned} \lambda (t_n; \, y_n \, | \, Z_{t_n}, \alpha , \varvec{\beta }, \varvec{\gamma }) \ = \ {\left\{ \begin{array}{ll} \alpha &{} \mathrm {if} \quad y_n = 0\\ \beta ^{}_{h(t_{y_n})} \, \exp \left\{ -\gamma ^{}_{z(\varvec{x}_{y_n} \, | \, t_{y_n})} \, (t_n - t_{y_n}) \right\} &{} \mathrm {if} \quad 1 \le y_n < n \end{array}\right. } \end{aligned}$$
(6)

for \(n = 1, \dots , N_t\). We consider extracting the influence relation \(R_{k, \ell }\) from component \(C_\ell \) to component \(C_k\) for \(k, \ell = 1, \dots , K\) by leveraging \(Z_T\) and \(Y_T\).
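To make Eqs. (5) and (6) concrete, the sketch below evaluates the proposed temporal intensity and its decomposition into background and per-parent terms. It assumes each past event is stored as a tuple `(t_n, slot_n, comp_n)` with `slot_n = h(t_n)` and `comp_n = z(x_n | t_n)` (0-based indices); this representation, the function names, and the toy values are hypothetical.

```python
import math

def intensity(t, events, alpha, beta, gamma):
    """lambda(t | Z_t, alpha, beta, gamma) of Eq. (5)."""
    lam = alpha
    for t_n, slot_n, comp_n in events:
        if t_n < t:  # only the history strictly before t contributes
            lam += beta[slot_n] * math.exp(-gamma[comp_n] * (t - t_n))
    return lam

def branching_terms(n, events, alpha, beta, gamma):
    """Terms of Eq. (6) for y_n = 0, 1, ..., n-1, where n is the 1-based event number."""
    t_n = events[n - 1][0]
    terms = [alpha]  # y_n = 0: background intensity
    for t_i, slot_i, comp_i in events[: n - 1]:
        terms.append(beta[slot_i] * math.exp(-gamma[comp_i] * (t_n - t_i)))
    return terms

# Toy history (hypothetical values), sorted by time.
events = [(0.5, 0, 1), (1.2, 0, 0), (2.0, 1, 1)]
alpha, beta, gamma = 0.3, [0.4, 0.2], [1.0, 2.0]
terms_for_third_event = branching_terms(3, events, alpha, beta, gamma)
```

The terms returned for the nth event sum to the intensity at \(t_n\), which is the superposition property used later to attribute each event to a parent.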

4 Learning Method

For the observed data \(\mathcal{H}_T\) with \(0< T < T'\), we develop a method of inferring the parameters \(\varTheta \), \(Z_T\), \(\alpha \), \(\varvec{\beta }\), \(\varvec{\gamma }\) and \(Y_T\) in the proposed model, and provide a method for prediction and analysis of the geographical attention dynamics.

4.1 Inference

We present an inference method of the proposed model from \(\mathcal{H}_T\).

First, we estimate \(\varTheta \) by maximizing the likelihood function \(p(\mathcal{H}_{T} \, | \, \varTheta , Z_{T}, \alpha , \varvec{\beta }, \varvec{\gamma })\). By Eq. (1), it is sufficient to maximize function \(\mathcal{L} (\varTheta ) = \prod _{(t_n, \varvec{x}_n) \in \mathcal{H}_{T}} f(\varvec{x}_n \, | \, t_n, \varTheta ).\) We employ an EM algorithm. Note that the number K of components is assumed to be fixed in this paper although it can also be estimated from the observed data by using techniques such as affinity propagation [9] and birth-and-death Markov chain Monte Carlo [18, 24]. Let \(\bar{\varTheta }\) be the current estimate of \(\varTheta \). Then, the update rule “\({\hat{\varTheta }} = \left\{ \{ {\hat{\phi }}_{m,k} \}, \{ {\hat{\varvec{\mu }}}_{k} \}, \{ {\hat{\varSigma }}_{k} \} \right\} \leftarrow {\bar{\varTheta }} = \left\{ \{ {\bar{\phi }}_{m,k} \}, \{ {\bar{\varvec{\mu }}}_{k} \}, \{ {\bar{\varSigma }}_{k} \} \right\} \)” is obtained as follows:

$$\begin{aligned} {\hat{\phi }}_{m,k}= & {} \frac{1}{| \mathcal{H}_{T}^m |} \sum _{(t_n, \varvec{x}_n) \, \in \, \mathcal{H}_{T}^m} \frac{{\bar{\phi }}_{m,k} \, g(\varvec{x}_n \, | \, {\bar{\varvec{\mu }}}_k, {\bar{\varSigma }}_k)}{f(\varvec{x}_n \, | \, t_n, {\bar{\varTheta }})}, \\ {\hat{\varvec{\mu }}}_k= & {} \frac{1}{\sum _{n=1}^{N_{T}} {\bar{a}}_{n,k}} \sum _{n=1}^{N_{T}} {\bar{a}}_{n,k} \, \varvec{x}_n, \quad {\hat{\varSigma }}_k = \frac{1}{\sum _{n=1}^{N_{T}} {\bar{a}}_{n,k}} \sum _{n=1}^{N_{T}} {\bar{a}}_{n,k} \, \left( \varvec{x}_n - {\hat{\varvec{\mu }}_k} \right) \, {\left( \varvec{x}_n - {\hat{\varvec{\mu }}_k} \right) }^\mathrm{T} \end{aligned}$$

for \(m = 1, \dots , M\) and \(k = 1, \dots , K\), where the superscript T stands for a matrix transpose, each 2-vector is treated as a \(2 \times 1\) matrix, and

$$ \mathcal{H}_T^m = \{ (t, \varvec{x}) \in \mathcal{H}_T; \, h(t) = m \}, \quad {\bar{a}}_{n,k} = \frac{{\bar{\phi }}_{h(t_n), k} \, g(\varvec{x}_n \, | \, {\bar{\varvec{\mu }}}_k, {\bar{\varSigma }}_k)}{f(\varvec{x}_n \, | \, t_n, {\bar{\varTheta }})}. $$

Also, |S| denotes the number of elements in a set S. With this method, we get the estimate \(\varTheta ^*\) of \(\varTheta \). Then, for each \((t_n, \varvec{x}_n) \in \mathcal{H}_{T}\) and \(k = 1, \dots , K\), the posterior probability \(\psi _k(\varvec{x}_n \, | \, t_n)\) of location \(\varvec{x}_n\) at time \(t_n\) is given by

$$\begin{aligned} \psi _k(\varvec{x}_n \, | \, t_n) = P( z(\varvec{x}_n \, | \, t_n) = k \ | \ t_n, \varvec{x}_n, \varTheta ^*) \ = \ \frac{\phi _{h(t_n), k}^* \, g(\varvec{x}_n \, | \, \varvec{\mu }_k^*, \varSigma _k^*)}{{f(\varvec{x}_n \, | \, t_n, \varTheta ^*)}}, \end{aligned}$$
(7)

and thus the estimate \(Z^*_{T}\) of \(Z_{T}\) can be obtained by

$$ z^*(\varvec{x}_n \, | \, t_n) = \mathop {\mathrm {argmax}}\limits _{1 \le k \le K} \psi _k(\varvec{x}_n \, | \, t_n). $$
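The EM update and the hard assignment above can be sketched in plain Python as follows. This is an illustrative single iteration, not the authors' implementation: `em_step` and its argument layout are assumptions, every time slot is assumed to contain at least one event, and the returned labels are the argmax of the responsibilities computed under the parameters passed in (they coincide with \(z^*\) once the parameters have converged).

```python
import math

def gauss2d(x, mu, sigma):
    """Density of a 2-D Gaussian with 2x2 covariance sigma."""
    (a, b), (c, d) = sigma
    det = a * d - b * c
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    quad = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))

def em_step(xs, slots, phi, mus, sigmas):
    """One EM iteration; xs[n] is a location, slots[n] = h(t_n) (0-based)."""
    N, K, M = len(xs), len(mus), len(phi)
    # E-step: responsibilities a[n][k] under the current parameters.
    a = []
    for x, m in zip(xs, slots):
        w = [phi[m][k] * gauss2d(x, mus[k], sigmas[k]) for k in range(K)]
        total = sum(w)
        a.append([wk / total for wk in w])
    # M-step for the mixing weights: average responsibility within each time slot.
    new_phi = []
    for m in range(M):
        idx = [n for n in range(N) if slots[n] == m]
        new_phi.append([sum(a[n][k] for n in idx) / len(idx) for k in range(K)])
    # M-step for means and covariances: responsibility-weighted moments over all events.
    new_mus, new_sigmas = [], []
    for k in range(K):
        wk = sum(a[n][k] for n in range(N))
        mu = tuple(sum(a[n][k] * xs[n][d] for n in range(N)) / wk for d in range(2))
        cov = [[sum(a[n][k] * (xs[n][i] - mu[i]) * (xs[n][j] - mu[j]) for n in range(N)) / wk
                for j in range(2)] for i in range(2)]
        new_mus.append(mu)
        new_sigmas.append(cov)
    # Hard assignment: most probable component for each event.
    labels = [max(range(K), key=lambda k: a[n][k]) for n in range(N)]
    return new_phi, new_mus, new_sigmas, a, labels

# Toy data (hypothetical): two well-separated clusters observed in two time slots.
xs = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
      (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1)]
slots = [0, 1, 0, 1, 0, 1, 0, 1]
eye = [[1.0, 0.0], [0.0, 1.0]]
phi1, mus1, sigmas1, resp, labels = em_step(
    xs, slots, [[0.5, 0.5], [0.5, 0.5]], [(0.0, 0.0), (5.0, 5.0)], [eye, eye])
```

In practice the step would be repeated until the likelihood stops improving, and a small regularizer on the covariances may be needed when a component collapses onto very few points.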

Next, we develop a Bayesian method of estimating \(\alpha \), \(\varvec{\beta }\) and \(\varvec{\gamma }\) based on Eq. (1). To this end, we introduce the latent variables \(Y_T\) and try to infer \(Y_T\) as well (see Sect. 3.3). We consider leveraging the joint likelihood \(p(\mathcal{H}_{T}, Y_{T} \, | \, \varTheta ^*, Z^*_{T}, \alpha , \varvec{\beta }, \varvec{\gamma })\),

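$$\begin{aligned} p(\mathcal{H}_{T}, Y_{T} \, | \, \varTheta ^*, Z^*_{T}, \alpha , \varvec{\beta }, \varvec{\gamma }) \ = \ \exp \left\{ - \alpha \, T - \sum _{m=1}^M \beta _m \, G_m (\varvec{\gamma }\, | \, Z^*_{T}) \right\} \, \prod _{n=1}^{N_{T}} \left\{ \lambda (t_n; \, y_n \, | \, Z^*_{T}, \alpha , \varvec{\beta }, \varvec{\gamma }) \, f(\varvec{x}_n \, | \, t_n, \varTheta ^*) \right\} , \end{aligned}$$
(8)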

where

$$\begin{aligned} G_m (\varvec{\gamma }\, | \, Z^*_{T}) \ = \ \sum _{n=1}^{N_{T}} \frac{\displaystyle 1}{\displaystyle \gamma ^{}_{z(\varvec{x}_n \, | \, t_n)}} \left( 1 - \exp \left\{ -\gamma ^{}_{z(\varvec{x}_n \, | \, t_n)} \, (T - t_n) \right\} \right) \, I( h(t_n) = m ). \end{aligned}$$

Here, I(v) is an indicator function such that \(I (v) = 1\) if v is true, \(I (v) = 0\) otherwise. Suppose that \(\alpha \), \(\varvec{\beta }\) and \(\varvec{\gamma }\) are independently generated from the following priors (i.e., gamma distributions):

$$\begin{aligned} \alpha \, \sim \, \mathrm {Gamma}(\nu _\alpha , \eta _\alpha ), \quad \beta _m \, \sim \, \mathrm {Gamma}(\nu _\beta , \eta _\beta ), \quad \gamma _k \, \sim \, \mathrm {Gamma}(\nu _\gamma , \eta _\gamma ), \end{aligned}$$
(9)

for \(m = 1, \dots , M\) and \(k = 1, \dots , K\), where \(\nu _\alpha , \eta _\alpha , \nu _\beta , \eta _\beta , \nu _\gamma , \eta _\gamma >0\) are hyper-parameters. Then, \(p(\mathcal{H}_{T}, Y_{T} \, | \, \varTheta ^*, Z^*_{T}, \alpha , \varvec{\beta }, \varvec{\gamma })\) can be analytically marginalized over \(\alpha \) and \(\varvec{\beta }\) for priors (see Eqs. (8) and (9)), and we have

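$$\begin{aligned} p(\mathcal{H}_{T}, Y_{T} \, | \, \varTheta ^*, Z^*_{T}, \varvec{\gamma }, \nu _\alpha , \eta _\alpha , \nu _\beta , \eta _\beta )= & {} \int _{{\mathbb {R}}_{+}} \int _{{\mathbb {R}}_{+}^M} p(\mathcal{H}_{T}, Y_{T} \, | \, \varTheta ^*, Z^*_{T}, \alpha , \varvec{\beta }, \varvec{\gamma }) \, p(\alpha \, | \, \nu _\alpha , \eta _\alpha ) \prod _{m=1}^M p(\beta _m \, | \, \nu _\beta , \eta _\beta ) \, d\varvec{\beta }\, d\alpha \\= & {} \frac{\displaystyle \eta _\alpha ^{\nu _\alpha } \, \mathrm {\Gamma } (L_0 + \nu _\alpha )}{\displaystyle \mathrm {\Gamma } (\nu _\alpha ) \, (T + \eta _\alpha )^{L_0 + \nu _\alpha }} \, \prod _{m=1}^M \frac{\displaystyle \eta _\beta ^{\nu _\beta } \, \mathrm {\Gamma } (L_m + \nu _\beta )}{\displaystyle \mathrm {\Gamma } (\nu _\beta ) \, \left\{ G_m(\varvec{\gamma }\, | \, Z^*_{T}) + \eta _\beta \right\} ^{L_m + \nu _\beta }} \\&\times \prod _{n=1}^{N_{T}} f(\varvec{x}_n \, | \, t_n, \varTheta ^*) \prod _{n: \, y_n \ge 1} \exp \left\{ -\gamma ^{}_{z^*(\varvec{x}_{y_n} \, | \, t_{y_n})} \, (t_n - t_{y_n}) \right\} , \end{aligned}$$
(10)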

where \({\mathbb {R}}_{+}\) denotes the space of positive real numbers, \(\mathrm {\Gamma }(s)\) is the gamma function,

$$ L_0 \, = \, \sum _{n=1}^{N_{T}} I (y_n = 0) $$

indicates the number of events triggered by the background intensity, and

$$ L_m \, = \, \sum _{n=2}^{N_{T}} I ( h(t_{y_n}) = m ) \, I (y_n \ge 1) $$

indicates the number of events triggered by the preceding events within time slot \(TS_m\). By iterating the following three steps, we obtain the estimates \(\alpha ^*, \varvec{\beta }^*\) and \(\varvec{\gamma }^*\) of \(\alpha , \varvec{\beta }\) and \(\varvec{\gamma }\), respectively: (1) Gibbs sampling for \(Y_T\). (2) Metropolis-Hastings sampling for \(\varvec{\gamma }\). (3) Sampling for \(\alpha \) and \(\varvec{\beta }\), and updating of hyper-parameters. Moreover, based on the superposition theorem of independent Poisson processes (see [6, 15]), we estimate the posterior probability \(\xi _{n,i}\) \(=\) \(P(y_n = i \, | \, \mathcal{H}_{T}, \varTheta ^*, Z^*_{T}, \alpha ^*, \varvec{\beta }^*, \varvec{\gamma }^*)\) as

$$\begin{aligned} \xi _{n,i} \ = \ \frac{ \lambda (t_n; \, y_n=i \, | \, Z^*_{T}, \alpha ^*, \varvec{\beta }^*, \varvec{\gamma }^*)}{\sum _{j=0}^{n-1} \lambda (t_n; \, y_n=j \, | \, Z^*_{T}, \alpha ^*, \varvec{\beta }^*, \varvec{\gamma }^*)} \end{aligned}$$
(11)

for \(n = 1, \dots , N_T\), \(i = 0, 1, \dots , n-1\) (see Eq. (6)). Note that \(\{ \xi _{n,i} \}\) provide the posterior distribution of \(Y_T\). Below, we will describe the above three steps (1), (2) and (3) in detail.

Gibbs Sampling for \(\varvec{Y_T}\): Given the current samples of \(Y_T\), a new value of \(y_n\) for \(n = 1, \dots , N_T\) is sampled from \(\{ 0, \dots , n-1 \}\) using the Gibbs sampler of the conditional probability (see Eq. (10)),

$$\begin{aligned} P( y_n = i \, | \, \mathcal{H}_{T}, Y^{-n}_{T}, \varTheta ^*, Z^*_{T}, \varvec{\gamma }, \nu _\alpha , \eta _\alpha , \nu _\beta , \eta _\beta ) \propto p(y_n = i, \mathcal{H}_{T} \, | \, Y^{-n}_{T}, \varTheta ^*, Z^*_{T}, \varvec{\gamma }, \nu _\alpha , \eta _\alpha , \nu _\beta , \eta _\beta )\\ \propto {\left\{ \begin{array}{ll} \frac{\displaystyle L_0^{-n} + \nu _\alpha }{\displaystyle T + \eta _\alpha } &{} \mathrm {if} \quad i = 0\\ \frac{\displaystyle L_{h(t_i)}^{-n} + \nu _\beta }{\displaystyle G_{h(t_i)}(\varvec{\gamma }\, | \, Z^*_{T}) + \eta _\beta } \, \exp \left\{ -\gamma ^{}_{z(\varvec{x}_i \, | \, t_i)} \, (t_n - t_i) \right\} &{} \mathrm {if} \quad i = 1, \dots , n-1, \end{array}\right. } \end{aligned}$$

where the superscript \(-n\) stands for the set or value excluding the nth event.
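One sweep of this collapsed Gibbs sampler might look like the sketch below. It assumes events are sorted by time and stored as `(t_n, slot_n, comp_n)` tuples with 0-based slot and component indices, and that `y[n]` holds the parent of event number `n + 1` (0 for the background, `i >= 1` for the ith event). The helper names and the toy inputs are hypothetical.

```python
import math
import random

def compensator_G(events, gamma, T, M):
    """G_m(gamma | Z_T) for each time slot m = 0, ..., M-1."""
    G = [0.0] * M
    for t_n, slot_n, comp_n in events:
        g = gamma[comp_n]
        G[slot_n] += (1.0 - math.exp(-g * (T - t_n))) / g
    return G

def gibbs_sweep(events, y, gamma, T, M, nu_a, eta_a, nu_b, eta_b, rng=random):
    """Resample every parent indicator y[n] in place and return y."""
    G = compensator_G(events, gamma, T, M)
    L0 = sum(1 for v in y if v == 0)
    L = [0] * M
    for v in y:
        if v >= 1:
            L[events[v - 1][1]] += 1
    for n in range(len(events)):
        # Remove the current assignment of this event from the counts (the "-n" counts).
        if y[n] == 0:
            L0 -= 1
        else:
            L[events[y[n] - 1][1]] -= 1
        t_n = events[n][0]
        weights = [(L0 + nu_a) / (T + eta_a)]  # candidate parent 0: background
        for t_i, slot_i, comp_i in events[:n]:  # candidate parents: strictly earlier events
            weights.append((L[slot_i] + nu_b) / (G[slot_i] + eta_b)
                           * math.exp(-gamma[comp_i] * (t_n - t_i)))
        y[n] = rng.choices(range(n + 1), weights=weights)[0]
        # Add the new assignment back to the counts.
        if y[n] == 0:
            L0 += 1
        else:
            L[events[y[n] - 1][1]] += 1
    return y

# Toy run (hypothetical values).
events = [(0.5, 0, 0), (1.0, 0, 1), (1.5, 1, 0), (3.0, 1, 1)]
random.seed(0)
y = gibbs_sweep(events, [0, 0, 0, 0], [1.0, 0.5], 4.0, 2, 1.0, 1.0, 1.0, 1.0)
```

The first event has no predecessor, so its only candidate parent is the background.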

Metropolis-Hastings Sampling for \(\varvec{\varvec{\gamma }}\): Due to the nonconjugacy of \(\varvec{\gamma }\), we consider leveraging a Metropolis-Hastings algorithm to obtain the invariant distribution of \(\varvec{\gamma }\) for current samples of \(Y_T\). Here, we exploit a normal distribution \(q(\varvec{\gamma }^{\prime } \, | \, \varvec{\gamma })\) as a proposal distribution for candidate \(\varvec{\gamma }^{\prime }\). Using the symmetric property, \(q(\varvec{\gamma }^{\prime } \, | \, \varvec{\gamma }) = q(\varvec{\gamma }\, | \, \varvec{\gamma }^{\prime })\), the acceptance probability of \(\varvec{\gamma }^{\prime }\) is obtained by

$$\begin{aligned} Q(\varvec{\gamma }^{\prime } \, | \, \varvec{\gamma }) \ = \ \mathrm {min} \left\{ 1, \, \frac{\displaystyle p(\varvec{\gamma }' \, | \, \mathcal{H}_{T}, Y_{T}, \varTheta ^*, Z^*_{T}, \nu _\alpha , \eta _\alpha , \nu _\beta , \eta _\beta ) }{\displaystyle p(\varvec{\gamma }\, | \, \mathcal{H}_{T}, Y_{T}, \varTheta ^*, Z^*_{T}, \nu _\alpha , \eta _\alpha , \nu _\beta , \eta _\beta ) } \right\} \end{aligned}$$

Note that \(Q(\varvec{\gamma }^{\prime } \, | \, \varvec{\gamma })\) is easily computed by using the relation,

$$ p(\varvec{\gamma }\, | \, \mathcal{H}_{T}, Y_{T}, \varTheta ^*, Z^*_{T}, \nu _\alpha , \eta _\alpha , \nu _\beta , \eta _\beta ) \ \propto \ p( \mathcal{H}_{T}, Y_T \, | \, \varvec{\gamma }, \varTheta ^*, Z^*_{T}, \nu _\alpha , \eta _\alpha , \nu _\beta , \eta _\beta ) \, p(\varvec{\gamma }\, | \, \nu _\gamma , \eta _\gamma ) $$

(see Eqs. (9) and (10)). We accept \(\varvec{\gamma }^{\prime }\) according to \(Q(\varvec{\gamma }^{\prime } \, | \, \varvec{\gamma })\). By iterating these operations, we obtain a sample for \(\varvec{\gamma }\).
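A random-walk Metropolis-Hastings step for \(\varvec{\gamma }\) can be sketched as follows. The log target below keeps only the \(\varvec{\gamma }\)-dependent factors, under the assumption that the collapsed likelihood has the gamma-Poisson form implied by the Gibbs conditional and the hyper-parameter objectives (rate-parameterized gamma priors). The step size, the rejection of non-positive proposals, the event representation `(t_n, slot_n, comp_n)`, and all names are illustrative choices.

```python
import math
import random

def log_target(gamma, events, y, T, M, nu_b, eta_b, nu_g, eta_g):
    """Unnormalized log posterior of gamma given the parent indicators y."""
    G = [0.0] * M
    for t_n, slot_n, comp_n in events:
        g = gamma[comp_n]
        G[slot_n] += (1.0 - math.exp(-g * (T - t_n))) / g
    L = [0] * M
    lp = 0.0
    for n, parent in enumerate(y):
        if parent >= 1:  # event n+1 was triggered by event number `parent`
            t_p, slot_p, comp_p = events[parent - 1]
            L[slot_p] += 1
            lp -= gamma[comp_p] * (events[n][0] - t_p)
    # Collapsed contribution of beta_m for each time slot.
    lp -= sum((L[m] + nu_b) * math.log(G[m] + eta_b) for m in range(M))
    # Gamma(nu_g, eta_g) log prior on each gamma_k, up to an additive constant.
    lp += sum((nu_g - 1.0) * math.log(g) - eta_g * g for g in gamma)
    return lp

def mh_step(gamma, step, rng, events, y, T, M, nu_b, eta_b, nu_g, eta_g):
    """One symmetric normal random-walk proposal; returns the next state."""
    proposal = [g + rng.gauss(0.0, step) for g in gamma]
    if min(proposal) <= 0.0:
        return gamma  # the prior density is zero outside the positive orthant
    args = (events, y, T, M, nu_b, eta_b, nu_g, eta_g)
    log_ratio = log_target(proposal, *args) - log_target(gamma, *args)
    if rng.random() < math.exp(min(0.0, log_ratio)):
        return proposal
    return gamma

# Toy chain (hypothetical values).
events = [(0.5, 0, 0), (1.0, 0, 1), (1.5, 1, 0), (3.0, 1, 1)]
y = [0, 1, 0, 3]
rng = random.Random(0)
gamma = [1.0, 1.0]
for _ in range(200):
    gamma = mh_step(gamma, 0.2, rng, events, y, 4.0, 2, 1.0, 1.0, 1.0, 1.0)
```

Because the proposal is symmetric, the proposal densities cancel and only the ratio of target values enters the acceptance probability.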

Sampling for \(\varvec{\alpha }\) and \(\varvec{\varvec{\beta }}\), and Updating of Hyper-Parameters: Given the current samples for \(Y_T\) and \(\varvec{\gamma }\), we sample \(\alpha \) and \(\varvec{\beta }\) by the expected values of the posterior distributions \(p(\alpha | \mathcal{H}_{T}, Y_{T}, \nu _\alpha , \eta _\alpha ) = \mathrm {Gamma}(L_0 + \nu _\alpha , \, T + \eta _\alpha )\) and \(p(\beta _m | \mathcal{H}_{T}, Y_{T}, Z^*_{T}, \varvec{\gamma }, \nu _\beta , \eta _\beta ) = \mathrm {Gamma} (L_m + \nu _\beta , G_m (\varvec{\gamma }\, | \, \, Z^*_{T}) + \eta _\beta )\) as follows:

$$\begin{aligned} \alpha = \frac{\displaystyle L_0 + \nu _\alpha }{\displaystyle T + \eta _\alpha }, \quad \beta _m = \frac{\displaystyle L_m + \nu _\beta }{\displaystyle G_m(\varvec{\gamma }\, | \, Z^*_{T}) + \eta _\beta } \end{aligned}$$

for \(m = 1, \dots , M\) (see Eqs. (8) and (9)). Next, we update the hyperparameters \(\nu _\alpha \), \(\eta _\alpha \), \(\nu _\beta \), \(\eta _\beta \), \(\nu _\gamma \) and \(\eta _\gamma \) through the maximum likelihood estimations (see Eqs. (8), (9) and (10)). Here, the objective function for \(\nu _\alpha \) and \(\eta _\alpha \) is given by

$$\begin{aligned} \mathcal{L}_\alpha (\nu _\alpha , \eta _\alpha ) \ = \ \ln \frac{\displaystyle \mathrm {\Gamma } (L_0 + \nu _\alpha )}{\displaystyle (T + \eta _\alpha )^{L_0 + \nu _\alpha }} \, + \, \ln \frac{\displaystyle \eta _\alpha ^{\nu _\alpha }}{\displaystyle \mathrm {\Gamma } (\nu _\alpha )}. \end{aligned}$$

Also, the objective function for \(\nu _\beta \) and \(\eta _\beta \) is given by

$$\begin{aligned} \mathcal{L}_\beta (\nu _\beta , \eta _\beta ) \ = \ \sum _{m=1}^M \ln \frac{\displaystyle \mathrm {\Gamma } (L_m + \nu _\beta )}{\displaystyle \left\{ G_m(\varvec{\gamma }\, | \, Z^*_{T}) + \eta _\beta \right\} ^{L_m + \nu _\beta }} \, + \, M \ln \frac{\displaystyle \eta _\beta ^{\nu _\beta }}{\displaystyle \mathrm {\Gamma } (\nu _\beta )}, \end{aligned}$$

and the objective function for \(\nu _\gamma \) and \(\eta _\gamma \) is given by

$$\begin{aligned} \mathcal{L}_\gamma (\nu _\gamma , \eta _\gamma ) \ = \ \sum _{k=1}^K \left( \ln \gamma _k^{\nu _\gamma - 1} \, - \, \eta _\gamma \, \gamma _k \right) \, + \, K \ln \frac{\displaystyle \eta _\gamma ^{\nu _\gamma }}{\displaystyle \mathrm {\Gamma }(\nu _\gamma )}. \end{aligned}$$

Based on Newton’s method, we obtain the update rules for these hyper-parameters.

4.2 Prediction and Analysis

Using the proposed model inferred from \(\mathcal{H}_T\), we provide a framework for predicting the future events and analyzing the geographical attention dynamics.

We predict the events occurring in \([T, T')\) by simulating the proposed model under Ogata’s thinning algorithm [15] based on the intensity function given by

(12)

for any \(t \in [T, T')\) (see Eqs. (2), (5) and (7)).
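For intuition, the thinning procedure can be sketched for an intensity of the form of Eq. (5). This is a simplified illustration and not the exact predictive intensity of Eq. (12): it assumes every event carries a single component label, draws the label of each simulated event from the mixing weights of its time slot, and omits sampling the location from the chosen Gaussian. The function names, the hour-based `slot_of`, and the parameter values are hypothetical.

```python
import math
import random

def intensity(t, events, alpha, beta, gamma):
    """Eq. (5)-style intensity using all events with t_n <= t."""
    return alpha + sum(beta[s] * math.exp(-gamma[c] * (t - t_n))
                       for t_n, s, c in events if t_n <= t)

def simulate(history, t_start, t_end, alpha, beta, gamma, phi, slot_of, rng):
    """Thinning: between events the intensity only decays, so its current value is an upper bound."""
    events = list(history)
    simulated = []
    t = t_start
    while True:
        lam_bar = intensity(t, events, alpha, beta, gamma)  # bound valid until the next event
        t += rng.expovariate(lam_bar)                       # candidate time from the bounding rate
        if t >= t_end:
            break
        if rng.random() * lam_bar <= intensity(t, events, alpha, beta, gamma):
            slot = slot_of(t)
            comp = rng.choices(range(len(gamma)), weights=phi[slot])[0]
            events.append((t, slot, comp))                  # accepted events excite later ones
            simulated.append((t, slot, comp))
    return simulated

def slot_of(t):
    """Hypothetical slot function for t measured in hours from midnight."""
    h = t % 24.0
    return 0 if 6 <= h < 11 else 1 if 11 <= h < 16 else 2 if 16 <= h < 21 else 3

rng = random.Random(1)
sim = simulate([(0.5, 3, 0)], 1.0, 25.0, 0.5, [0.1] * 4, [1.0, 2.0],
               [[0.5, 0.5]] * 4, slot_of, rng)
```

Repeating the simulation many times and aggregating the simulated events gives a Monte Carlo estimate of the expected activity in the test period.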

We analyze the geographical attention dynamics in the following way. We first examine the estimated parameters \(\alpha ^*\), \(\varvec{\beta }^*\), \(\varvec{\gamma }^*\) and \(\varTheta ^*\) in detail. Next, we extract the influence relation \(R_{k, \ell }\) from latent component \(C_\ell \) to latent component \(C_k\) by

$$ R_{k,\ell } \ = \ \sum _{n=2}^{N_T} \sum _{i=1}^{n-1} \xi _{n,i} \, \psi _k(\varvec{x}_n \, | \, t_n) \, \psi _\ell (\varvec{x}_i \, | \, t_i) $$

for \(k, \ell = 1, \dots , K\) (see Eq. (11)), and analyze it. Here, note that each component \(C_k\) is identified with a major sightseeing area, and each \(\xi _{n,i}\) measures the spatio-temporal influence from the \(i\)th event \((t_i, \varvec{x}_i)\) to the \(n\)th event \((t_n, \varvec{x}_n)\).
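The double sum defining \(R_{k, \ell }\) can be sketched as follows. The sketch assumes that the values \(\xi _{n,i}\) and \(\psi _k(\varvec{x}_n \, | \, t_n)\) have already been computed and are passed in as nested lists; the function name and the data layout (0-based indices, lower-triangular `xi`) are hypothetical.

```python
def influence_relation(xi, psi):
    """Compute R[k][l] = sum_{n} sum_{i < n} xi[n][i] * psi[n][k] * psi[i][l].

    xi[n][i] : influence from event i to event n (only i < n is used)
    psi[n][k]: value of psi_k at event n (assumed precomputed)
    """
    n_events = len(psi)
    n_components = len(psi[0])
    R = [[0.0] * n_components for _ in range(n_components)]
    for n in range(1, n_events):
        for i in range(n):
            w = xi[n][i]
            if w == 0.0:
                continue  # skip pairs without influence
            for k in range(n_components):
                for l in range(n_components):
                    R[k][l] += w * psi[n][k] * psi[i][l]
    return R
```

Row index `k` is the influenced component and column index `l` is the influencing one, matching the direction from \(C_\ell \) to \(C_k\).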

5 Experiments

Using real data of POI visit events in “Kyoto”, the ancient capital of Japan (a famous sightseeing city), we first evaluate the proposed model in terms of prediction performance. Next, by applying the proposed analysis method, we try to examine the properties of the geographical attention dynamics in Kyoto.

5.1 Datasets

We collected photos that were taken within Kyoto city in 2014 and uploaded to the photo-sharing site Flickr (footnote 3). By regarding those photos as a set of photos taken on sightseeing tours, we constructed real data for POI visit events in Kyoto. The total number of those photos was 78,239. Taking into account Kyoto’s attractive seasons, represented by cherry blossoms and autumn leaves, we focus on the spring data from March 1 to May 7 and the autumn data from October 1 to December 7. Also, from the perspective of Kyoto’s sightseeing, we divide one day into \(M = 4\) time slots, and set time slots \(TS_1\), \(TS_2\), \(TS_3\) and \(TS_4\) as 6 am to 11 am, 11 am to 4 pm, 4 pm to 9 pm and 9 pm to 6 am, respectively. Figure 2a indicates the number of events within each \(TS_m\). Unsurprisingly, many events occurred in daytime \(TS_2\), and a relatively small number of events occurred in night-time and early-morning \(TS_4\).
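The time-slot assignment described above can be sketched as a small helper. The treatment of the boundaries (each slot includes its start hour and excludes its end hour) is an assumption, since the text does not state it, and the function name is hypothetical.

```python
def time_slot(hour):
    """Return m in {1, 2, 3, 4} for the time slot TS_m containing `hour`.

    `hour` is the local hour of day in [0, 24); boundaries are assumed
    to be half-open, e.g. 11:00 belongs to TS_2.
    """
    if not 0 <= hour < 24:
        raise ValueError("hour must be in [0, 24)")
    if 6 <= hour < 11:
        return 1  # 6 am to 11 am
    if 11 <= hour < 16:
        return 2  # 11 am to 4 pm
    if 16 <= hour < 21:
        return 3  # 4 pm to 9 pm
    return 4  # 9 pm to 6 am, wrapping around midnight
```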

Fig. 2. Statistical analysis for the number of events within each time slot.

For each of the spring and autumn data, we constructed seven datasets \(\mathcal{D}_1, \dots , \mathcal{D}_7\) in the following way: We let the training period [0, T) and the test period \([T, T')\) be two months and one day, respectively. In the case of the spring data, for example, for dataset \(\mathcal{D}_1\), training period [0, T) is March 1 to April 30 and test period \([T, T')\) is May 1, and for dataset \(\mathcal{D}_2\), training period [0, T) is March 2 to May 1 and test period \([T, T')\) is May 2. In the case of the autumn data, for example, for dataset \(\mathcal{D}_1\), training period [0, T) is October 1 to November 30 and test period \([T, T')\) is December 1, and for dataset \(\mathcal{D}_2\), training period [0, T) is October 2 to December 1 and test period \([T, T')\) is December 2.
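The construction of the seven datasets can be sketched as a sliding window. Only \(\mathcal{D}_1\) and \(\mathcal{D}_2\) are given explicitly in the text; the sketch assumes that \(\mathcal{D}_3, \dots , \mathcal{D}_7\) continue the same one-day shift, and the function name is hypothetical.

```python
from datetime import date, timedelta


def sliding_datasets(season_start, first_test_day, n_datasets=7):
    """Return (train_first_day, train_last_day, test_day) for D_1, ..., D_n.

    Each dataset shifts the training window and the one-day test period
    forward by one day (assumed pattern beyond D_2).
    """
    datasets = []
    for j in range(n_datasets):
        shift = timedelta(days=j)
        test_day = first_test_day + shift
        train_first = season_start + shift
        train_last = test_day - timedelta(days=1)
        datasets.append((train_first, train_last, test_day))
    return datasets


spring = sliding_datasets(date(2014, 3, 1), date(2014, 5, 1))
autumn = sliding_datasets(date(2014, 10, 1), date(2014, 12, 1))
```

Under this assumption the last test days are May 7 and December 7, which matches the end of the spring and autumn data.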

5.2 Evaluation of Prediction Performance

For predicting future POI visit events, we compared the proposed model (see Eq. (5)) with the HG model (see Eq. (4)) and the baseline model (see Eq. (3)). Here, the parameters \(\alpha \), \(\beta \) and \(\gamma \) for the HG model were estimated by a commonly used method for learning a Hawkes process, namely a maximum likelihood method based on an EM algorithm (see [7]), and the parameter \(\alpha \) for the baseline model was estimated in the same way. For inferring \(\alpha \), \(\varvec{\beta }\), \(\varvec{\gamma }\) and \(\{ \xi _{n,i} \}\) in the proposed model, we ran 1,000 iterations with a burn-in of 200. In view of Kyoto’s sightseeing, the number of components was set to \(K = 8\), and for all three models, eight representative tourist spots were always used as the initial positions of the parameters \(\{ \varvec{\mu }_k; \, k = 1, \dots , 8 \}\) in parameter inference.

Fig. 3. Predictive accuracy for the spring data.

Fig. 4. Predictive accuracy for the autumn data.

Taking the issue of spatial resolution limitation into consideration, we decompose an appropriate rectangular region covering Kyoto’s spatial domain \(\varOmega \) into a collection of \(250 \times 400\) consecutive tiles \(\{ \varOmega (b) ; \, b = 1, \dots , 250 \times 400 \}\) (see [17]), where each tile \(\varOmega (b)\) is a 100 m\(^{2}\) region, and we evaluate the predictive accuracy for future POI visit events in terms of these tiles. For each \(TS_m\), we counted the number of events occurring within every tile \(\varOmega (b)\). Figure 2b shows the distribution of the number of events for each time slot \(TS_m\) in terms of the number of tiles. Interestingly, the distribution for each \(TS_m\) (\(m = 1, 2, 3, 4\)) exhibits a power law with almost the same scaling exponent. We evaluate the predictive accuracy of an inferred spatio-temporal point process model with intensity function \(\lambda ^* (t) \, f^*(\varvec{x}\, | \, t)\) by

$$\begin{aligned} PA \! = \! \frac{1}{\left| \mathcal{H} (T, T') \right| } \left( \sum _{(t_n, \, \varvec{x}_n) \, \in \, \mathcal{H} (T, T')} \ln \left\{ \lambda ^* (t_n) \, \int _{\varOmega (b(\varvec{x}_n))} f^*(\varvec{x}\, | \,t_n) \, d\varvec{x}\right\} \, - \, \int _{T}^{T'} \lambda ^* (t) \, dt \right) \end{aligned}$$
(13)

(see [24] and footnote 4), where \(\mathcal{H}(T,T') = \mathcal{H}_{T'} \setminus \mathcal{H}_T\) stands for the set of events occurring in test period \([T, T')\), and \(\varOmega (b(\varvec{x}_n))\) denotes the tile to which location \(\varvec{x}_n\) belongs. Here, note that PA measures the average prediction log-likelihood of \(\mathcal{H}(T,T')\) (see Eq. (1)).
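As a rough sketch of how Eq. (13) can be evaluated, the following code computes PA from precomputed quantities. It assumes that, for each test event, the intensity \(\lambda ^*(t_n)\) and the probability mass of the spatial density over the event's tile are already available, together with the integral of \(\lambda ^*\) over the test period. The helper `tile_index`, its bounding-box parameters, its 0-based row-major numbering and all function names are hypothetical.

```python
import math


def tile_index(x, y, x_min, y_min, tile_size, n_cols):
    """Map planar coordinates to a 0-based, row-major tile index (assumed layout)."""
    col = int((x - x_min) // tile_size)
    row = int((y - y_min) // tile_size)
    return row * n_cols + col


def predictive_accuracy(lam_at_events, tile_probs, integrated_intensity):
    """Evaluate PA as in Eq. (13).

    lam_at_events       : lambda*(t_n) for each test event
    tile_probs          : integral of the spatial density over the tile of x_n
    integrated_intensity: integral of lambda*(t) over the test period
    """
    n_events = len(lam_at_events)
    if n_events == 0 or n_events != len(tile_probs):
        raise ValueError("need one intensity and one tile probability per event")
    log_terms = sum(math.log(lam * p) for lam, p in zip(lam_at_events, tile_probs))
    return (log_terms - integrated_intensity) / n_events
```

For two test events with intensities 2.0 and 1.0, tile probabilities 0.5 and 0.25, and an integrated intensity of 3.0, the value is \((\ln 1 + \ln 0.25 - 3) / 2\).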

For all three models, the parameters other than \(\{ \varvec{\mu }_k \}\) were randomly initialized in parameter inference. Figures 3 and 4 show the average prediction performance over five trials for the spring and autumn data, respectively. Here, the proposed model is evaluated in terms of metric PA (see Eq. (13)) and compared with the HG and baseline models. Figures 3a and 4a indicate the value of PA for each dataset \(\mathcal{D}_j\), and Figs. 3b and 4b indicate the average value of PA restricted to each time slot \(TS_m\). We see that the proposed model performs the best, the conventional HG model follows, and the baseline model is always worse than these two models. Unlike the other two models, the prediction performance of the proposed model was stable, and did not depend heavily on datasets and time slots. These results imply that it is important to incorporate both the component-dependent temporal influence decay and the time-slot-varying influence degree, and demonstrate the effectiveness of the proposed model.

5.3 Analysis of Geographical Attention Dynamics

By applying the proposed method, we examine the properties of the geographical attention dynamics in Kyoto. Here, we only report the analysis results for the spring data (see Fig. 5).

Fig. 5. Analysis results for the spring data.

Figure 5a displays the geographical locations of the estimated latent Gaussian components \(C_1, \dots , C_8\), which represent the major sightseeing areas of Kyoto in the spring. Figure 5b gives a visualization of the estimated parameters \(\{ \phi ^*_{m,k} \}\), where each \(\varvec{\phi }^*_m = (\phi ^*_{m, 1}, \dots , \phi ^*_{m,8})\) indicates the popularity distribution among components within time slot \(TS_m\). We observe that \(C_1\), \(C_2\) and \(C_3\) are always popular, and \(C_5\) and \(C_6\) are also popular in some time slots to a certain degree. Here, \(C_1\), \(C_2\) and \(C_3\) correspond to neighborhoods of Kyoto Imperial Palace (Kyoto Gosho; footnote 5), Heian-jingu Shrine (footnote 6) and Kiyomizu-dera Temple (footnote 7), respectively, and they are located near Kyoto’s downtown. Also, \(C_5\) corresponds to a neighborhood of Kinkaku-ji Temple (footnote 8), featuring a shining golden pavilion, which is located in a suburban area of Kyoto city. \(C_6\) corresponds to the Arashiyama area (footnote 9), which is a touristy district on the western outskirts of Kyoto and famous as a place of scenic beauty.

Figure 5c shows the estimated parameters \(\varvec{\beta }^* = (\beta ^*_1, \dots , \beta ^*_4)\), which represent the influence structure depending on time slots. We can see that events that occurred during morning \(TS_1\) were the most influential, while events that occurred during night-time and early-morning \(TS_4\) were the least influential. For the estimated parameters \(\{ \gamma ^*_k \}\), there was little difference among \(C_1\), \(C_2\), \(C_3\) and \(C_5\). Figure 5d displays the temporal influence decay functions estimated for components \(C_1\) and \(C_6\). This implies that the influence of events that occurred in \(C_6\) decayed more rapidly than that of events that occurred in \(C_1\).

Figure 5e shows the estimated influence relation \(R_{k, \ell }\) from \(C_\ell \) to \(C_k\) for \(k, \ell = 1, \dots , 8\), where the color of the entry in the \(k\)th row and the \(\ell \)th column indicates the value of \(R_{k, \ell }\). We see that the influence relations among \(C_1\), \(C_2\) and \(C_3\) were substantially stronger than the others. Figure 5f displays the main influence relations among components, where for \(k, \ell = 1, \dots , 8\), an arrow from \(C_{\ell }\) to \(C_k\) is drawn if \(R_{k, \ell }\) is greater than the average value of \(\{ R_{k', \ell '} \}\). This reveals people’s primary movement patterns for Kyoto’s sightseeing in the spring. Here, note that the spatio-temporal influence relations \(\{R_{k,\ell } \}\) changed significantly for the autumn data. In this way, the proposed method can provide interesting analysis results for Kyoto’s sightseeing during a specified season. These analysis results are expected to provide a foundation for tourism marketing.
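The thresholding rule used for the main influence relations can be sketched as follows. The sketch assumes that the average is taken over all \(K \times K\) entries, including the diagonal, which the text does not state explicitly; the function name and the 0-based indices are hypothetical.

```python
def main_influence_edges(R):
    """Return (source l, target k) pairs whose R[k][l] exceeds the mean entry.

    R[k][l] is the influence from component l to component k; the mean is
    assumed to run over all K * K entries, so self-loops (l == k) can appear.
    """
    n_components = len(R)
    average = sum(sum(row) for row in R) / (n_components * n_components)
    return [
        (l, k)
        for k in range(n_components)
        for l in range(n_components)
        if R[k][l] > average
    ]
```

For `R = [[1.0, 2.0], [5.0, 0.0]]` the average is 2.0, so only the relation from component 0 to component 1 is kept.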

6 Conclusion

We dealt with modeling of geographical attention dynamics, that is, the problem of modeling the occurrence process of POI visit events for a sightseeing city in the setting of a continuous space-time. We have proposed a novel probabilistic model for discovering the spatio-temporal influence structure among major sightseeing areas, and attempted to accurately predict POI visit events in the near future. The proposed model is constructed by combining a Hawkes process with a time-varying Gaussian mixture model in a novel way and incorporating the influence structure depending on time slots as well. We developed an efficient method of inferring the parameters in the proposed model from the observed sequence of POI visit events, and provided an analysis method for the geographical attention dynamics. Using real data of Kyoto, a Japanese sightseeing city, we demonstrated that the proposed model significantly outperforms the conventional HG model and the baseline model in terms of predictive accuracy, and revealed the spatio-temporal influence relation structure among major sightseeing areas in Kyoto from the viewpoint of geographical attention dynamics.

In this paper, we focused on Kyoto’s data obtained from Flickr, a photo-sharing site. Clearly, it is possible to apply the proposed method to other sightseeing cities and to geographical regions that include several sightseeing cities. Our immediate future work is to evaluate the proposed method for various sightseeing cities around the world and to explore POIs of variable geographical scales. We also assumed that the latent Gaussian components extracted by the proposed method represent major tourism topics and can be identified with major sightseeing areas. Our future work includes exploring spatial distributions other than Gaussian mixtures. In several photo-sharing services, there are many photos that are annotated not only with GPS locations and time-stamps but also with text documents, and a method of detecting spatio-temporally exclusive topics from such data has been investigated (see [17]). By applying such a method, we also plan to develop a framework for easily interpreting those latent components in terms of tourism topics.