# Probabilistic movement models and zones of control


## Abstract

Coordinated movements of players are key to success in team sports. However, traditional models for player movements are based on unrealistic assumptions and their analysis is prone to errors. As a remedy, we propose to estimate individual movement models from positional data and show how to turn these estimates into accurate and realistic zones of control. Our approach accounts for characteristic traits of players, scales with large amounts of data, and can be efficiently computed in a distributed fashion. We report on empirical results.

## Keywords

Positional data · Movement models · Zones of control · Soccer

## 1 Introduction

Player coordination is perhaps the most important aspect in team sports. In soccer, for example, collective movements are indispensable for controlling the midfield, counter attacks, or effective pressing (Taki and Hasegawa 2000; Fonseca et al. 2012; Gudmundsson and Wolle 2014; Horton et al. 2015). Therefore, models that quantify the probability that a player attains a certain position in a given time are crucial. Such models are called *movement models*.

Traditional movement models rest on the assumption that players are able to move in all directions equally fast and ignore velocities (Taki et al. 1996; Taki and Hasegawa 2000; Fonseca et al. 2012), leading to implausible Voronoi-like tessellations (Voronoi 1908) of the pitch. More sophisticated models incorporate some basic laws of physics but either suffer from unrealistic assumptions or remain thought experiments (Taki and Hasegawa 2000; Fujimura and Sugihara 2005; Gudmundsson and Wolle 2014). All existing approaches treat every player the same by assuming that a single movement model serves all players equally well, thus ignoring individual differences between players.

Movement models can be used to compute *zones of control* (sometimes also called dominant regions). The zone controlled by a player is the region of the pitch in which she can attain any position before every other player (Taki and Hasegawa 2000). The underlying idea is that if the ball falls inside a player’s zone of control, she will likely be able to bring the ball under control after receiving it, and the more space a team controls, the more dominant it is.

In this paper, we propose to estimate individual movement models from positional data. Our probabilistic approach leverages positions, directions, and velocities of a player at observed timestamps and returns a distribution of all reachable positions in a given time. Figure 1 shows an example. We present an efficient computational schema for processing positional data at large scales and show how to turn the probabilistic movement models into zones of control. Compared to traditional one-serves-all methods, our approach leads to realistic movement models, which in turn lead to realistic zones of control.

The remainder is organized as follows. Section 2 reviews related work. Section 3 presents the estimation of individual movement models and Sect. 4 the computation of the resulting zones of control. Section 5 provides a discussion and Sect. 6 concludes.

## 2 Related work

Trajectory analyses are often carried out for wearable devices like smart phones, accelerometers, or gyroscopes (Zheng 2015; Mazimpaka and Timpf 2016). Often, the trajectories serve only as proxies for a higher level research question such as the identification of road defects (see, e.g., Byrne et al. 2013; Mohan et al. 2008), discrimination of drivers by insurance companies (Paefgen et al. 2011), or activity recognition (Avci et al. 2010; Lasek and Gagolewski 2015).

Similarly, trajectory data in sports is used to identify movement patterns. At an individual level, Zhao et al. (2016) use Gaussian processes to model velocity (flow) of athletes in ski races. Laube et al. (2005) propose to analyze relative motions and different temporal patterns across many subjects. As an exemplary application, the authors analyze positional data to retrieve patterns from coordinated team motions. The problem of pattern identification in groups of moving objects is also studied by Gottfried (2008, 2011). The author proposes qualitative descriptions of motion patterns using a set of atomic motions as building blocks to analyze and describe more complex behaviors; Sprado and Gottfried (2009) apply this idea to RoboCup and soccer games. Knauf et al. (2016) propose spatio-temporal convolution kernels as a similarity measure over time and space and identify game initiations and offensive patterns using a clustering approach. Similarly, Janetzko et al. (2014) group attacking patterns of strikers. Generally, frequent patterns in multi-trajectory data can also be found using episode mining algorithms (Haase and Brefeld 2014).

Zhang et al. (2016) visualize time interval data to analyze player and team performance. They include a variety of features, ranging from player velocities to ball possession, as team dominance metrics. Other methods include, for example, estimating the probabilities of a shot being made (Link et al. 2016; Harmon et al. 2016). Generally, applying neural networks to player trajectories, represented either as sequences or images, renders hand-crafted feature engineering unnecessary and may thus be beneficial in situations where sufficient statistics are unknown or difficult to obtain, as in the analysis of player positioning. For instance, Zheng et al. (2016) and Le et al. (2017) propose to model player trajectories with recurrent neural networks for player positioning in basketball and soccer. Similarly, convolutional neural networks are used by Harmon et al. (2016) to estimate the probability of scoring opportunities. Memmert et al. (2016) and Gudmundsson and Horton (2017) provide a general overview of positional data applications in team sports. Other interesting applications include pass quality evaluation (Brooks et al. 2016) or injury prediction (Rossi et al. 2017).

Taki and Hasegawa (2000) propose a movement model that is based on a player’s current speed, her direction, and an acceleration profile along different directions. The authors discuss the dependency of acceleration on velocity and direction and also emphasize that the acceleration decreases with increasing speed. Unfortunately, the authors ignore these physical details and focus on a very basic and unrealistic version of their model, in which a player is able to move in all directions with the same acceleration, thereby accepting the consequence of unbounded velocities. Fujimura and Sugihara (2005) extend this approach by adding a resistive force that prevents velocities from growing without bound. Thus, the two approaches drastically simplify physical laws to model player movements. Note that both also constitute one-serves-all approaches as the model is not personalized to account for individual differences between players. More recently, Gudmundsson and Wolle (2014) sketch how an individual movement model could be estimated from data. They suggest approximating a player’s *reachable region* at time *t* by constructing a convex polygon for all historic points she reached within this time given her actual position. However, they leave it as a thought experiment and do not present technical or algorithmic details of their approach.

Once a movement model is established, it serves as a foundation for various applications in the analysis of matches. Perhaps the most important one is the computation of *zones of control*, or, alternatively, *dominant regions*. This concept has been introduced by Taki and Hasegawa (2000) as the part of the pitch that can be attained by a player before all others. Consequently, zones of control are necessary to compute and evaluate pass quality and success (Taki and Hasegawa 2000; Nakanishi et al. 2009; Gudmundsson and Wolle 2014; Horton et al. 2015), pressing (Taki and Hasegawa 2000), as well as the analysis of team behavior and interaction (Fonseca et al. 2012), or organization and positioning in both offense and defense (Ueda et al. 2014).

## 3 Estimating individual movement models

### 3.1 Preliminaries

Let \(\mathbf {p}^k_t\) be the two-dimensional vector of player *k* describing her position in area \(F \subset \mathbb {R}^2\) and let \(\mathbf {v}^k_t \in \mathbb {R}^2\) be her velocity vector at time \(t \in \mathbb {R}_{\ge 0}\) with its magnitude (speed) \(v_t^k = \Vert \mathbf {v}^k_t \Vert _2\), where \(\Vert \cdot \Vert _2\) denotes the \(\ell _2\)-norm. The time index *t* is typically discrete as samples are generated with equidistant timestamps \(t_1,\ t_2,\ \ldots ,\ t_n\), where \(t_{i+1} - t_i = \tau > 0\) is fixed. The trajectories and the associated velocities form the dataset \(\mathcal {D} = \{ ( \mathbf {p}_{t_i},v_{t_i} ) \}_{i=1}^n\). The goal is to generate a probabilistic model of the player’s whereabouts in time horizon \(t_\varDelta > 0\) given her current position \(\mathbf {p}^k_t\) and velocity \(\mathbf {v}^k_t\), that is, a distribution over the positions she can reach in that time. In the remainder, we omit the player index *k* whenever possible and focus on data from a single player.

### 3.2 Existing approaches

Before we introduce the estimation of probabilistic movement models from positional data, we briefly review existing approaches. The simplest model assumes that all players are able to move in all directions equally fast at a constant speed. Thus, there is no acceleration or direction of movement and the resulting zones of control are equal to Voronoi tessellations (Voronoi 1908) of the pitch using the players as center points. This model is referred to as *Voronoi*.

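Taki and Hasegawa (2000) assume in the basic version of their model that a player can accelerate with the same maximal acceleration \(a_{\max }\) in every direction. If the player is at the origin and moves with initial speed *v* in the direction of the *x*-axis, in time *t* her position \(\mathbf {p}= (x, y)\) is given by

\[ x = v t + \tfrac{1}{2} a_{\max } t^2 \cos \varphi , \qquad y = \tfrac{1}{2} a_{\max } t^2 \sin \varphi , \]

where \(\varphi \in [0, 2\pi )\) is the direction of acceleration. Hence, the set of points within reach of the player in time *t* forms a circle centered at \(\mathbf {c} \in \mathbb {R}^2\) with radius \(r>0\), where

\[ \mathbf {c} = (v t, 0), \qquad r = \tfrac{1}{2} a_{\max } t^2 . \]

This model is referred to as *Taki & Hasegawa*.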

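Fujimura and Sugihara (2005) extend this model by a resistive force that is proportional to the player’s velocity, which bounds the attainable speed by \(v_{\max }\). If the player accelerates in direction \(\varphi \in [0, 2\pi )\), her position \(\mathbf {p}= (x, y)\) in time *t* is given by

\[ x = v \, \frac{1 - e^{-\alpha t}}{\alpha } + v_{\max } \left( t - \frac{1 - e^{-\alpha t}}{\alpha } \right) \cos \varphi , \qquad y = v_{\max } \left( t - \frac{1 - e^{-\alpha t}}{\alpha } \right) \sin \varphi , \]

where *v* is the initial velocity in the direction of the *x*-axis and the parameter \(\alpha > 0\) is responsible for the resistive force. Hence, the set of points within reach of the player in time *t* forms a circle centered at \(\mathbf {c} \in \mathbb {R}^2\) with radius \(r>0\), where

\[ \mathbf {c} = \left( v \, \frac{1 - e^{-\alpha t}}{\alpha }, 0 \right) , \qquad r = v_{\max } \left( t - \frac{1 - e^{-\alpha t}}{\alpha } \right) . \]

This model is referred to as *Fujimura & Sugihara*.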
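The circular reachable regions of the two physical models can be made concrete with a few lines of code. The following is a minimal sketch, not the original authors' implementation: it assumes the closed forms that follow from constant acceleration \(a_{\max }\) in an arbitrary direction (Taki & Hasegawa) and from a resistive force proportional to velocity with parameters \(\alpha \) and \(v_{\max }\) (Fujimura & Sugihara). The function names are hypothetical.

```python
import math


def reach_taki_hasegawa(v, t, a_max):
    """Reachable circle after t seconds under constant acceleration.

    Assumption: the player starts at the origin with speed v along the
    x-axis and may accelerate with a_max in any direction.
    Returns (center, radius).
    """
    center = (v * t, 0.0)
    radius = 0.5 * a_max * t ** 2
    return center, radius


def reach_fujimura_sugihara(v, t, alpha, v_max):
    """Reachable circle after t seconds with a velocity-proportional drag.

    Assumption: equation of motion dv/dt = alpha * v_max * e - alpha * v
    for a unit direction e, solved in closed form.
    Returns (center, radius).
    """
    decay = (1.0 - math.exp(-alpha * t)) / alpha
    center = (v * decay, 0.0)
    radius = v_max * (t - decay)
    return center, radius
```

For a standing player (`v = 0`) both circles are centered at the player's position, so both models reduce to distance-based zones in that case.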

Figure 2 visualizes the existing movement models obtained by the Voronoi, Taki & Hasegawa, and Fujimura & Sugihara approaches. While all models realize similar circular-shaped movements for slowly moving players, differences become significant with increasing velocities. While the Voronoi-based approach yields perfect circles for any velocity, the approach by Fujimura & Sugihara leads to a conical structure assembled by nested circles. Finally, Taki & Hasegawa-based movement models become drop-shaped and oblique conical. Because all three models are intrinsically circular for arbitrary velocities, they serve only as crude approximations of reality. Intuitively, one would expect an elliptically shaped movement model, and we show in the next section that the data-driven models indeed take on elliptical shapes.

### 3.3 Estimating individual movement models from positional data

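For every timestamp, we consider the triplet of positions \((\mathbf {p}_{t-t_\delta }, \mathbf {p}_t, \mathbf {p}_{t+t_\varDelta })\), i.e., the last, the current, and the future position of the player. The triplet is translated and rotated such that the current position is the origin and the direction of movement is aligned with the *x*-axis. This way, the transformed position \(\mathbf {p}_u\) describes the point the player reaches assuming her current position is the origin, moving in direction of the *x*-axis with a given speed \(v_t = \Vert \mathbf {v}_t \Vert _2\). Figure 3 provides an overview of this approach.

The rotation is conveniently expressed in polar coordinates \((\theta , r)\), where \(\theta \) denotes the angle of a point with Cartesian (*x*, *y*) coordinates and *r* the distance. The angle \(\theta \) is computed via the following direct calculation

\[ \theta = \operatorname {atan2}(y, x), \]

which yields the angle between the ray to the point (*x*, *y*) and the positive *x*-axis. The distance is given by

\[ r = \sqrt{x^2 + y^2}. \]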

The model relies on a particular discretization of the speed range \(\mathcal {V}\) denoted by \({\tilde{\mathcal {V}}}\). Analogously, different models are obtained for different values of the time horizon parameter \(t_\varDelta \). In fact, we are interested in several values of this parameter for different time horizons (of about one second) in a given interval \(\mathcal {T}\).

In some cases, the triplets of points used to estimate the model can contain outliers. They may stem from an interruption during a match (e.g., due to a foul or corner kick) or errors in the data collecting process. Hence, triplets containing outliers should be discarded. Finally, given that a player’s ability to move should be symmetric with respect to the direction she is facing, the set can be augmented with \(({\bar{\mathbf {p}}}, v)\) using \({\bar{\mathbf {p}}} = (x, -y)\) for each sample \( (\mathbf {p}, v) \in {\mathcal {S}}_{t_\varDelta ,V}\).
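The normalization of a triplet and the symmetric augmentation can be sketched as follows. This is an illustrative example under stated assumptions, not the paper's Algorithm 1: the direction of movement is taken from the last to the current position, the speed is the distance covered in \(t_\delta \), and the function names are hypothetical.

```python
import math


def normalize_triplet(p_prev, p_now, p_next):
    """Express p_next in the player's own frame.

    The frame has p_now as origin and the direction of movement
    (from p_prev to p_now) pointing along the positive x-axis.
    """
    heading = math.atan2(p_now[1] - p_prev[1], p_now[0] - p_prev[0])
    dx, dy = p_next[0] - p_now[0], p_next[1] - p_now[1]
    # Rotate the displacement by -heading.
    cos_h, sin_h = math.cos(-heading), math.sin(-heading)
    return (dx * cos_h - dy * sin_h, dx * sin_h + dy * cos_h)


def speed(p_prev, p_now, t_delta):
    """Speed estimate from the distance covered during t_delta seconds."""
    return math.hypot(p_now[0] - p_prev[0], p_now[1] - p_prev[1]) / t_delta


def mirror(p):
    """Mirror a transformed position at the x-axis (symmetry augmentation)."""
    return (p[0], -p[1])
```

A player who moves "up" the pitch and then continues straight ahead by two meters ends up at roughly (2, 0) in her own frame, regardless of where on the pitch the movement took place.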

### 3.4 Large-scale movement models

Let \({\mathcal {Z}}\) be an interval that is discretized into \(n_{\mathcal {Z}}\) cells with boundaries \({{\mathcal {Z}}}_{0}< {{\mathcal {Z}}}_{1}< \dots < {{\mathcal {Z}}}_{n_{\mathcal {Z}}}\), so that the *i*th cell is \([ {{\mathcal {Z}}}_{i-1}, {{\mathcal {Z}}}_i) \subseteq {\mathcal {Z}}\) and \({\mathcal {Z}}= [ {{\mathcal {Z}}}_{0}, {{\mathcal {Z}}}_1) \cup [ {{\mathcal {Z}}}_{1}, {{\mathcal {Z}}}_2) \cup \dots \cup [ {{\mathcal {Z}}}_{n_{\mathcal {Z}}-1}, {{\mathcal {Z}}}_{n_{\mathcal {Z}}})\). The space \(\mathcal {X}\times \mathcal {Y}\) covers the possible whereabouts of a player in a given time horizon. Interval \(\mathcal {V}\) contains all possible velocities as introduced in the previous section. Finally, \({\tilde{\mathcal {T}}}\) denotes a sequence of all time horizons of interest within an interval \(\mathcal {T}\). Let \({\tilde{\mathcal {X}}}\), \({\tilde{\mathcal {Y}}}\), and \({\tilde{\mathcal {V}}}\) be the discretizations of \(\mathcal {X}\), \(\mathcal {Y}\), and \(\mathcal {V}\) with sizes \(n_\mathcal {X}\), \(n_\mathcal {Y}\), and \(n_\mathcal {V}\), respectively. Furthermore, let *A* be an \(n_\mathcal {X}\times n_\mathcal {Y}\times n_\mathcal {V}\times n_\mathcal {T}\) matrix containing the counts of points. Here, entry \(A_{abcd} \in \mathbb {N}\) contains the counts for all points within the *a*th cell in \(\mathcal {X}\), the *b*th cell in \(\mathcal {Y}\), the *c*th speed range in \(\mathcal {V}\), and the *d*th time delta from \(\mathcal {T}\).

Given a time delta \(t_\varDelta \), we compute the indices *a*, *b*, *c*, and *d* for the trajectory point corresponding to the transformed position \(\mathbf {p}=(x,y)\) and the speed *v* as discussed in the previous section. Each index is the number of the cell that contains the corresponding value; for example, *a* is the index for which \(x \in [ {{\mathcal {X}}}_{a-1}, {{\mathcal {X}}}_a)\). Given the indices (*a*, *b*, *c*, *d*), we increment \(A_{abcd}\). This is repeated for every triplet within the dataset. The approach is summarized in Algorithm 2. In order to obtain the movement model, i.e., the two-dimensional histogram, we need to condition on a specific speed value \(v_t\) as well as a time delta of interest \(t_\varDelta \) and normalize the resulting slice:

\[ \mathbb {P}(\mathbf {p}\mid v_t, t_\varDelta ) = \frac{A_{abcd}}{\sum _{a'=1}^{n_\mathcal {X}} \sum _{b'=1}^{n_\mathcal {Y}} A_{a'b'cd}}, \tag{8} \]

where (*a*, *b*, *c*, *d*) are the indices assigned to \(\mathbf {p}\), \(v_t\), and \(t_\varDelta \).
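The counting scheme can be sketched with NumPy. This is an illustration under assumptions, not the paper's Algorithm 2: cell boundaries are passed as sorted arrays of edges, indices are zero-based, samples that fall outside the discretized ranges are skipped, and all function names are hypothetical.

```python
import numpy as np


def cell_index(value, edges):
    """Zero-based index i of the half-open cell [edges[i], edges[i+1])
    that contains value, or None if value lies outside all cells."""
    i = int(np.searchsorted(np.asarray(edges), value, side="right")) - 1
    return i if 0 <= i < len(edges) - 1 else None


def add_sample(A, p, v, d, x_edges, y_edges, v_edges):
    """Increment the count for transformed position p, speed v, and the
    d-th time delta. Returns False if the sample was out of range."""
    a = cell_index(p[0], x_edges)
    b = cell_index(p[1], y_edges)
    c = cell_index(v, v_edges)
    if a is None or b is None or c is None:
        return False
    A[a, b, c, d] += 1
    return True


def movement_model(A, c, d):
    """Normalized two-dimensional histogram for speed bin c and time
    delta index d; an all-zero slice is returned unchanged."""
    counts = A[:, :, c, d].astype(float)
    total = counts.sum()
    return counts / total if total > 0 else counts
```

Storing the normalized slices once after training, rather than dividing at query time, keeps each later probability query a plain array lookup.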

### 3.5 Distributed computation

The counts in *A* can be computed in a distributed fashion using MapReduce. Each mapper processes a triplet and computes the corresponding indices (*a*, *b*, *c*, *d*). The resulting key-value pair of the mapper consists of the concatenated indices, which serve as the key, and a static *one* as the value. We suggest one reducer per matrix entry indexed by (*a*, *b*, *c*, *d*). Each reducer obtains the concatenated indices as a key and a list of *ones*. The count for \(A_{abcd}\) is simply obtained by summing up the ones within the list. The pseudocode for the mapper and reducer is depicted in Algorithms 3 and 4. After executing the MapReduce procedure, we obtain the counts of *A* for a fixed time delta \(t_\varDelta = \mathcal {T}_d\). Multiple runs over all time deltas are needed to fill the entire matrix. The movement model is then given as in Eq. (8).
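The mapper and reducer logic can be illustrated with a single-process sketch. It is not the paper's Algorithms 3 and 4 and uses no MapReduce framework; the shuffle step is simulated with a dictionary, the key format (indices joined by underscores) is an assumption, and `index_fn` is a hypothetical helper that returns the indices (*a*, *b*, *c*, *d*) of a sample or `None` for out-of-range samples.

```python
from collections import defaultdict


def mapper(sample, index_fn):
    """Emit one (key, 1) pair per sample that falls into a valid cell."""
    idx = index_fn(sample)
    if idx is not None:
        yield "_".join(str(i) for i in idx), 1


def shuffle(pairs):
    """Group emitted values by key, as the framework would do."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reducer(key, ones):
    """Sum the ones for a single matrix entry."""
    return key, sum(ones)


def run_counting(samples, index_fn):
    """Simulate map, shuffle, and reduce in one process."""
    pairs = (pair for s in samples for pair in mapper(s, index_fn))
    return dict(reducer(k, v) for k, v in shuffle(pairs).items())
```

Because the reducer only sums, the same function could also serve as a combiner on the mapper side to reduce the amount of data that is shuffled.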

### 3.6 Complexity

Let \(m_{t_\varDelta ,v}\) denote the number of samples for a time delta \(t_\varDelta \) and a speed *v*, i.e., \(m_{t_\varDelta ,v} = | {\mathcal {S}}_{t_\varDelta ,V} |\). First, we consider movement models based on kernel density estimates as introduced in Eq. (6). The complexity of training a KDE is linear in the cardinality of the set \({\mathcal {S}}_{t_\varDelta ,V}\) and thus equal to \(\mathcal {O}( m_{t_\varDelta ,v} )\). Since there is a separate KDE for every time delta and speed, the complexity of training all KDEs for a single player is \(\mathcal {O}( \sum _{v \in {\tilde{\mathcal {V}}}} \sum _{t_\varDelta \in {\tilde{\mathcal {T}}}} m_{t_\varDelta ,v})\) and thus linear in the player’s trajectory data. The complexity of predicting, i.e., obtaining the probability for a given position, using a KDE is \(\mathcal {O}( m_{t_\varDelta ,v} )\). Clearly, the larger the training set, the better the model. However, increasing the number of samples \(m_{t_\varDelta ,v}\) makes it prohibitive to use the individual movement models based on KDEs in real-time scenarios. Regarding the memory demand of the KDE-based approach, all samples need to be kept as the KDE is a non-parametric method. Hence, \(\mathcal {O}\left( \sum _{v \in {\tilde{\mathcal {V}}}} \sum _{t_\varDelta \in {\tilde{\mathcal {T}}}} m_{t_\varDelta ,v} \right) \) points need to be stored.

Table 1 Complexity overview

Approach | Training | Predicting | Memory |
---|---|---|---|
KDE-based | \(\mathcal {O}\left( \sum _{v \in {\tilde{\mathcal {V}}}} \sum _{t_\varDelta \in {\tilde{\mathcal {T}}}} m_{t_\varDelta ,v} \right) \) | \(\mathcal {O}( m_{t_\varDelta ,v} )\) | \(\mathcal {O}\left( \sum _{v \in {\tilde{\mathcal {V}}}} \sum _{t_\varDelta \in {\tilde{\mathcal {T}}}} m_{t_\varDelta ,v} \right) \) |
Count-based | \(\mathcal {O}\left( \sum _{v \in {\tilde{\mathcal {V}}}} \sum _{t_\varDelta \in {\tilde{\mathcal {T}}}} m_{t_\varDelta ,v} \right) \) | \(\mathcal {O}( 1 )\) | \(\mathcal {O}( n_\mathcal {X}\cdot n_\mathcal {Y}\cdot n_\mathcal {V}\cdot n_\mathcal {T})\) |

Second, we report on the complexities of the count-based movement model as introduced in Eq. (8). The learning procedure is outlined in Algorithm 2 and has the same complexity as constructing all sets \({\mathcal {S}}_{t_\varDelta ,V}\), that is, \(\mathcal {O}\left( \sum _{v \in {\tilde{\mathcal {V}}}} \sum _{t_\varDelta \in {\tilde{\mathcal {T}}}} m_{t_\varDelta ,v} \right) \). Hence, the complexity of building the KDE-based model and the count matrix *A* is identical. Predicting using the count matrix *A* when conditioning on a time delta \(t_\varDelta \) and a speed value *v* is \(\mathcal {O}( 1 )\), assuming no online training. This holds true if the normalization by the denominator in Eq. (8) is applied not at prediction time but right after learning. Computing the probability of attaining a given position then boils down to a simple table lookup. The memory demand of the count matrix *A* is \(\mathcal {O}( n_\mathcal {X}\cdot n_\mathcal {Y}\cdot n_\mathcal {V}\cdot n_\mathcal {T})\). This means that the finer the discretization, the larger the matrix and the more memory is needed.

The complexities of both approaches are summarized in Table 1. Training complexities are equal in both cases. However, evaluating the probability of a position (and thus the most probable location of a player) takes constant time for the count-based approach while it depends on the number of samples for the KDE-based approach; hence, a trade-off between accuracy and speed is possible. Furthermore, the memory requirements of the KDE-based approach grow as more samples are collected. By contrast, the space demand of the count-based approach is constant in the number of samples but large when high precision and accuracy are necessary.

### 3.7 Empirical results

There are two typical ways of collecting positional data in sports. The first way is to attach sensors to players and ball to monitor their positions (Grün et al. 2011; Mutschler et al. 2013). The second way is to use computer vision algorithms for retrieving players’ and ball’s trajectories in consecutive frames (Barris and Button 2008; D’Orazio and Leo 2010). The positional data we use in the experiments stems from the latter and is recorded at 25 Hz. For a single match, this usually yields over \(25 \cdot 60 \cdot 90 = 135{,}000\) samples (due to possible extra time at the end of each half). The dimensions of a soccer field are 105.0 m by 68.0 m and the coordinates of positions in the trajectory data are given relative to the center of the field, which is set as the origin (0, 0). Hence, player coordinates (*x*, *y*) are within \(F = [-52.5, +52.5] \times [-34.0, +34.0] \subset \mathbb {R}^2\).

Except for the Voronoi-based approach, the models discussed in Sect. 3 involve user-defined parameters that need to be specified. For Taki & Hasegawa, the acceleration parameter \(a_{\max }\) can be derived from the corresponding speed samples \(v_t\) using \(a_t = \frac{1}{h}(v_{t+h} - v_{t})\). Here, these are computed for a time horizon of \(h=1\) s using data from a single match. Based on this, we set \(a_{\max } = 4.2\,\mathrm{{m/s^2}}\), which is equal to the 0.999-quantile of the derived values. We use the quantile rather than the maximum observed acceleration to ignore outliers. The model by Fujimura & Sugihara includes two parameters, \(\alpha \) and \(v_{\max }\). We use \(\alpha = 1.3\), which is the value proposed in the original paper (Fujimura and Sugihara 2005), and \(v_{\max } = 8.0\) m/s, where the latter corresponds to the 0.999-quantile of the observed speed values (analogous to the choice of \(a_{\max }\) in the previous model).
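The quantile-based parameter choice can be sketched as follows. This is a minimal example assuming equidistant speed samples; the function name and its arguments are hypothetical, and the concrete values 4.2 and 8.0 reported above come from the authors' data, not from this code.

```python
import numpy as np


def acceleration_quantile(speeds, sample_rate_hz, h_seconds=1.0, q=0.999):
    """q-quantile of finite-difference accelerations a_t = (v_{t+h} - v_t) / h.

    speeds: one-dimensional sequence of speed samples in m/s, recorded
    at sample_rate_hz with equidistant timestamps.
    """
    speeds = np.asarray(speeds, dtype=float)
    lag = int(round(h_seconds * sample_rate_hz))
    if lag < 1 or lag >= len(speeds):
        raise ValueError("time horizon does not fit the series")
    accel = (speeds[lag:] - speeds[:-lag]) / h_seconds
    return float(np.quantile(accel, q))
```

The same quantile, applied directly to the speed samples via `np.quantile(speeds, 0.999)`, would give the corresponding estimate for \(v_{\max }\).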

To compute the individual movement models presented in Sect. 3.3, we use \(t_{\delta } = 0.2\) s and \(t_{\varDelta } =1\) s in Algorithm 1. We use five different speed intervals shown in Table 2. Note that such a discretization is a common way to bin velocities to account for sparseness in real data, as the number of samples per speed interval may vary significantly (Lago-Peñas et al. 2009; Coutts et al. 2010; Gudmundsson and Wolle 2014). Table 2 also presents speed distributions for three different players: a goalkeeper, a defender, and an attacking midfielder. On average, field players walk and jog and save their energy for only a few sprints.

For example, the goalkeeper has a significantly lower probability to reach distant positions compared to the field players. The reason lies, however, not in her ability to move but in the lack of corresponding observations: Goalkeepers hardly push forward and usually cover a smaller radius than field players. The figures clearly show that the midfielder covers a wider area and is, on average, moving faster than her peers. The few data samples collected for the goalkeeper could be balanced with an average model, see discussion in Sect. 5.

Table 2 Distribution of speed classes for three different players

Speed | Range (km/h) | Goalkeeper (%) | Defender (%) | Midfielder (%) |
---|---|---|---|---|
Stand | \( < 1\) | 27.19 | 11.25 | 12.01 |
Walk | \(1{-}7\) | 66.68 | 53.22 | 50.90 |
Jog | \(7 {-} 14\) | 5.44 | 27.57 | 28.11 |
Run | \(14 {-} 20\) | 0.57 | 5.97 | 6.36 |
Sprint | \(> 20\) | 0.12 | 2.00 | 2.62 |
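As an illustration of how speed samples could be assigned to the five classes of Table 2, consider the following sketch. It assumes speeds are given in m/s and converted to km/h, and that each boundary value belongs to the faster class; the table itself does not specify how boundaries are handled, and the function names are hypothetical.

```python
from bisect import bisect_right
from collections import Counter

SPEED_CLASSES = ["Stand", "Walk", "Jog", "Run", "Sprint"]
UPPER_BOUNDS_KMH = [1.0, 7.0, 14.0, 20.0]  # class boundaries from Table 2


def speed_class(speed_ms):
    """Name of the speed class for a speed given in m/s."""
    kmh = speed_ms * 3.6
    return SPEED_CLASSES[bisect_right(UPPER_BOUNDS_KMH, kmh)]


def class_shares(speeds_ms):
    """Percentage of samples per speed class."""
    counts = Counter(speed_class(s) for s in speeds_ms)
    n = len(speeds_ms)
    return {name: 100.0 * counts[name] / n for name in SPEED_CLASSES}
```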

## 4 Zones of control

### 4.1 Motivation

Movement models can be used to compute zones of control (or dominant regions) of individual players and teams as a whole (Taki and Hasegawa 2000; Gudmundsson and Wolle 2014; Horton et al. 2015). Below we formally define dominant regions for the models presented in the previous section. To do so, it is beneficial to recall the definition of the traditional movement models that are inspired by physical laws. The definition of probabilistic models is analogous and discussed later.

Let function \(\varGamma \,{:}\,\mathbb {R}^2 \rightarrow \mathbb {R}_{\ge 0}\) yield the time needed to reach position \(\mathbf {p}\in \mathbb {R}^2\) for a player *k* at position \(\mathbf {p}^k_t\) moving with velocity \(v^k_t\) in a given direction, i.e., \(\varGamma \big (\mathbf {p}\ |\ \mathbf {p}^k_t, v^k_t\big )\). This function is specific to a given physical model governing player movements. In other words, for a given player, function \(\varGamma \) yields the minimal time that satisfies Eq. (1) and (2) for the Taki & Hasegawa and the Fujimura & Sugihara models, respectively. In Taki and Hasegawa (2000), the concept of a player’s zone of control is defined as follows.

### Definition 1

The *zone of control* of player *i* is defined as the subset \(D^i\) of the playing area *F*, where player *i* can arrive before any other player \(k \ne i\).

### 4.2 Problem formulation

Let \(\mathbb {P}^k\) denote the probabilistic movement model of the *k*th player as introduced in Eq. (6). It quantifies the likelihood of player *k* to reach position \(\mathbf {p}\) given her current \(\mathbf {p}_{t}\) and last position \(\mathbf {p}_{t-t_\delta }\), velocity \(v^k_t\), and time horizon \(t_\varDelta \). The position \(\mathbf {p}\) is controlled by the player *i* having the highest likelihood:

\[ \phi _{t_\varDelta }(\mathbf {p}) = \mathop {\mathrm {arg\,max}}_{k \in \{1, \ldots , K\}} \mathbb {P}^k\big (\mathbf {p}\ |\ \mathbf {p}^k_{t}, \mathbf {p}^k_{t-t_\delta }, v^k_t, t_\varDelta \big ). \]

The zone of control of player *i* is given as the set of all points \(D^i = \{ \mathbf {p}\in F \ | \ \phi _{t_\varDelta }(\mathbf {p}) = i \}\) that are controlled by her. It should be noted that ties may occur if the likelihood of two or more players is equal, especially in the counting-based setting. If ties are broken, then the set \(\{D^1, D^2, \ldots , D^K \}\) is a partition of *F*. The procedure is summarized in Algorithm 5.

### 4.3 Approximating zones of control

In practice, the playing area *F* is not iterable since it is uncountable. A typical workaround is to use a finite approximation of the playing area (Nakanishi et al. 2009; Lucey et al. 2012; Narizuka et al. 2014; Franks et al. 2015). Let \(G \subset F\) be a finite grid over *F* containing \(n_x \cdot n_y\) equally spaced points in *F* with (axis-aligned) distance \(\varDelta \) to each other. Player domination is then computed using *G* rather than *F*, which yields a finite approximation of the zones of control with precision \(\varDelta \). The smaller \(\varDelta \) is, the better the approximation. The procedure is presented in Algorithm 6. For visualization purposes, the set \(B = \{ ( \mathbf {g}, \phi _{t_\varDelta }(\mathbf {g}) ) \ | \ \mathbf {g}\in G \}\) can then be used to compute zones of control by assigning each position \(\mathbf {p}\in F\) the same label as its closest neighbor from the grid *G*.
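The grid approximation can be sketched with NumPy. The example below is an illustration, not the paper's Algorithm 6: each player's movement model is represented by a callable that returns a likelihood for every grid point, ties are broken in favor of the lowest player index, and `toy_likelihood` is a hypothetical stand-in for a learned movement model.

```python
import numpy as np


def make_grid(delta, x_range=(-52.5, 52.5), y_range=(-34.0, 34.0)):
    """Equally spaced grid points with axis-aligned distance delta."""
    xs = np.arange(x_range[0], x_range[1] + 1e-9, delta)
    ys = np.arange(y_range[0], y_range[1] + 1e-9, delta)
    gx, gy = np.meshgrid(xs, ys, indexing="ij")
    return np.stack([gx.ravel(), gy.ravel()], axis=1)


def zones_of_control(grid, likelihoods):
    """Label every grid point with the index of the player whose model
    assigns it the highest likelihood (first index wins on ties)."""
    scores = np.stack([f(grid) for f in likelihoods], axis=0)
    return np.argmax(scores, axis=0)


def toy_likelihood(center, scale=10.0):
    """Hypothetical isotropic bump around a player's position; a real
    model would depend on her speed and direction as well."""
    c = np.asarray(center, dtype=float)
    return lambda g: np.exp(-np.sum((g - c) ** 2, axis=1) / (2 * scale ** 2))
```

A usage example with two players: `labels = zones_of_control(make_grid(1.0), [toy_likelihood((-10, 0)), toy_likelihood((10, 0))])` yields one label per grid point, which can be reshaped for plotting.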

### 4.4 Empirical results

We now compare zones of control obtained by a Voronoi tessellation, the movement models by Taki & Hasegawa and Fujimura & Sugihara, respectively, and the proposed data-driven movement model for the same situation. Figure 6 shows the resulting regions where arrows indicate directions and velocities of movements.

The top left shows a Voronoi tessellation and implements the assumption that every player is able to run in any direction equally fast, hence ignoring actually observed movements. In other words: the closest player always wins and borders of controlled zones are half cuts between players. The assumption leads to implausible zones of control as we will showcase on the example of the white team playing right to left. The white player on the right wing, for example, has a large zone although she is running towards the center of the pitch. Most of the controlled area of that player lies in her back and she would need to turn before being able to head in that direction. The Voronoi model clearly overestimates the right wing of the white team. By contrast, their left wing is underestimated. Although the left winger pushes forward and although her direct opponents only move slowly and head towards the center of the pitch, her zone is small. In contrast to Voronoi tessellations, the proposed approach in the upper right part of the figure clearly eliminates the depicted limitations. For the white team, the zone of the right winger is realistically small and the zone of the left winger realistically large.

Computing controlled zones using the movement model by Taki & Hasegawa leads to the bottom left figure. Borders between zones are often curly as a direct consequence of the nested circles that originate from the assumption that players may accelerate in any direction unbounded (see Fig. 2). The zone of the white left winger evolves drop-like from the actual player position. The underlying movement model also assigns a big part of the right half of the pitch to the black team although white players are closer positioned and some of them even move into this direction. Figure 6 exhibits the limitations of the approach by Taki & Hasegawa.

The movement model by Fujimura & Sugihara corrects some of the limitations of the model by Taki & Hasegawa and, correspondingly, the bottom right part of Fig. 6 appears more realistic. For instance, similar to the proposed approach, the zone of the white left winger seems more appropriate than the Voronoi-based zone. Nevertheless, there are other problems with this model as can be seen on the right wing of the white team. The zone of the winger has shrunk to almost zero although her opponent is still far away and both are moving slowly.

To sum up, out of the four movement models, only the proposed approach leads to realistic controlled zones that are in line with player movements and distances. Each of the competitors suffers from oversimplified assumptions in its movement model and yields unrealistic zones of control. Analyses that build upon one of the three competitors are likely to be crude as they rely on rough approximations of reality. We include more examples of the methods in the “Appendix”.

## 5 Discussion

The previous sections show theoretically and empirically that existing movement models suffer from implausible assumptions. Particularly in the previous section, we observe the clear influence of such oversimplifications in the resulting zones of control for Voronoi tessellations and underlying movement models by Taki & Hasegawa and Fujimura & Sugihara.

The idea of this paper is to avoid cumbersome definitions of complex physics (and possibly oversimplifications) by simply observing player movements. We propose a purely data-driven movement model that intelligently combines all movements of a player into either a probabilistic model or grid-based frequencies. Depending on the application at hand, either the full distribution, some quantile thereof, or the convex hull of observed positions can be processed to compute reachable positions in a predetermined time. Further exploiting the probabilistic nature of the model (or turning the frequencies into probabilities) may provide confidences to possible movements. Empirically, the resulting zones of control are intuitive and can be straightforwardly interpreted with player movements and, hence, constitute a realistic picture of a situation.

As a remark, we note that the zones of control for the three baseline approaches are identical when no player is moving. This can be seen by setting \(v = 0\) in Eq. (1) and (2) for Taki & Hasegawa and Fujimura & Sugihara, respectively, which then reduces the resulting zones of control to a Voronoi tessellation. The time needed to reach an arbitrary position is now a strictly increasing function of the distance to that position. As Fig. 2 shows, the greater the velocities of the players, the greater the differences of the resulting zones.

Note, however, that using positional data for estimating movements of players also comes with limitations. The angle estimation from trajectory data via Eq. (4), for instance, is based on the assumption that players always move forward. In other words, the model assumes that the direction a player is facing is in line with her movement. This is not always the case; goalkeepers in particular often move backwards. The model would thus over- or underestimate the time needed for turning around depending on the actual change of direction. A possible remedy could be a better approximation of the angle \(\theta \) rather than using Eq. (4) or deriving the angle from an auxiliary data source. Using positional data alone is, however, not sufficient to solve this matter.

*cold-start problem* and similar instances occur in recommendation scenarios (see, e.g., Son 2016). To overcome this problem, a two-component mixture model can be used. The first component utilizes the actual (and continuously updated) movement model \(\mathbb {P}^k_{n_k}\) of the new player *k*, which is learned on \(n_k\) points. The second component is an *average* model \(\mathbb {P}^\text {avg}\) over all players (with a similar role) and their data points. The idea is to blend the personalized component with the average component until the former is accurate enough to be used alone. Hence, the model is given by a convex combination of movement models

*N*, \(\lambda = 1\) and the average model is weighted by zero and hence automatically deactivated as desired. The required number of observations depends both on the domain and on a player’s speed. In the case of soccer and for a given speed range, several thousand samples appear sufficient to produce satisfactory results. For a field player, those samples can, for instance, be collected in a single match. For a goalkeeper, however, it is recommended to always maintain an additional average movement model, because she mostly stands or walks during a match and the number of samples for higher initial velocities is therefore small. Note that this mixture approach works both for the movement models based on KDEs and for those based on the count matrix *A*.
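A minimal sketch of such a blend over grid cells is given below. It assumes the convex combination takes the form \(\lambda \, \mathbb {P}^k_{n_k} + (1 - \lambda )\, \mathbb {P}^\text {avg}\), which is consistent with the average model receiving weight zero when \(\lambda = 1\). The linear weighting schedule, which grows with the number of samples and is capped at one, is purely an assumption for illustration; the function names are hypothetical as well.

```python
def mixture_weight(n_k, n_required):
    """Hypothetical schedule: weight grows linearly with the sample count, capped at 1."""
    return min(1.0, n_k / n_required)


def blend(personal, average, lam):
    """Convex combination of two discrete distributions over grid cells.

    Both inputs map a cell to a probability; cells missing in one model
    are treated as having probability zero there.
    """
    cells = set(personal) | set(average)
    return {
        cell: lam * personal.get(cell, 0.0) + (1.0 - lam) * average.get(cell, 0.0)
        for cell in cells
    }
```

Because both inputs sum to one and the weights are non-negative and sum to one, the blended model is again a probability distribution.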

Along these lines, movement models can also be used to predict pass outcomes (Nakanishi et al. 2009). The idea is to split the trajectory of the ball into small units that are processed one after another. For every unit, the probability that a player reaches the position of the ball during the lifespan of the unit is computed. If an opposing player fulfills this criterion, she intercepts the ball and the computation terminates. If no player intercepts the ball, the pass is completed after processing the final unit. A preliminary 10-fold cross-validation on 1194 passes indicates that using the probabilistic movement models leads to prediction accuracies of \(97.5\%\) for pass interception and \(88.5\%\) for pass completion. The approaches by Taki & Hasegawa and Fujimura & Sugihara perform similarly to each other and achieve accuracies of about \(93.7\%\) and \(94.5\%\) for interception and \(69.1\%\) and \(79.5\%\) for completion, respectively. Voronoi tessellations perform worst and predict interceptions correctly in only \(50.9\%\) of the cases. Note that the Voronoi model always predicts an interception, since in at least one unit there is always a player closer to the ball trajectory than the player making the pass. However, a thorough evaluation is necessary to confirm these promising results.
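The unit-wise procedure described above can be sketched as follows. The decision threshold, the representation of a unit as a (position, available time) pair, and the toy reach model based on a constant speed are assumptions for illustration; in practice, `reach_probability` would be backed by a player's estimated movement model.

```python
import math


def predict_pass(units, opponents, reach_probability, threshold=0.5):
    """Process ball-trajectory units in order and stop at the first interception.

    `units` is a list of (position, time_available) pairs and
    `reach_probability(player, position, time_available)` returns a
    probability in [0, 1]. Returns (outcome, player, unit_index).
    """
    for index, (position, time_available) in enumerate(units):
        for player in opponents:
            if reach_probability(player, position, time_available) >= threshold:
                return ("intercepted", player, index)
    return ("completed", None, None)


def toy_reach_probability(player, position, time_available, speed=7.0):
    """Toy stand-in: a player at (x, y) reaches any point within speed * time."""
    return 1.0 if math.dist(player, position) <= speed * time_available else 0.0


units = [((10, 0), 0.5), ((20, 0), 1.0), ((30, 0), 1.5)]
far_result = predict_pass(units, [(20, 10)], toy_reach_probability)
near_result = predict_pass(units, [(20, 5)], toy_reach_probability)
```

With the toy model, the opponent at (20, 10) never gets close enough and the pass is completed, whereas the opponent at (20, 5) reaches the ball during the second unit.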

## 6 Conclusion

We proposed a novel data-driven method for estimating individual movement models from positional data. The model is generated by conditioning a player’s whereabouts after a given time on her initial position and velocity. We obtained tables of reachable (*x*, *y*) coordinates for every velocity and time interval and proposed to turn these tables into a probabilistic movement model using kernel density estimation. Alternatively, the tables may be discretized using a grid (\(\epsilon \)-net) to work directly with counts instead of probabilities and hence speed up computation for (near) real-time scenarios. Movement models were computed for every player individually, and the computation can be distributed across many machines to handle many players and many games at once. Empirically, we showed the limitations of existing movement models and demonstrated the usefulness of our contribution for computing zones of control. Computing these zones with existing approaches led to crude approximations due to oversimplified assumptions in the respective models. By contrast, the proposed movement models led to realistic and intuitive zones of control.

## Footnotes

1. Note that the velocity can be estimated from positional data in case it is not provided directly.
2. In the figure, a Gaussian-based KDE is used with the bandwidth parameter set to 0.7 [see, e.g., Turlach (1993) for an overview of bandwidth selection methods].
3. Differences in Table 2 between the defender and the midfielder are significant according to a \(\chi ^2\)-test.

## Notes

### Acknowledgements

The authors would like to thank Hendrik Weber and Deutsche Fußball Liga (DFL) and Sportcast GmbH for providing positional data.

## References

- Avci, A., Bosch, S., Marin-Perianu, M., Marin-Perianu, R., & Havinga, P. (2010). Activity recognition using inertial sensing for healthcare, wellbeing and sports applications: A survey. In *23th International conference on architecture of computing systems 2010*, pp. 1–10.
- Barris, S., & Button, C. (2008). A review of vision-based motion analysis in sport. *Sports Medicine*, *38*(12), 1025–1043.
- Brooks, J., Kerr, M., & Guttag, J. (2016). Using machine learning to draw inferences from pass location data in soccer. *Statistical Analysis and Data Mining: The ASA Data Science Journal*, *9*(5), 338–349.
- Byrne, M., Parry, T., Isola, R., & Dawson, A. (2013). Identifying road defect information from smartphones. *Road & Transport Research*, *22*(1), 39–50.
- Coutts, A. J., Quinn, J., Hocking, J., Castagna, C., & Rampinini, E. (2010). Match running performance in elite Australian rules football. *Journal of Science and Medicine in Sport*, *13*(5), 543–548.
- Dean, J., & Ghemawat, S. (2008). MapReduce: Simplified data processing on large clusters. *Communications of the ACM*, *51*(1), 107–113.
- D’Orazio, T., & Leo, M. (2010). A review of vision-based systems for soccer video analysis. *Pattern Recognition*, *43*(8), 2911–2926.
- Fonseca, S., Milho, J., Travassos, B., & Araújo, D. (2012). Spatial dynamics of team sports exposed by voronoi diagrams. *Human Movement Science*, *31*(6), 1652–1659.
- Franks, A., Miller, A., Bornn, L., & Goldsberry, K. (2015). Characterizing the spatial structure of defensive skill in professional basketball. *The Annals of Applied Statistics*, *9*(1), 94–121.
- Fujimura, A., & Sugihara, K. (2005). Geometric analysis and quantitative evaluation of sport teamwork. *Systems and Computers in Japan*, *36*(6), 49–58.
- Gottfried, B. (2008). Representing short-term observations of moving objects by a simple visual language. *Journal of Visual Languages & Computing*, *19*(3), 321–342.
- Gottfried, B. (2011). Interpreting motion events of pairs of moving objects. *GeoInformatica*, *15*(2), 247–271.
- Grün, Tvd, Franke, N., Wolf, D., Witt, N., & Eidloth, A. (2011). *A real-time tracking system for football match and training analysis* (pp. 199–212). Berlin Heidelberg: Springer.
- Gudmundsson, J., & Horton, M. (2017). Spatio-temporal analysis of team sports. *ACM Computing Surveys*, *50*(2), 22:1–22:34.
- Gudmundsson, J., & Wolle, T. (2014). Football analysis using spatio-temporal tools. *Computers, Environment and Urban Systems*, *47*, 16–27.
- Haase, J., & Brefeld, U. (2014). Mining positional data streams. In *International workshop on new frontiers in mining complex patterns*, pp. 102–116. Springer.
- Harmon, M., Lucey, P., & Klabjan, D. (2016). Predicting shot making in basketball learnt from adversarial multiagent trajectories. *ArXiv e-prints*.
- Horton, M., Gudmundsson, J., Chawla, S., & Estephan, J. (2015). Automated classification of passing in football. In *Pacific-Asia conference on knowledge discovery and data mining*, pp. 319–330. Springer.
- Janetzko, H., Sacha, D., Stein, M., Schreck, T., Keim, D. A., & Deussen, O. (2014). Feature-driven visual analytics of soccer data. In *2014 IEEE conference on visual analytics science and technology (VAST)*, pp. 13–22.
- Knauf, K., Memmert, D., & Brefeld, U. (2016). Spatio-temporal convolution kernels. *Machine Learning*, *102*(2), 247–273.
- Lago-Peñas, C., Rey, E., Lago-Ballesteros, J., Casais, L., & Domínguez, E. (2009). Analysis of work-rate in soccer according to playing positions. *International Journal of Performance Analysis in Sport*, *9*(2), 218–227.
- Lasek, J., & Gagolewski, M. (2015). The winning solution to the AAIA’15 data mining competition: Tagging firefighter activities at a fire scene. In *2015 Federated conference on computer science and information systems (FedCSIS)*, pp. 375–380.
- Laube, P., Imfeld, S., & Weibel, R. (2005). Discovering relative motion patterns in groups of moving point objects. *International Journal of Geographical Information Science*, *19*(6), 639–668.
- Le, H. M., Carr, P., Yue, Y., & Lucey, P. (2017). Data-driven ghosting using deep imitation learning. In *MIT Sloan sports analytics conference*.
- Link, D., Lang, S., & Seidenschwarz, P. (2016). Real time quantification of dangerousity in football using spatiotemporal tracking data. *PLoS ONE*, *11*(12), 1–16.
- Lucey, P., Bialkowski, A., Carr, P., Foote, E., & Matthews, I. (2012). Characterizing multi-agent team behavior from partial team tracings: Evidence from the English Premier League. In *Proceedings of the twenty-sixth AAAI conference on artificial intelligence*, AAAI’12, pp. 1387–1393. AAAI Press.
- Mazimpaka, J. D., & Timpf, S. (2016). Trajectory data mining: A review of methods and applications. *Journal of Spatial Information Science*, *2016*(13), 61–99.
- Memmert, D., Lemmink, K. A. P. M., & Sampaio, J. (2016). Current approaches to tactical performance analyses in soccer using position data. *Sports Medicine*, *47*, 1–10.
- Mohan, P., Padmanabhan, V. N., & Ramjee, R. (2008). Nericell: Rich monitoring of road and traffic conditions using mobile smartphones. In *Proceedings of the 6th ACM conference on embedded network sensor systems, SenSys ’08*, pp. 323–336. ACM.
- Mutschler, C., Ziekow, H., & Jerzak, Z. (2013). The DEBS 2013 grand challenge. In *Proceedings of the 7th ACM international conference on distributed event-based systems, DEBS ’13*, pp. 289–294, New York, NY: ACM.
- Nakanishi, R., Maeno, J., Murakami, K., & Naruse, T. (2009). An approximate computation of the dominant region diagram for the real-time analysis of group behaviors. In *Robot soccer world cup*, pp. 228–239. Springer.
- Narizuka, T., Yamamoto, K., & Yamazaki, Y. (2014). Statistical properties of position-dependent ball-passing networks in football games. *Physica A: Statistical Mechanics and its Applications*, *412*, 157–168.
- Paefgen, J., Michahelles, F., & Staake, T. (2011). GPS trajectory feature extraction for driver risk profiling. In *Proceedings of the 2011 international workshop on trajectory data mining and analysis, TDMA ’11*, pp. 53–56, New York, NY: ACM.
- Rossi, A., Pappalardo, L., Cintia, P., Fernandez, J., Iaia, F. M., & Medina, D. (2017). Who is going to get hurt? Predicting injuries in professional soccer. In *Proceedings of the machine learning and data mining for sports analytics workshop (MLSA’17), ECML/PKDD*, CGI ’00, pp. 227–235.
- Son, L. H. (2016). Dealing with the new user cold-start problem in recommender systems: A comparative review. *Information Systems*, *58*, 87–104.
- Sprado, J., & Gottfried, B. (2009). *What motion patterns tell us about soccer teams* (pp. 614–625). Heidelberg: Springer.
- Taki, T., & Hasegawa, J. (2000). Visualization of dominant region in team games and its application to teamwork analysis. In *Proceedings of the international conference on computer graphics, CGI ’00*, pp. 227–235, Washington, DC: IEEE Computer Society.
- Taki, T., Hasegawa, J., & Fukumura, T. (1996). Development of motion analysis system for quantitative evaluation of teamwork in soccer games. In *Proceedings of 3rd IEEE international conference on image processing*, vol. 3, pp. 815–818.
- Turlach, B. A. (1993). Bandwidth selection in kernel density estimation: A review. In *CORE and Institut de Statistique*.
- Ueda, F., Masaaki, H., & Hiroyuki, H. (2014). The causal relationship between dominant region and offense-defense performance—Focusing on the time of ball acquisition. *Football Science*, *11*, 1–17.
- Voronoi, G. (1908). Nouvelles applications des paramètres continus à la théorie des formes quadratiques. Premier mémoire. Sur quelques propriétés des formes quadratiques positives parfaites. *Journal für die reine und angewandte Mathematik*, *133*, 97–178.
- Zhang, P., Beernaerts, J., Zhang, L., & de Weghe, N. V. (2016). Visual exploration of match performance based on football movement data using the continuous triangular model. *Applied Geography*, *76*(Supplement C), 1–13.
- Zhao, Y., Yin, F., Gunnarsson, F., Hultkratz, F., & Fagerlind, J. (2016). Gaussian processes for flow modeling and prediction of positioned trajectories evaluated with sports data. In *2016 19th international conference on information fusion (FUSION)*, pp. 1461–1468.
- Zheng, S., Yue, Y., & Hobbs, J. (2016). Generating long-term trajectories using deep hierarchical networks. In *Advances in Neural Information Processing Systems*, *29*, 1543–1551.
- Zheng, Y. (2015). Trajectory data mining: An overview. *ACM Transactions on Intelligent Systems and Technology*, *6*(3), 29:1–29:41.