Group spatiotemporal pattern queries

GeoInformatica

Abstract

Group spatiotemporal patterns are certain formations, in space and time, shown by groups of moving objects, such as flocks, concurrence, and encounter. A large number of recent applications focus on the collective behavior of moving objects rather than on individual movements. Therefore, finding such groups in moving object databases is crucial. There exist, in the literature, smart algorithms for matching some of these patterns. These solutions, however, address specific patterns and require specialized data representations and indexes. They share too little to be integrated into a single system. There is a need for a generic query method that allows users to fill in pattern descriptions and retrieve the set of matches. In this paper, we propose generic query operators that can consistently express and match a wide range of group spatiotemporal patterns. We formally define these operators, illustrate the evaluation algorithms, and discuss the issues of their integration with moving object database (MOD) systems. These operators have been implemented in the context of the Secondo MOD system, and the implementation is available online as open source. Several examples are given to showcase the expressive power of the operators. We have made available scripts that can be invoked from the Secondo interface to automatically repeat some of the experiments in this paper.

Notes

  1. The set operations ⊂ , ⊆ read as ispropersubset, issubset. We intentionally avoid denoting the in operation using the symbol ∈ to differentiate between usetmset and the lifted data in mset.

  2. We use U as input to the crosspattern operator instead of S to be able to do some pre-filtering. For example, interesting pairs of candidates from S may be those coming close to each other during their lifetime, and they can possibly be determined efficiently using indexes. Evaluating instead all pairs from S may be prohibitively expensive.

  3. http://www.behaviorworks.com/people/ckline/cornellwww/boid/boids.html

  4. Secondo uses FLOBs (Faked Large OBjects) to store moving objects. FLOBs may be stored in memory or on disk as decided by the FLOB manager. The decision is made based on FLOB size. Large FLOBs are written to disk.

  5. Copying the whole history of the active component is expensive. Some of these copies might not appear in the result if their components disappear from the pattern graph before their duration reaches d. In our implementation, we represent the list of active components by two lists: a list that holds parts of the component history, and another list that holds indexes to the first list. Using these two lists, the parts of the history that are shared by several active components are stored only once in the first list, and referenced several times in the second list. At the time of moving an active component from the list of active components to the result list R, the parts that constitute this active component are concatenated together.

  6. To guarantee a unique minimal representation of msets, which is a required feature in the base moving objects model, adjacent units having the same set of elements are merged together into a single unit, whose time interval is the union of their time intervals.

References

  1. Geopkdd website geographic privacy-aware knowledge discovery and delivery. http://www.geopkdd.eu

  2. Secondo web site. http://dna.fernuni-hagen.de/secondo.html/

  3. Allen JF (1983) Maintaining knowledge about temporal intervals. Commun ACM 26(11):832–843. doi:10.1145/182.358434

  4. Andrienko G, Andrienko N, Wrobel S (2007) Visual analytics tools for analysis of movement data. SIGKDD Explor Newsl 9:38–46. doi:10.1145/1345448.1345455

  5. Asur S, Parthasarathy S, Ucar D (2009) An event-based framework for characterizing the evolutionary behavior of interaction graphs. ACM Trans Knowl Discov Data 3(4):16:1–16:36. doi:10.1145/1631162.1631164

  6. Benkert M, Gudmundsson J, Hübner F, Wolle T (2008) Reporting flock patterns. Comput Geom Theory Appl 41(3):111–125. doi:10.1016/j.comgeo.2007.10.003

  7. Bui-Xuan BM, Ferreira A, Jarry A (2003) Computing shortest, fastest, and foremost journeys in dynamic networks. Int J Found Comput Sci 14(2):267–285. doi:10.1142/S0129054103001728. http://www-apr.lip6.fr/~buixuan/files/BFJ03.pdf

  8. Cotelo Lema JA, Forlizzi L, Güting RH, Nardelli E, Schneider M (2003) Algorithms for moving objects databases. Comput J 46(6):680–712

  9. Dodge S, Weibel R, Lautenschütz AK (2008) Towards a taxonomy of movement patterns. Inf Vis 7(3):240–252. doi:10.1057/palgrave.ivs.9500182

  10. Düntgen C, Behr T, Güting RH (2009) Berlinmod: a benchmark for moving object databases. VLDB J 18(6):1335–1368. doi:10.1007/s00778-009-0142-5

  11. Eppstein D, Galil Z, Italiano GF (1999) Dynamic graph algorithms. In: Atallah MJ (ed) Algorithms and theory of computation handbook, chap 8. CRC Press. http://www.info.uniroma2.it/italiano/Papers/dyn-survey.ps.Z

  12. Forlizzi L, Güting RH, Nardelli E, Schneider M (2000) A data model and data structures for moving objects databases. In: SIGMOD ’00: proceedings of the 2000 ACM SIGMOD international conference on management of data. ACM, New York, pp 319–330. doi:10.1145/342009.335426

  13. Güting RH (1993) Second-order signature: a tool for specifying data models, query processing, and optimization. SIGMOD Rec 22(2):277–286. doi:10.1145/170036.170079

  14. Giannotti F, Nanni M, Pedreschi D, Renso C, Rinzivillo S, Trasarti R (2009) Geopkdd–geographic privacy-aware knowledge discovery. In: The European future technologies conference (FET 2009)

  15. Giannotti F, Nanni M, Pinelli F, Pedreschi D (2007) Trajectory pattern mining. In: KDD’07, pp 330–339

  16. Gudmundsson J, van Kreveld M, Speckmann B (2004) Efficient detection of motion patterns in spatio-temporal data sets. In: GIS ’04: proceedings of the 12th annual ACM international workshop on geographic information systems. ACM, New York, pp 250–257. doi:10.1145/1032222.1032259

  17. Güting RH, Almeida V, Ansorge D, Behr T, Ding Z, Höse T, Hoffmann F, Spiekermann M, Telle U (2005) secondo: an extensible DBMS platform for research prototyping and teaching. In: ICDE ’05: proceedings of the 21st international conference on data engineering. IEEE Computer Society, Washington, DC, pp 1115–1116

  18. Güting RH, Behr T, Almeida V, Ding Z, Hoffmann F, Spiekermann M (2004) secondo: an extensible DBMS architecture and prototype. Tech Rep Informatik-Report 313 FernUniversität Hagen

  19. Güting RH, Böhlen MH, Erwig M, Jensen CS, Lorentzos NA, Schneider M, Vazirgiannis M (2000) A foundation for representing and querying moving objects. ACM Trans Database Syst 25(1):1–42. doi:10.1145/352958.352963

  20. Jeung H, Shen HT, Zhou X (2008) Convoy queries in spatio-temporal databases. In: ICDE ’08: proceedings of the 2008 IEEE 24th international conference on data engineering. IEEE Computer Society, Washington, DC, pp 1457–1459. doi:10.1109/ICDE.2008.4497588

  21. Kalnis P, Mamoulis N, Bakiras S (2005) On discovering moving clusters in spatio-temporal data. In: SSTD, pp 364–381

  22. Kamath KY, Caverlee J (2011) Transient crowd discovery on the real-time social web. In: Proceedings of the fourth ACM international conference on Web search and data mining, WSDM ’11. ACM, New York, pp 585–594. doi:10.1145/1935826.1935909

  23. Laube P, Imfeld S, Weibel R (2005) Discovering relative motion patterns in groups of moving point objects. Int J Geogr Inf Sci 19(6):639–668

  24. Laube P, Kreveld M, Imfeld S (2004) Finding REMO—detecting relative motion patterns in geospatial lifelines. In: Developments in spatial data handling: proceedings of the 11th international symposium on spatial data handling. Springer, Berlin Heidelberg, pp 201–215. doi:10.1007/b138045

  25. Li Z, Han J, Ji M, Tang LA, Yu Y, Ding B, Lee JG, Kays R (2011) Movemine: mining moving object data for discovery of animal movement patterns. ACM Trans Intell Syst Technol 2(4):37:1–37:32. doi:10.1145/1989734.1989741

  26. Li Z, Ji M, Lee JG, Tang LA, Yu Y, Han J, Kays R (2010) MoveMine: mining moving object databases. In: SIGMOD ’10: proceedings of the 2010 international conference on management of data. ACM, New York, pp 1203–1206. doi:10.1145/1807167.1807319

  27. Ortale R, Ritacco E, Pelekis N, Trasarti R, Costa G, Giannotti F, Manco G, Renso C, Theodoridis Y (2008) The daedalus framework: progressive querying and mining of movement data. In: GIS, p 52

  28. Pelekis N, Theodoridis Y, Vosinakis S, Panayiotopoulos T (2006) HERMES–a framework for location-based data management. In: Proceedings of EDBT 2006

  29. Ramanathan A, Agarwal PK, Kurnikova M, Langmead CJ (2009) An online approach for mining collective behaviors from molecular dynamics simulations. In: Proceedings of the 13th annual international conference on research in computational molecular biology, RECOMB 2’09. Springer-Verlag, Berlin, Heidelberg, pp 138–154. doi:10.1007/978-3-642-02008-7_10

  30. Ren C, Lo E, Kao B, Zhu X, Cheng R (2011) On querying historical evolving graph sequences. PVLDB 4(11):726–737

  31. Sakr M (2012) Spatiotemporal pattern queries. Ph.D. thesis, Fern Universität Hagen. http://deposit.fernuni-hagen.de/2814/

  32. Sakr M, Güting RH (2011) Spatiotemporal pattern queries. GeoInformatica 15:497–540. doi:10.1007/s10707-010-0114-3

  33. Tang LA, Zheng Y, Yuan J, Han J, Leung A, Hung CC, Peng WC (2012) On discovery of traveling companions from streaming trajectories. In: IEEE 28th international conference on data engineering (ICDE) 2012, pp. 186–197. doi:10.1109/ICDE.2012.33

  34. Trasarti R (2010) Mastering the spatio-temporal knowledge discovery process. Ph.D. thesis, University of Pisa Department of Computer Science, Italy

  35. Trasarti R, Giannotti F, Nanni M, Pedreschi D, Renso C (2011) A query language for mobility data mining. IJDWM 7(1):24–45

  36. Wolfson O, Xu B, Chamberlain S, Jiang L (1998) Moving objects databases: Issues and solutions. In: SSDBM’98: 10th international conference on scientific and statistical database management, pp 111–122

  37. Xiao D, Eltabakh M (2013) Stepq: Spatio-temporal engine for complex pattern queries. In: Nascimento M, Sellis T, Cheng R, Sander J, Zheng Y, Kriegel HP, Renz M, Sengstock C (eds) Advances in spatial and temporal databases, lecture notes in computer science, vol 8098, pp 386–390. Springer Berlin Heidelberg

  38. Zheng K, Zheng Y, Yuan NJ, Shang S, Zhou X (2013) Online discovery of gathering patterns over trajectories. IEEE Trans Knowl Data Eng 99(PrePrints): 1. doi:10.1109/TKDE.2013.160

  39. Zheng Y, Yuan NJ, Zheng K, Shang S (2013) On discovery of gathering patterns from trajectories. In: Proceedings of the 2013 IEEE international conference on data engineering (ICDE 2013), ICDE ’13. IEEE Computer Society, Washington, DC, pp 242–253. doi:10.1109/ICDE.2013.6544829

  40. Zheng Y, Zhou X (eds) (2011) Computing with Spatial Trajectories. Springer

  41. Zhou S, Chen D, Cai W, Luo L, Low MYH, Tian F, Tay VSH, Ong DWS, Hamilton BD (2010) Crowd modeling and simulation technologies. ACM Trans Model Comput Simul 20(4):20:1–20:35. doi:10.1145/1842722.1842725

Author information

Corresponding author

Correspondence to Mahmoud Attia Sakr.

Appendix A: The crosspattern evaluation Algorithm

We discuss the evaluation algorithm of the crosspattern operator in the appendix, because of the many details it incorporates. Abstractly speaking, the function of the crosspattern operator is to search for large connected components within a time-dependent graph (i.e., the pattern graph), which is constructed from the evaluations of the time-dependent predicate on pairs of moving objects. As discussed before, the general form of the crosspattern operator is:

$$U~\mathbf{crosspattern}[id_{1},~id_{2},~\alpha,~d,~n, \mathit{subgraph-type}]. $$

The function of the crosspattern operator can be divided into three sub-functions:

  1. Constructing the pattern graph PG(S, α).

  2. Searching for the large connected components of type subgraph-type within the pattern graph. Large means that they must fulfill the minimum duration constraint d and the minimum group cardinality constraint n.

  3. Representing the found connected components, and yielding them.

A.1 Related work

The problem of the crosspattern operator is indeed a generic one, and it has applications in other fields. Generally, it helps to analyze the temporal behavior of evolving networks. Social network users and the evolution of their relationships can be modeled by pattern graphs. The α predicate in such a case might represent whether a pair of users are friends, or whether they contact one another on a daily basis. Other applications are the modeling of link availability in communication networks, especially wireless and ad-hoc networks, and link analysis in the evolving web. Such topics are the focus of several recent workshops.

To the best of our knowledge, this problem has not yet been studied in the context of moving object databases. It seems natural, however, to have a type for time-dependent graphs (moving graphs) and relevant operations defined on top of it. In the context of communication networks, there are works on the so-called Evolving Graphs [7] to model network dynamics. An evolving graph is a sequence of snapshots, each of them representing the graph at a single time instant. A recent work [30] proposed a compression technique, so that the space requirements of evolving graphs are affordable. We see two properties of evolving graphs that make them unsuitable for representing pattern graphs:

  1. The pattern graph is a continuous mapping from time to graph. The evolving graph is a discrete representation. Clearly the two models are not equivalent.

  2. An evolving graph requires O(n) storage space, even after applying the compression technique in [30], where n is the number of changes occurring in the graph. In the case of the crosspattern operator, the number of changes is normally on the order of 10^4 to 10^6. This would easily exceed several GB of memory space.

We are also not aware of work on finding connected components in evolving graphs.

Connectivity queries are studied in the context of dynamic graphs. A dynamic graph is a standard graph that undergoes a sequence of edge additions or deletions. If both additions and deletions are allowed, it is called a fully dynamic graph. Dynamic graph techniques focus on online graph updates and queries. Only the most recent graph is represented, and no history is kept. A dynamic graph is different from a pattern graph, because the latter needs to represent the history. On the other hand, a dynamic graph can be derived from a pattern graph by means of a temporal scan.

There are smart algorithms for maintaining the minimum spanning forest in fully dynamic graphs [11]. That is, instead of re-evaluating the minimum spanning forest after every graph update (i.e., edge addition or deletion), the algorithm does this evaluation only once in the beginning, and tries to efficiently update the forest after every graph update. Such algorithms can easily be modified to maintain connected components instead of the minimum spanning forest.

There are two main differences that make the problem of the crosspattern operator more complex than the problem of maintaining connected components in fully dynamic graphs:

  1. The crosspattern operator searches for large connected components in terms of the two constraints d and n, as explained before.

  2. The crosspattern operator is required to store/represent the result (i.e., the large connected components) as time-dependent graphs. The algorithms for maintaining minimum spanning forests allow only for online queries, and keep no history.

Asur et al. [5] proposed a method for identifying the evolution of groups/communities in time-dependent interaction graphs, such as the merging of several groups, the split of a group, the formation of a new group, or the dissolution of an existing group. The continuous mapping from time to graph was modeled in [22]. The model was used to discover and track groups in social web applications that appear for short times. This work used the localization characteristics of such groups to restrict the search space within the graph to smaller sets of nodes and smaller time intervals.

The problem of the crosspattern operator combines several sub problems that were separately addressed in the works reviewed in this section. It requires: continuous mapping from time to graph, finding connected components and maintaining them through the graph updates, handling multiple graph updates and simultaneous edge additions and deletions, identifying group evolution events (e.g., split, merge, dissolve), representing the history of the group, and the novel problem of finding large connected components. We describe in the following section an algorithm that solves this problem. It shares concepts from these works. But it implements these concepts in a way that reduces the redundant computations among them and that accommodates them in a common data model. We think that there should be a future study towards implementing an algebra of time-dependent graphs in database systems. Such an algebra should propose an abstract data type, and a set of generic operations.

A.2 The crosspattern evaluation algorithm

Algorithm 7 illustrates the evaluation of crosspattern. The first argument is a stream of tuples that corresponds to the set U, defined in Section 6. Every tuple in U has a pair of moving objects, whose identifiers can be queried using the two mappings IDFun_1 and IDFun_2. The first part of the algorithm (Lines 3–8) constructs the pattern graph, which is represented by the mset instance Accumulator. The Accumulator contains edge identifiers, such that an edge identifier belongs to the Accumulator whenever its corresponding edge exists in the pattern graph. For each tuple in U, the identifiers of the two contained moving objects are used to compute a unique edge identifier by the function NodesToEdge. This function has an inverse, EdgeToNodes. The two functions compute the edge identifier given the identifiers of its two nodes, and vice versa, in constant time. The edge itself is computed by evaluating the time-dependent predicate α for the current tuple. Note that the definition time of the edge corresponds to the time periods in which α(u) is true. Finally, the edge is constructed using the mbool2mset function (see Algorithm 2), and inserted into the Accumulator.
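The text does not say how NodesToEdge and EdgeToNodes are defined. As an illustration only, the following sketch shows one possible constant-time bijection between unordered pairs of distinct non-negative integer node identifiers and edge identifiers. The names nodes_to_edge and edge_to_nodes, and the triangular enumeration itself, are assumptions made for this example, not the Secondo implementation.

```python
from math import isqrt


def nodes_to_edge(a: int, b: int) -> int:
    """Map an unordered pair of distinct non-negative node ids to one edge id."""
    if a == b or a < 0 or b < 0:
        raise ValueError("node ids must be distinct and non-negative")
    lo, hi = (a, b) if a < b else (b, a)
    # Pairs are enumerated as (0,1), (0,2), (1,2), (0,3), (1,3), (2,3), ...
    return hi * (hi - 1) // 2 + lo


def edge_to_nodes(edge: int) -> tuple[int, int]:
    """Inverse of nodes_to_edge; returns (lo, hi) with lo < hi."""
    if edge < 0:
        raise ValueError("edge id must be non-negative")
    # hi is the largest integer with hi * (hi - 1) / 2 <= edge.
    hi = (1 + isqrt(1 + 8 * edge)) // 2
    lo = edge - hi * (hi - 1) // 2
    return lo, hi
```

For identifiers that fit in a machine word, both directions use a fixed number of arithmetic operations, and the edge identifier does not depend on the order of the two nodes.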

In the experiments, we have seen that both the number of units in the Accumulator and the number of edge identifiers in every unit are very large. The straightforward implementation of the mset type in Secondo stores the edge identifiers of every unit. For a medium input size (e.g., U has on the order of 10^4 tuples), the number of units in the Accumulator is on the order of 10^5, and every unit has on the order of 10^2 to 10^3 edge identifiers. Under such conditions, the Accumulator would require several GB of disk space (see Note 4). We have implemented a second variant of the mset that stores in every unit only the changes from the previous unit (i.e., edges that are added or removed). This variant considerably reduces the space requirement, because most of the time the change is the addition or removal of a single edge.
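The difference between the two mset variants can be sketched as follows. This is a simplified illustration, assuming that a unit is a tuple of a start time, an end time, and a set of edge identifiers; the function names and the tuple layout are hypothetical and do not reflect the Secondo storage format.

```python
def to_delta_units(full_units):
    """Convert (start, end, edges) units into (start, end, added, removed) units.

    full_units must be sorted by time. Each delta unit stores only the
    difference from the unit before it; the first is relative to the empty set.
    """
    delta_units = []
    previous = frozenset()
    for start, end, edges in full_units:
        edges = frozenset(edges)
        delta_units.append((start, end, edges - previous, previous - edges))
        previous = edges
    return delta_units


def to_full_units(delta_units):
    """Rebuild the full edge set of every unit by replaying the deltas in order."""
    current = set()
    full_units = []
    for start, end, added, removed in delta_units:
        current -= removed
        current |= added
        full_units.append((start, end, frozenset(current)))
    return full_units
```

In a three-unit example where one edge is added and then one is removed, the full variant stores 3 + 4 + 3 = 10 identifiers, while the delta variant stores 3 + 1 + 1 = 5. The saving grows with the size of the units, at the price of a sequential replay whenever a full unit is needed.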

We have also implemented the union operation in Line 8 in a different way that works efficiently with this mset implementation. Inside the loop (Lines 3–8), we use a sorted list to buffer events of the form "at time t_1 the edge e_j starts to be contained in Accumulator" and "at time t_2 the edge e_j ceases to be contained in Accumulator". There are two such events for every time interval in which α is fulfilled: one event for joining, and the other for leaving the Accumulator. These events are sorted inside the list by increasing time. Buffering an event is O(log(n)) in the buffer size. After the loop ends, the Accumulator is constructed from the buffer in a single linear scan.
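The event buffer and the final scan can be sketched as below. This is a minimal sketch under stated assumptions: each input item is a hypothetical (edge, start, end) triple for one interval in which α holds, and the events are sorted once after collection instead of being kept sorted on insertion, which gives the same overall O(n log n) cost. The output uses the change-based unit layout (start, end, added, removed) assumed for illustration.

```python
from itertools import groupby
from operator import itemgetter

JOIN, LEAVE = "join", "leave"


def build_accumulator(edge_intervals):
    """Build change-based units from (edge, start, end) intervals, start < end."""
    events = []
    for edge, start, end in edge_intervals:
        events.append((start, JOIN, edge))
        events.append((end, LEAVE, edge))
    events.sort(key=itemgetter(0))  # order by time only

    units = []
    open_unit = None  # (start, added, removed) of the unit not yet closed
    active = 0        # number of edges currently in the graph
    for t, group in groupby(events, key=itemgetter(0)):
        added, removed = set(), set()
        for _, kind, edge in group:
            (added if kind == JOIN else removed).add(edge)
        # An edge that leaves and rejoins at the same instant is no change.
        rejoined = added & removed
        added -= rejoined
        removed -= rejoined
        if not added and not removed:
            continue
        if open_unit is not None:
            start, unit_added, unit_removed = open_unit
            units.append((start, t, unit_added, unit_removed))
        active += len(added) - len(removed)
        # With no edge left, the graph is undefined until the next join.
        open_unit = (t, added, removed) if active > 0 else None
    return units
```

For example, an edge 1 that exists during [0, 10) and an edge 2 that exists during [5, 8) yield three units: [0, 5) adding edge 1, [5, 8) adding edge 2, and [8, 10) removing edge 2.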

We expect that the most expensive part in constructing the pattern graph will be the evaluation of α(u) (Line 7). This has been experimentally confirmed in Section 10.2. The cost varies according to the associated time-dependent predicate α. The predicate distance(. , .) < . between two mpoint objects, for instance, is O(n + m), where n and m are the numbers of units of the two arguments. More about time-dependent predicates and their evaluation algorithms can be found in [8].
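One way to see where a linear bound such as O(n + m) can come from is a merge-style scan over the units of the two arguments, which pairs up the time intervals on which both objects are defined. The sketch below shows only this scan. It assumes units are sorted, non-overlapping, half-open (start, end) intervals, and it leaves out the per-interval distance computation. The function name is hypothetical, and the sketch is not taken from the algorithms in [8].

```python
def refinement_partition(units_a, units_b):
    """Return (start, end, i, j) for every overlap of units_a[i] and units_b[j].

    Both inputs are lists of (start, end) intervals, sorted and non-overlapping.
    The loop advances one of the two indexes per iteration, so it runs at most
    len(units_a) + len(units_b) times.
    """
    i = j = 0
    pieces = []
    while i < len(units_a) and j < len(units_b):
        start = max(units_a[i][0], units_b[j][0])
        end = min(units_a[i][1], units_b[j][1])
        if start < end:
            pieces.append((start, end, i, j))
        # Advance the argument whose current unit ends first.
        if units_a[i][1] <= units_b[j][1]:
            i += 1
        else:
            j += 1
    return pieces
```

If the predicate can be evaluated in constant time on each resulting piece, the total cost stays linear in the number of units of the two arguments.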

The rest of the crosspattern algorithm finds the sub graphs that fulfill the user criteria n, d, q. First, the Accumulator is split at the locations where it has definition gaps, if any. The result is a list of mset instances that belong to MSetPart. The search proceeds by considering each of these parts separately as a pattern graph, finding the sub graphs in each, and concatenating the results into a single output stream R. We have invested considerable time in developing a fast algorithm for finding large connected components in the pattern graph. Since the possible values of q are all special kinds of connected components (e.g., clique), we start by finding the large connected components, then search for sub graphs of kind q within them. At the time of writing this paper, only the algorithm for finding connected components is implemented. So queries looking for clique, walk, etc. are not yet available in Secondo.
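Splitting at definition gaps can be illustrated with a short sketch. It assumes that units are tuples whose first two fields are the start and end times, sorted by start time, and that a gap exists wherever a unit does not start exactly where the previous one ended. The function name is hypothetical.

```python
def split_at_gaps(units):
    """Group time-sorted units into maximal runs without definition gaps."""
    parts = []
    for unit in units:
        if parts and parts[-1][-1][1] == unit[0]:
            # The unit starts where the previous one ended: same part.
            parts[-1].append(unit)
        else:
            # First unit overall, or a gap before this unit: start a new part.
            parts.append([unit])
    return parts
```

Each returned part can then be searched independently, since no connected component can persist across an instant at which the pattern graph is undefined.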

Finding large connected components is done in two steps, as shown in Algorithm 8. The first step finds the connected components (not necessarily large). The second step filters the nodes and edges of each found component according to the thresholds n and d, and keeps only the large components. Note that applying the thresholds can only remove edges from a connected component. An edge that does not belong to a connected component cannot appear in a large connected component derived from it. Thus the function ApplyThresholds does not need to know about the pattern graph. Rather, it is a function of the connected component and the two thresholds n and d. While applying the thresholds, a connected component might split into several, or it might disappear completely if too many nodes and edges are filtered out. Thus, the result of ApplyThresholds is a stream of large components, which is possibly empty.

The function FindConnectedComponents is described in Algorithm 9. It iterates over the units of the mset representing the pattern graph. This iteration is equivalent to a temporal scan of the pattern graph. The result of such a scan is a dynamic graph, as mentioned in Section A.1. Every uset in the pattern graph contains two sets that describe the changes to the graph edges: addedEdges, and removedEdges. The algorithm starts with an empty standard graph g (used to maintain the most recent snapshot of the time-dependent graph), and an empty list of connected components Components. While iterating over the units, the algorithm tries to efficiently maintain g and Components, so that they will be up to date with the last visited unit. The Components list holds only the connected components in g whose number of nodes ≥ n. The nodes of g know the labels of their components if they belong to any, otherwise null. In the beginning of an iteration, g is a snapshot of the pattern graph at the end instant of the unit in the previous iteration, and Components holds the connected components in g. During the iteration, the goal is to update g and Components to reflect the recent changes to the pattern graph, represented by the two sets addedEdges and removedEdges.

Most of the time, an update to the pattern graph is an addition or a removal of a single edge. It occurs less often that an update involves multiple edge additions and/or removals. In the experiment in Section 10.2, 27.3 % of the updates involved multiple edges, while 72.7 % were single edge updates. Algorithm 9 is designed to handle multiple edge updates, because it is the general case.

Newly added edges are first inserted into g. Then the algorithm inspects their effect on the existing components, which can be any of the following:

  1. Some of the new edges might connect to one another and to other edges in g, forming a new component having n nodes or more. This new component is then added to the list Components and marked as NewlyAdded (Line 9 of the algorithm).

  2. Some of the new edges might connect to one of the existing components, increasing its number of edges (and probably nodes). Such a component is marked as GotMoreEdges (Line 10).

  3. Some of the new edges might connect several existing components together. These components are marked as Merged, in preparation for merging them into a single larger component. This larger component will consist of the Merged components and the newly added edges.

  4. The rest of the edges stay in g, and have no effect on the Components list.

These four cases are checked in the same way. A breadth first search that begins from the newly added edges is started to find their entire connected components in g. A search path is terminated if it reaches a node that already belongs to a component in Components. This search is done in linear time in terms of the number of traversed nodes and edges. It traverses at most the newly added edges plus the edges of two components having n − 1 nodes. This bound is reached if the newly added edge connects two components in the graph having n − 1 nodes each. Notice that the algorithm will not traverse edges of the already maintained components having at least n nodes. Every newly added edge contributes to exactly one of the above cases. After this inspection, every component in the list Components has one of the states: NotChanged, NewlyAdded, GotMoreEdges, or Merged.
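The inspection of added edges can be sketched as follows. This is a simplified illustration and not Algorithm 9 itself. It assumes that g is given as an adjacency dictionary that already contains the new edges, and that a dictionary label maps every node of a maintained component to its component label (nodes without a component are absent). The function name and the returned tuple layout are hypothetical.

```python
from collections import deque


def classify_added_edges(adjacency, label, added_edges, n):
    """Return (state, new_nodes, touched_labels) per group of connected new edges."""
    groups, seen = [], set()
    for edge in added_edges:
        nodes, touched = set(), set()
        queue = deque(edge)
        while queue:
            node = queue.popleft()
            if node in label:
                # Node of a maintained component: record it, do not traverse it.
                touched.add(label[node])
            elif node not in seen:
                seen.add(node)
                nodes.add(node)
                queue.extend(adjacency[node])
        if nodes or touched:
            groups.append((nodes, touched))

    # Groups that touch a common component belong to the same update.
    merged = []
    for nodes, touched in groups:
        rest = []
        for other_nodes, other_touched in merged:
            if touched & other_touched:
                nodes, touched = nodes | other_nodes, touched | other_touched
            else:
                rest.append((other_nodes, other_touched))
        merged = rest + [(nodes, touched)]

    results = []
    for nodes, touched in merged:
        if len(touched) > 1:
            state = "Merged"
        elif len(touched) == 1:
            state = "GotMoreEdges"
        elif len(nodes) >= n:
            state = "NewlyAdded"
        else:
            state = None  # the edges stay in g without affecting Components
        results.append((state, nodes, touched))
    return results
```

Because the search stops at labeled nodes, it only visits nodes that are not yet part of a maintained component, which matches the bound discussed above.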

The algorithm proceeds to handle edge removals. It iterates over the edges to be removed, and finds in g whether they belong to components. Looking up the component of an edge is O(1) by looking up the component of any of its nodes. The removal of edges might affect the existing components in one of the following ways:

  1. Edges that belong to no component are removed from g, and have no further effects.

  2. Edges that belong to components are removed from them. This might result in the dissolution or the split of these components.

The loop in Lines 14–16 collects all the components that are affected by edge removals into the set affectedComponents. The connectivity of each of the affected components is checked, to see whether it is still connected. This is done in O(nodes + edges) of the component.
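The connectivity check for one affected component can be sketched with a plain breadth first search, which visits every node and edge of the component once. The function name and the input layout (a set of nodes and a list of the edges that remain after the removals) are assumptions made for this example.

```python
from collections import deque


def split_component(nodes, remaining_edges):
    """Return the connected pieces, as sets of nodes, of one affected component."""
    adjacency = {node: set() for node in nodes}
    for u, v in remaining_edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    pieces, seen = [], set()
    for start in nodes:
        if start in seen:
            continue
        piece, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    piece.add(neighbour)
                    queue.append(neighbour)
        pieces.append(piece)
    return pieces
```

A single returned piece means the component is still connected. Several pieces mean it has split, and the pieces with fewer than n nodes are too small to be maintained. If no piece reaches n nodes, the component is removed.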

Now we need to update the states of the affected components. Note that these components might already have states assigned to them in the part handling added edges (this is rather rare). Table 6 illustrates the state composition. Column headers list the possible states after handling added edges, and the row headers list the possible states after handling removed edges. The cells of the table show the result of the composition. In the experiment in Section 10.2, we have counted the occurrences of each of these component states. The percentages are listed in Table 7. The composed states determine how each component is propagated to the result.

Table 6 Composing component states
Table 7 Component states occurrence frequencies

Figure 5 illustrates an example of a graph update that causes the state NewlyAdded followed by RemoveNow. The minimum component size in this example is n = 5. Figure 5 (A) displays the graph before the update. The update adds the edges {(3,1), (6,9)}, and removes the edges {(2,5), (2,6)}. The algorithm handles the added edges first, and the result is a NewlyAdded component having 6 nodes (Fig. 5 (B)). Then the removed edges are handled, and the result is the removal of this component (RemoveNow), because it is split into two small components. Composing NewlyAdded with RemoveNow results in the state RemoveNow for the two small components in Fig. 5 (C). Similarly, Fig. 6 illustrates NewlyAdded followed by Split, and Fig. 7 illustrates Merged followed by Split.

Fig. 5 State NewlyAdded followed by state RemoveNow

Fig. 6 State NewlyAdded followed by state Split

Fig. 7 State Merged followed by state Split

Clearly, situations like these examples hardly occur in real-world applications. Consider, for example, the application of finding moving clusters within a group of animals. The example in Fig. 5 happens when the two pairs of animals (3,1), (6,9) come close to one another (i.e., below the distance threshold), and the pairs (2,5), (2,6) move apart from one another at exactly the same time instant. It is unlikely that the movement is synchronized in this way.

Now that the algorithm knows the updates that occurred to every component, it propagates these updates to the result stream R. Every mset element in R represents the history of a connected component in the given pattern graph. At the end of every iteration, the function Finalize appends a unit (uset) to every element that is still being tracked, representing the change that happened to this component in this iteration.

We do not list the algorithm of Finalize here, but briefly describe it in the following. It maintains two lists of mset instances. The result list R holds the connected components that no longer receive updates. These are connected components that existed in the past and have since dissolved. The active list holds the active components that still receive updates. Elements of the active list are linked to the elements of the Components list that propagate changes to them. Finalize receives the Components list and iterates over it. According to the state of every component, it does one of the following:

  1. A NewlyAdded component results in creating a new mset instance and adding it to R. This reflects that a new connected component started to appear in the pattern graph, and that Finalize has to track it. The new mset instance has a single uset with the same time interval as the unit of the pattern graph that is currently being processed, and the set of edges of the new component.

  2. A component whose state is one of GotMoreEdges, LostEdges, or AddRemoveMix results in extending the associated active component in R by an additional uset with the same time interval as the unit of the pattern graph that is currently being processed, and the set of edges in the component after the update.

  3. An unchanged component (i.e., state is NotChanged) results in extending the associated active component in R by an additional uset with the same time interval as the unit of the pattern graph that is currently being processed, and the same set of edges as its last uset instance (Footnote 5).

  4. If the state of the component is RemoveNow, the associated active component is moved from the list of active components to the list of finished components. This is done only if the duration of this active component is ≥ the minimum duration threshold d. Otherwise it is discarded: it is removed from the active list and not added to the finished list.

  5. If several components merge together (i.e., their state is Merged), they were separate in the past, but from now on they share the same set of edges. Finalize extends each of the associated active components by an additional uset with the same time interval as the unit of the pattern graph that is currently being processed, and a set of edges that consists of all the edges in the merged components, plus the newly added edges that caused them to merge.

  6. If a component is marked as Split, an active component in R splits at this time instant into several smaller components. Finalize makes as many copies of this active component as there are split components, and extends every copy with a uset whose time interval is the same as the unit of the pattern graph that is currently being processed, and whose set of edges is the set of edges of one of the split components (Footnote 6).

  7. If a component is marked as ReDistribute, Finalize cannot tell directly which active components should be updated, and how. This state occurs only if a component is marked as Merged and then as Split in the same iteration, as illustrated in Fig. 7. In such a case, the links between this component and the associated active components might be invalid, and Finalize needs to re-investigate the update. It collects, on the one hand, all the components that are marked as ReDistribute and, on the other hand, all their associated active components, and matches the two sets. A component matches an active component if they have graph nodes in common. After the matching is done, Finalize is able to assign to every component one of the states above (e.g., Split, GotMoreEdges), and process it accordingly. The ReDistribute state was never observed in the experiment in Section 10.2.

Before it returns, Finalize resets the state of all components in Components to NotChanged, in preparation for the next iteration.
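The state handling described above can be sketched in Python as follows. This is only an illustrative sketch, not the Secondo implementation: the names `State`, `Component`, `links`, and `finalize` are hypothetical, an mset is modeled as a plain list of `(interval, edges)` pairs, and the ReDistribute case is omitted. The sketch also assumes that a component in state RemoveNow is dropped from the Components list, which the text does not state explicitly.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class State(Enum):
    NOT_CHANGED = auto()
    NEWLY_ADDED = auto()
    GOT_MORE_EDGES = auto()
    LOST_EDGES = auto()
    ADD_REMOVE_MIX = auto()
    REMOVE_NOW = auto()
    MERGED = auto()
    SPLIT = auto()


@dataclass
class Component:
    edges: frozenset                              # current edges of the component
    state: State
    links: list = field(default_factory=list)    # linked active msets


def _drop(msets, target):
    """Remove target from msets by identity (equal copies must not be confused)."""
    for i, m in enumerate(msets):
        if m is target:
            del msets[i]
            return


def finalize(components, active, finished, interval, d):
    """One Finalize pass over the Components list (hypothetical model).

    interval is the (start, end) of the pattern-graph unit being processed;
    d is the minimum duration threshold. All lists are updated in place.
    """
    split_parents, survivors = [], []
    for comp in components:
        s = comp.state
        if s is State.NEWLY_ADDED:
            mset = [(interval, comp.edges)]       # a single uset
            active.append(mset)
            comp.links = [mset]
        elif s in (State.GOT_MORE_EDGES, State.LOST_EDGES,
                   State.ADD_REMOVE_MIX, State.MERGED):
            for mset in comp.links:               # Merged may have several links
                mset.append((interval, comp.edges))
        elif s is State.NOT_CHANGED:
            for mset in comp.links:               # repeat the last edge set
                mset.append((interval, mset[-1][1]))
        elif s is State.REMOVE_NOW:
            for mset in comp.links:
                _drop(active, mset)
                duration = mset[-1][0][1] - mset[0][0][0]
                if duration >= d:                 # keep only long-enough histories
                    finished.append(mset)
            continue                              # assumption: component is dropped
        elif s is State.SPLIT:
            children = []
            for parent in comp.links:             # copy the parent history per part
                child = list(parent) + [(interval, comp.edges)]
                active.append(child)
                children.append(child)
                split_parents.append(parent)
            comp.links = children
        comp.state = State.NOT_CHANGED            # reset for the next iteration
        survivors.append(comp)
    for parent in split_parents:                  # parents are replaced by copies
        _drop(active, parent)
    components[:] = survivors
```

Modeling the links as direct references to the active msets keeps the sketch short; a real implementation would also have to maintain these links when components merge or split inside the component-detection step.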

Back to Algorithm 8, the next step after finding the connected components is to apply the thresholds n and d to them, in order to yield the large connected components. Note that these thresholds apply to the nodes of the connected components, whereas the connected components are represented as mset instances that store the edges. Hence, the ApplyThresholds function needs to transform the component it receives from a representation of graph edges into a representation of graph nodes. Using a linear scan of the received component, it builds a hash table in the form shown in Fig. 8 (A), where every node has one entry. An interval_i entry represents one of the intervals during which this NodeID belongs to the connected component. It also stores pointers to the units of the connected component that fall within this interval. ApplyThresholds also builds a map in the form shown in Fig. 8 (B), where Unit is a pointer to one of the units (uset) of the connected component. This map stores the number of nodes in every unit, so that the threshold n can be applied quickly.

Fig. 8 Helper indexes for representing the nodes of the pattern graph
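As a rough illustration of the two helper indexes, the following Python sketch builds them with one linear scan over an edge-based component. It rests on assumptions that are not part of the paper: the function name `build_indexes` is hypothetical, an mset is modeled as a time-ordered list of `((start, end), edges)` pairs, unit pointers are modeled as list indices, and two units are treated as adjacent when the end of one equals the start of the next.

```python
def build_indexes(mset):
    """Build the node hash table (A) and the unit node-count map (B).

    Returns (node_table, node_count) where
      node_table: node -> list of [start, end, [unit indices]] entries,
                  one entry per maximal interval of membership;
      node_count: unit index -> number of nodes in that unit.
    """
    node_table = {}
    node_count = {}
    for idx, ((start, end), edges) in enumerate(mset):
        # Derive the node set of this unit from its edges.
        nodes = {n for edge in edges for n in edge}
        node_count[idx] = len(nodes)
        for n in nodes:
            entries = node_table.setdefault(n, [])
            if entries and entries[-1][1] == start:
                # The node was also present in the preceding adjacent unit:
                # extend its current interval.
                entries[-1][1] = end
                entries[-1][2].append(idx)
            else:
                # Otherwise a new membership interval starts here.
                entries.append([start, end, [idx]])
    return node_table, node_count
```

For a component with units over (0, 5), (5, 10), and (10, 15), a node that appears only in the first and last unit gets two separate interval entries, which is what allows the d threshold to be checked per interval.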

The hash table and the map are linked to the units of the mset representing the connected component, in such a way that the three of them can be quickly synchronized if any of them is changed. ApplyThresholds makes sure that the three structures are always synchronized. To apply the d threshold, it scans the hash table and filters out all the intervals whose duration is less than d. Such a change affects the node count of the units, and it is reflected in the map by means of synchronization. To apply the n threshold, ApplyThresholds scans the map and removes the units that have fewer than n nodes. Such a change affects the hash table, and it inserts definition gaps within the connected component. ApplyThresholds keeps applying the two thresholds alternately until no more changes occur. It then splits the connected component at the locations where it has definition gaps. The pieces that come out of this split are all large connected components, so they are appended to the result stream.
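The fixpoint iteration over the two thresholds can be sketched as below. This is a simplified sketch under stated assumptions, not the synchronized three-structure design described above: it works directly on a node-based list of `((start, end), nodes)` units, recomputes membership intervals on every pass instead of maintaining the hash table and map, and the names `apply_thresholds` and `_runs` are hypothetical.

```python
def _runs(units, alive, node):
    """Yield lists of adjacent unit indices on which node is continuously present."""
    run = []
    for i, (iv, nodes) in enumerate(units):
        here = alive[i] and node in nodes
        if here and run and units[run[-1]][0][1] == iv[0]:
            run.append(i)
        else:
            if run:
                yield run
            run = [i] if here else []
    if run:
        yield run


def apply_thresholds(units, n, d):
    """Apply the d and n thresholds until nothing changes, then split at gaps.

    units: time-ordered list of ((start, end), set_of_nodes).
    Returns a list of pieces, each a list of ((start, end), frozenset_of_nodes).
    """
    units = [[iv, set(nodes)] for iv, nodes in units]   # working copy
    alive = [True] * len(units)                          # False marks a definition gap
    changed = True
    while changed:
        changed = False
        # d threshold: drop membership intervals shorter than d.
        for node in set().union(*(u[1] for u in units)):
            for run in list(_runs(units, alive, node)):
                if units[run[-1]][0][1] - units[run[0]][0][0] < d:
                    for i in run:
                        units[i][1].discard(node)
                    changed = True
        # n threshold: remove units with fewer than n nodes.
        for i, (_, members) in enumerate(units):
            if alive[i] and len(members) < n:
                alive[i] = False
                changed = True
    # Split the component at definition gaps.
    pieces, piece = [], []
    for i, (iv, members) in enumerate(units):
        if alive[i] and (not piece or piece[-1][0][1] == iv[0]):
            piece.append((iv, frozenset(members)))
        else:
            if piece:
                pieces.append(piece)
            piece = [(iv, frozenset(members))] if alive[i] else []
    if piece:
        pieces.append(piece)
    return pieces
```

The loop terminates because every change either removes a node from a unit or removes a unit, and both can happen only finitely often. Recomputing the intervals on each pass is simpler but slower than keeping the helper indexes synchronized.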

Back to Algorithm 7, Line 12, the function FindSubGraphs receives this stream of large connected components, and searches for the subgraphs of type q within them. At the time of writing, this function has not been implemented. Naively, one would search every unit in the large connected components for subgraphs of type q, and concatenate the results to construct the whole history of each subgraph. Smarter algorithms are still missing, and require further research.
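The naive strategy mentioned above might look like the following Python sketch. Since FindSubGraphs is not implemented in the paper, everything here is an assumption made for illustration: the subgraph type q is taken to be a clique of k nodes, a component is a time-ordered list of `((start, end), edges)` units, node identifiers are sortable, and the function names are hypothetical.

```python
from itertools import combinations


def cliques_in_unit(edges, k):
    """Return all k-node cliques (as sorted tuples) in one unit, by brute force."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    found = set()
    for cand in combinations(sorted(adj), k):
        # A candidate is a clique if every pair of its nodes is connected.
        if all(b in adj[a] for a, b in combinations(cand, 2)):
            found.add(cand)
    return found


def find_cliques(component, k):
    """Search every unit, then concatenate matches over adjacent units.

    Returns {clique_nodes: [(start, end), ...]}, where consecutive
    intervals of the same clique are merged into one.
    """
    history = {}
    for (start, end), edges in component:
        for clique in cliques_in_unit(edges, k):
            spans = history.setdefault(clique, [])
            if spans and spans[-1][1] == start:
                spans[-1] = (spans[-1][0], end)   # extend the running interval
            else:
                spans.append((start, end))        # the clique (re)appears
    return history
```

The brute-force enumeration is exponential in k and repeats work across units that differ by only a few edges, which illustrates why smarter algorithms are needed.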

Algorithm 7 has one more missing part. It deals only with undirected pattern graphs. This means that the time-dependent predicate α must be commutative. So far, the group patterns that we have seen in the literature can be expressed using commutative predicates. Most of the patterns are expressed based on the distance between objects, or on a derivative of this distance. Still, the crosspattern operator would be more expressive if it allowed for non-commutative predicates as well. This completes our illustration of the crosspattern algorithm.


Sakr, M.A., Güting, R.H. Group spatiotemporal pattern queries. Geoinformatica 18, 699–746 (2014). https://doi.org/10.1007/s10707-013-0198-7
