Map construction algorithms: a local evaluation through hiking data

Abstract

We study five existing map construction algorithms, designed and tested with urban vehicle data in mind, and apply them to hiking trajectories with different terrain characteristics. Our main goal is to better understand the existing strategies and their limitations, in order to shed new light on the current challenges for map construction algorithms. We carefully analyze the results obtained by each algorithm, focusing on the local details of the generated maps. Our analysis includes the characterization of 10 types of common artifacts, which occur in the results of more than one algorithm, and 7 algorithm-specific artifacts, which are consequences of different algorithmic strategies. This allows us to draw systematic conclusions about the main challenges in fully automating the construction of maps from trajectory data, to detect the strengths and weaknesses of the different strategies, and to suggest possible ways to design higher-quality map construction methods. We expect this analysis to help in designing new and better methods that perform well in wider and more realistic contexts, not only for road map or hiking map reconstruction, but also for other types of trajectory data.

Notes

  1. See http://openstreetmap.org and http://wikimapia.org, respectively.

  2. Unfortunately, we were not able to include two of the seven algorithms analyzed in [3]. This was due to not having the code available in one case [17], and to technical issues when attempting to run the algorithm in the other case [8].

  3. http://www.wikiloc.com

  4. http://www.wikiloc.com

  5. The reasons not to include two of the algorithms were: implementation not available for one of them, and technical issues when trying to reproduce previous experiments in the other case.

  6. In the original paper [5] a third step is described, where identified portions can be adjusted. However, the implementation made public ignores this third phase.

  7. We note, however, that this is what is done in the implementation, but the criterion described in the original paper [19] is temporal.

  8. The computer used was equipped with an Intel i5-2500K CPU and 8GB DDR3 Synchronous 1600 MHz memory.

References

  1. Ahmed M, Fasy BT, Gibson M, Wenk C (2015) Choosing thresholds for density-based map construction algorithms. In: Proc 23rd Int Conf on Geographic Information Systems, ACM, pp 24

  2. Ahmed M, Fasy BT, Hickmann KS, Wenk C (2015) A path-based distance for street map comparison. ACM Trans Spatial Algorithms Syst 1(1):3:1–3:28

  3. Ahmed M, Karagiorgou S, Pfoser D, Wenk C (2015) A comparison and evaluation of map construction algorithms using vehicle tracking data. GeoInformatica 19(3):601–632

  4. Ahmed M, Karagiorgou S, Pfoser D, Wenk C (2015) Map Construction Algorithms. Springer

  5. Ahmed M, Wenk C (2012) Constructing street networks from GPS trajectories. In: Proc ESA, pp 60–71

  6. Alt H, Guibas LJ (2000) Chapter 3 - Discrete geometric shapes: matching, interpolation, and approximation. In: Sack J-R, Urrutia J (eds) Handbook of Computational Geometry, North-Holland, Amsterdam, pp 121–153

  7. Biagioni J, Eriksson J (2012) Inferring road maps from GPS traces: Survey and comparative evaluation. In: Transportation Research Board, 91st Annual, pp 61–71

  8. Biagioni J, Eriksson J (2012) Map inference in the face of noise and disparity. In: Proc 20th Int Conf Advances in Geographic Information Systems, SIGSPATIAL ’12. ACM, New York, pp 79–88

  9. Buchin K, Buchin M, Duran D, Fasy BT, Jacobs R, Sacristán V, Silveira RI, Staals F, Wenk C (2017) Clustering trajectories for map construction. In: Proceedings of the 25th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, GIS 2017, Redondo Beach, November 7-10, 2017, pp 14:1–14:10

  10. Cao L, Krumm J (2009) From GPS traces to a routable road map. In: Proc 17th ACM SIGSPATIAL, pp 3–12

  11. Chen C, Cheng Y (2008) Roads digital map generation with multi-track GPS data. In: Proc Workshops on Education Technology and Training, and on Geoscience and Remote Sensing, IEEE, pp 508–511

  12. Davies JJ, Beresford AR, Hopper A (2006) Scalable, distributed, real-time map generation. IEEE Pervasive Computing 5(4):47–54

  13. Dey TK, Wang J, Wang Y (2017) Improved road network reconstruction using discrete Morse theory. In: Proceedings of the 25th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, SIGSPATIAL ’17. ACM, New York, pp 58:1–58:4

  14. Duran D, Sacristán V, Silveira RI (2016) Map construction algorithms: an evaluation through hiking data. In: Proceedings of the 5th ACM SIGSPATIAL International Workshop on Mobile Geographic Information Systems, MobiGIS 2016, Burlingame, October 31, 2016, pp 74–83

  15. Edelkamp S, Schrödl S (2003) Route Planning and Map Inference with Global Positioning Traces. Springer, Berlin, pp 128–151

  16. Fathi A, Krumm J (2010) Detecting road intersections from GPS traces. In: Proc 6th Int Conf on Geographic Information Systems, pp 56–69

  17. Ge X, Safa I, Belkin M, Wang Y (2011) Data skeletonization via Reeb graphs. In: Shawe-Taylor J, Zemel RS, Bartlett PL, Pereira FCN, Weinberger KQ (eds) Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011. Proceedings of a meeting held 12-14 December 2011, Granada, Spain, pp 837–845

  18. Guo T, Iwamura K, Koga M (2007) Towards high accuracy road maps generation from massive GPS traces data. In: Proc IEEE Int Geoscience and Remote Sensing Symp, pp 667–670

  19. Karagiorgou S, Pfoser D (2012) On vehicle tracking data-based road network generation. In: Proc 20th Int Conf on Advances in Geographic Information Systems, pp 89–98

  20. Liu X, Biagioni J, Eriksson J, Wang Y, Forman G, Zhu Y (2012) Mining large-scale, sparse GPS traces for map inference: Comparison of approaches. In: Proc 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’12. ACM, New York, pp 669–677

  21. Mariescu-Istodor R, Fränti P (2018) Cellnet: Inferring road networks from GPS trajectories. ACM Trans Spatial Algorithms and Systems 4(3):8:1–8:22

  22. Niehofer B, Burda R, Wietfeld C, Bauer F, Lueert O (2009) GPS community map generation for enhanced routing methods based on trace-collection by mobile phones. In: Proc 1st Int Conf Advances in Satellite and Space Communications, SPACOMM ’09. IEEE Computer Society, Washington, pp 156–161

  23. Pfoser D, Wenk C Map construction portal. http://mapconstruction.org, 2016. [Acc. 17/6/2016]

  24. Quddus M, Ochieng W, Noland R (2007) Current map-matching algorithms for transport applications: State-of-the art and future research directions. Transportation Research Part C: Emerging Technologies, pp 312–328

  25. Shi W, Shen S, Liu Y (2009) Automatic generation of road network map from massive GPS vehicle trajectories. In: Proc 12th Int IEEE Conf on Intelligent Transportation Systems, pp 48–53

  26. Steiner A, Leonhardt A (2011) Map generation algorithm using low frequency vehicle position data. In: Proc 90th Ann Meeting of the Transportation Research Board, pp 1–17

  27. TopoGrafix. GPX: the GPS exchange format. http://www.topografix.com/gpx.asp, 2002. [Acc. 1/6/2015]

  28. Wang S, Wang Y, Li Y (2015) Efficient map reconstruction and augmentation via topological methods. In: Proc 23rd Int Conf on Advances in Geographic Information Systems, pp 10

  29. Wang S, Wang Y, Li Y (2015) Efficient map reconstruction and augmentation via topological methods. In: Proceedings of the 23rd SIGSPATIAL International Conference on Advances in Geographic Information Systems, Bellevue, WA, USA, November 3-6, 2015, pp 25:1–25:10

  30. Wang Y, Liu X, Wei H, Forman G, Zhu Y (2013) Crowdatlas: self-updating maps for cloud and personal use. In: The 11th Annual International Conference on Mobile Systems, Applications, and Services, Mobisys’13, Taipei, Taiwan, June 25-28, 2013, pp 469–470

  31. Worrall S, Nebot E (2007) Automated process for generating digitised maps through GPS data compression. In: Proc Australasian Conf on Robotics and Automation

  32. Zheng J, Wang Y, Nihan NL (2005) Quantitative evaluation of GPS performance under forest canopies. In: Proc IEEE Int Conf Networking, Sensing and Control, pp 777–782

Acknowledgments

We are grateful to the anonymous reviewers for their many suggestions that have improved considerably the presentation of this paper. We also thank Kevin Buchin, Maike Buchin, Frank Staals, and Carola Wenk for useful discussions about the topics of this paper.

Author information


Correspondence to Rodrigo I. Silveira.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

A preliminary version of this work appeared in [14].

Partially supported by projects MTM2015-63791-R (MINECO/FEDER), Gen. Cat. 2017SGR1640. R.I. S. was also supported by MINECO through the Ramón y Cajal program.

Appendices

Appendix A: Deduction of parameter values for hiking data sets

In this section, we carefully discuss how the parameters of the algorithms were set in order to apply them to the hiking data sets in the best possible way.

A.1 AW

Recall that the main parameter for AW is ε, for which four assumptions are made. However, these assumptions are not satisfied by any of our four hiking data sets. It is clear that condition (iv) is often not met in hiking trails (or in car trajectories, for that matter), since it is common to have hikes that repeat certain parts. However, it is relatively simple to preprocess trajectories to break cycles, as suggested in [5].

A more delicate situation arises with the remaining assumptions. Most notably, conditions (i) and (iii) can easily contradict each other. For instance, in Delta the presence of parallel paths along both sides of irrigation canals forces a value of ε < 3m to satisfy (i). At the same time, the presence of some wider roads, in conjunction with condition (iii), requires ε > 16m, leading to a contradiction. Similar situations occur in the other three data sets. An example illustrating the effect of varying ε is shown in Fig. 9.

Fig. 9 Example of the effect of varying the value of ε for the AW algorithm. The image shows an area with parallel paths on both sides of a canal, in Delta. Input trajectories are shown in blue, the generated map is shown in red. Background image from Google Earth

Given that no single value can satisfy all theoretical conditions on ε, for our experiments we tried several values for each data set, selected based on the road widths and road separation distances observed in the Google Earth aerial images for each region. The algorithm was run with each setting on each data set, and the value that qualitatively gave the best results with respect to the paths visible in Google Earth was chosen. These values are shown in Table 6.
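The conflict between conditions (i) and (iii) amounts to an empty interval of admissible values for ε. The following minimal sketch makes this explicit for Delta, using the two bounds quoted above; the function name and its argument names are hypothetical and are not part of the AW implementation.

```python
def epsilon_interval(upper_from_separation_m, lower_from_width_m):
    """Return the open interval (low, high) of admissible epsilon values.

    upper_from_separation_m: bound implied by condition (i), epsilon < this value.
    lower_from_width_m: bound implied by condition (iii), epsilon > this value.
    Returns None when the two conditions cannot hold at the same time.
    """
    if lower_from_width_m >= upper_from_separation_m:
        return None
    return (lower_from_width_m, upper_from_separation_m)


# Delta: condition (i) needs epsilon < 3 m, condition (iii) needs epsilon > 16 m.
delta_interval = epsilon_interval(upper_from_separation_m=3.0, lower_from_width_m=16.0)
print(delta_interval)
```

When the interval is empty, as it is here, the only option left is the empirical selection described above.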

A.2 CK

Several parameters need to be adjusted to obtain good results for this algorithm. For the preprocessing, the thresholds used for segmenting based on the spatial or temporal discontinuities within trajectories (d1 and t) are obtained by extrapolating the original values using the ratio between the mean distance (or elapsed time) between trajectory points of the original data sets and each of our data sets. The step for reducing redundancy uses two spatial thresholds (d2 and d3) and an angular one (α). The spatial thresholds were set to the original value multiplied by the ratio of the average speed between each of our data sets and the data set used in the original work. The angular threshold was kept unchanged.
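As an illustration of this extrapolation, the sketch below scales an original threshold by the ratio between a statistic of the new data set and the same statistic of the original data set. We assume here that the ratio is taken as new over original, which the text does not state explicitly. The original thresholds are those listed in Appendix B; the spacing and speed statistics are placeholders chosen for readability, not measurements from our data sets.

```python
def scale_by_ratio(original_value, original_reference, new_reference):
    """Scale a threshold by new_reference / original_reference.

    The direction of the ratio (new over original) is an assumption.
    """
    if original_reference <= 0:
        raise ValueError("original_reference must be positive")
    return original_value * new_reference / original_reference


# Original CK thresholds (Appendix B).
D1_ORIGINAL = 100.0
D2_ORIGINAL = 10.0

# Placeholder statistics, NOT values measured in the paper.
original_mean_spacing_m = 100.0
hiking_mean_spacing_m = 10.0
original_mean_speed = 40.0
hiking_mean_speed = 4.0

# d1 follows the mean distance between consecutive trajectory points.
d1_hiking = scale_by_ratio(D1_ORIGINAL, original_mean_spacing_m, hiking_mean_spacing_m)
# d2 (and d3, in the same way) follows the average speed.
d2_hiking = scale_by_ratio(D2_ORIGINAL, original_mean_speed, hiking_mean_speed)
print(d1_hiking, d2_hiking)
```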

The clarification step requires some information from the input data and the terrain. Recall that the attraction forces are parameterized by two values, M and σ1, while the spring force depends on one parameter k. According to Cao and Krumm [10], the values should be chosen so that the force of one edge attracts all vertices with similar direction within a certain distance, and not those further apart. This implies that there should exist a distance value td at which the attraction force drops considerably, in favor of the conservative spring force. This distance value should be between the maximum width of a one-way road and the minimum distance between two roads. Once the target distance value is found, Cao and Krumm [10] provide an analysis of how trajectories are affected by the forces, taking into account the average number of trajectories on a path (N) and their dispersion due to the expected GPS error (σ2). Using the values of N and σ2 as input, they derive the values of the force parameters so that the forces produce the desired change of behavior roughly after the target distance at which the attraction force must drop. In [10, Figure 8], this target distance seems to be implicitly set to 25m. Considering on average 20 trajectories per path with a standard deviation due to GPS error of 5m, the values of the three force parameters are set to σ1 = 5, k = 0.005 and M = 1, producing the desired effect at 25m.

Therefore, once the target distance td, N and σ2 are known, the three parameters M, σ1 and k can be derived. Table 13 summarizes the values for each parameter as well as the needed information. Figure 10 presents the graphs of how the different attraction forces behave for the corresponding values.

Fig. 10 Graphic representation of the resulting attraction forces of algorithm CK, for each data set, using their parameterization method. The axes represent the distance from a trajectory point to the center of the path that it possibly samples. The x-axis represents the original distance, whereas the y-axis represents the final distance after applying the clarification step. Ideally, the function should behave like a piecewise function with y = 0 for x < td and y = x otherwise. The value in parentheses is the corresponding td for each data set

Table 13 Values for each parameter used in the clarification step of CK using the original parameterization method. The table also includes the needed information about the terrain and input data, namely the average number of trajectories on a sampled path (N), the expected GPS error (σ2) and the target distance (td), set halfway between the maximum path width and the minimum separation between two different paths

Even though the parameters for the clarification phase have been adjusted using the method proposed by the authors, there is still an issue to be addressed. The plots for the Garraf and Montseny data sets (Fig. 10d-e) deviate too much from the ideal shape. Moreover, the value of σ1 in all data sets except Aiguamolls is 0 (Table 13), which results in a singularity in the attraction forces. In such cases, the resulting clarified trajectories no longer follow their original shape and the resulting map is indistinguishable from random noise.

Such situations occur when the expected GPS error (σ2) is too close to the target distance td. In the Garraf and Montseny data sets, σ2 was higher than 50% of td, and in the Delta data set it was higher than 41%, whereas in the original and Aiguamolls data sets it was 20% and 18%, respectively. Hence, whenever the expected error (σ2) is close to the midpoint between the maximum width of a single path and the minimum distance between two paths, which is precisely the value of td, the method proposed to adjust the parameters cannot be applied. For the Delta, Garraf and Montseny data sets, the values were therefore adjusted empirically. Table 14 summarizes the final values used.

Table 14 Final parameter values for the clarification phase of the CK algorithm
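The ratio between σ2 and td discussed above is easy to check before attempting the derivation. The sketch below computes td as the midpoint described in the caption of Table 13 and reports σ2 as a fraction of td. The function names are hypothetical; the first example uses the setting of the original work (σ2 = 5m, td = 25m), while the widths and separations in the second example are placeholders, not values from our data sets.

```python
def target_distance(max_path_width_m, min_path_separation_m):
    """Midpoint between the widest path and the closest pair of distinct paths."""
    return (max_path_width_m + min_path_separation_m) / 2.0


def gps_error_ratio(sigma2_m, td_m):
    """Expected GPS error expressed as a fraction of the target distance."""
    if td_m <= 0:
        raise ValueError("td_m must be positive")
    return sigma2_m / td_m


# Setting of the original work: sigma2 = 5 m and td = 25 m.
original_ratio = gps_error_ratio(5.0, 25.0)

# Placeholder terrain measurements, used only to exercise the function.
example_td = target_distance(max_path_width_m=4.0, min_path_separation_m=16.0)
print(original_ratio, example_td)
```

The first call reproduces the 20% reported for the original data set. In our experiments the derivation broke down for ratios above roughly 41%, so a ratio in that range suggests falling back to empirical tuning.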

Finally, the incremental insertion algorithm has four parameters that need to be adjusted. As the trajectories have been clarified, adapting the original values is straightforward. The distance threshold is set to the maximum path width, as clarified trajectories are much closer to the center of the paths. The angular threshold, the minimum volume of trajectories and the maximum number of hops are set to 45°, 3 and 5, respectively, as in the original work.

Refer to Table 7 for the summary of all the parameter values.

A.3 DBH

(i) The grid cell size that Davies et al. [12] propose is half the minimum path width. In our data sets, that is 1m for the flat terrains (Delta, Aiguamolls) and 0.5m for the hilly terrains (Montseny, Garraf). (ii) The value of σ was taken as the average of the maximum width of a path and half the minimum separation between two different paths. This value ensures that holes within a path will be covered while not interfering with the detection of different paths. Finally, (iii) the mask threshold was empirically adjusted. The value we present is a compromise between the coverage of the generated map and the algorithm’s sensitivity to noise. Table 8 summarizes the values used.
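Rules (i) and (ii) can be written as two one-line formulas. The sketch below is only an illustration with hypothetical function names. The minimum path widths of 2m and 1m follow from the cell sizes quoted above, whereas the width and separation passed to the σ rule are placeholders, not values from Table 8.

```python
def dbh_cell_size(min_path_width_m):
    """Rule (i): grid cell size is half the minimum path width."""
    return min_path_width_m / 2.0


def dbh_sigma(max_path_width_m, min_path_separation_m):
    """Rule (ii): average of the maximum path width and half the minimum separation."""
    return (max_path_width_m + min_path_separation_m / 2.0) / 2.0


flat_cell = dbh_cell_size(2.0)   # Delta, Aiguamolls
hilly_cell = dbh_cell_size(1.0)  # Montseny, Garraf

# Placeholder terrain measurements, used only to exercise the sigma rule.
example_sigma = dbh_sigma(max_path_width_m=6.0, min_path_separation_m=8.0)
print(flat_cell, hilly_cell, example_sigma)
```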

A.4 ES

To choose the value of the three parameters for ES we used as a guideline the explanations in the original work [15], which we reproduce here for completeness. The value of \(d_{\max}\) “should be in an order of magnitude such that we ensure not to miss any intersection”. For δ, “we found that the algorithm is not very sensitive to variations of δ”. Finally, “as a conservative lower bound, θ should be at least larger than the maximum lane width, […], plus a considerable fraction of an estimated standard deviation of the GPS error”.

For our data sets, a \(d_{\max}\) of 20m is sufficient to detect all intersections. We kept δ at the same value as in the original work (45°). Finally, to set the value of θ we took into account the maximum width and the estimated GPS error of each data set. Table 9 summarizes the values taken for each data set.

A.5 KP

Finding appropriate values for the six parameters of KP, which do not have a clear meaning, was a complex task. Indeed, Karagiorgou and Pfoser mention that the values used in their experiments were obtained “empirically by running a great number of experiments and assessing the quality of the respective results” [19]. Based on their work, we established relationships between the parameters so as to minimize the number of parameters to be tested empirically. As a result, we concluded that the most critical and independent parameters were (ii), the angular threshold to determine turns, and (iv), the distance threshold to group turning points. We ignored the speed-related parameter (iii), as pedestrians do not perform significant speed reductions in turns. Parameter (vi) was fixed at 45°, the value used in [19]. The other two parameters were set as specified in Table 10 after experimental testing.

It remains to explain how to find appropriate values for (ii) angular difference and (iv) the turn clustering threshold.

The angular difference threshold determines when a change in direction is considered a turn. Ideally, the value should be set so that all (and only) real intersections have at least one associated turning point. As expected, no such ideal value exists for any of our data sets (note that it does not exist for the original Athens data set used in [19] either).

We found that the pointwise angular differences in GPS data are not reliable enough to avoid false turn detections, both in urban and in hiking data sets. In our hiking data sets, the value of the angular difference threshold seems to be even more critical, since the density of trajectory nodes on the sampled paths is one order of magnitude higher than in the urban context. Therefore, the probability that multiple falsely identified turn samples are considered by the algorithm to be intersections, because they are close enough to each other, is also much higher. This problem is especially apparent in Garraf and Montseny.
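To make the role of the angular difference threshold concrete, the following minimal sketch flags a trajectory point as a turn when the heading changes by more than the threshold at that point. It covers only the angular criterion on planar coordinates in metres, and it is not the KP implementation; the function names and the sample track are hypothetical.

```python
import math


def heading_deg(p, q):
    """Heading of the segment p -> q in degrees, for planar (x, y) points."""
    return math.degrees(math.atan2(q[1] - p[1], q[0] - p[0]))


def angular_difference_deg(a, b):
    """Smallest absolute difference between two headings, in [0, 180]."""
    d = abs(a - b) % 360.0
    return 360.0 - d if d > 180.0 else d


def detect_turns(points, threshold_deg):
    """Indices of interior points whose heading change exceeds threshold_deg."""
    turns = []
    for i in range(1, len(points) - 1):
        before = heading_deg(points[i - 1], points[i])
        after = heading_deg(points[i], points[i + 1])
        if angular_difference_deg(before, after) > threshold_deg:
            turns.append(i)
    return turns


# A straight path with a 3 m sideways jitter, followed by a real 90 degree turn.
track = [(0, 0), (10, 0), (20, 3), (30, 3), (30, 13)]
turns_at_15 = detect_turns(track, 15.0)
turns_at_70 = detect_turns(track, 70.0)
print(turns_at_15, turns_at_70)
```

With a threshold of 15° the jitter alone produces two spurious turns, whereas with 70° only the real turn remains. Densely sampled hiking trajectories contain many such small deviations, which is why the threshold matters so much in that setting.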

The second free parameter, the turn clustering distance threshold, has a less clear effect on the generated map, but it has a high impact on the final result. Essentially, the turn clustering threshold is used to decide when two detected turns represent the same turn in the ground truth. However, its implications extend further, because links between intersections created later on depend on the positions of the intersections, among other properties influenced by this parameter. Given all these implications, the final effect of varying this parameter is very hard to predict. Figure 11 shows an example.

Fig. 11 Example showing the effect of varying the turn clustering threshold from 5m to 100m. Input trajectories are shown in blue, the generated map in red. Red pinpoints indicate detected turning points, yellow pinpoints show the location of the intersection nodes. Images from Google Earth in Aiguamolls

Based on all these observations, our method to obtain the values for the two parameters consisted of first finding a suitable value for the angular difference and then, with this value fixed, looking empirically for a suitable value for the turn clustering threshold.
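The sensitivity to the turn clustering threshold can be illustrated with a simple distance-based grouping. The sketch below links any two detected turns that are at most the threshold apart and reports the connected groups, each of which would yield one intersection node at its centroid. This single-linkage scheme is an assumption made for illustration; the clustering performed by KP may differ. All names and coordinates are hypothetical.

```python
import math


def cluster_turns(turn_points, threshold_m):
    """Group points into connected components of the 'at most threshold_m apart' graph."""
    n = len(turn_points)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(turn_points[i], turn_points[j]) <= threshold_m:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(turn_points[i])
    return list(clusters.values())


def centroid(points):
    """Arithmetic mean of a list of planar points."""
    xs, ys = zip(*points)
    return (sum(xs) / len(points), sum(ys) / len(points))


# Two pairs of nearby turns, about 40 m apart from each other.
turns = [(0.0, 0.0), (3.0, 0.0), (40.0, 0.0), (44.0, 0.0)]
nodes_at_5 = [centroid(c) for c in cluster_turns(turns, 5.0)]
nodes_at_100 = [centroid(c) for c in cluster_turns(turns, 100.0)]
print(nodes_at_5, nodes_at_100)
```

At 5m the two pairs give two intersection nodes, while at 100m everything collapses into a single node between them, so both the number and the position of the intersections change with the threshold.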

The values that gave the best results can be seen in Table 15. Note that the best values of the angular difference found for the hiking data sets (up to 70°) are much larger than the 15° used in the urban setting.

Table 15 Chosen parameter values for KP. The values for the urban data sets are the ones in [3, 19]

The need for such a large angle bound can be explained by the density of trajectory sample points in our data sets, compared to those used by Karagiorgou and Pfoser. Assuming that the identified turns are uniformly distributed, the Athens data set has an identified turn every 382m along a trajectory (45.11% of the input trajectory points are identified as turns). Using the same angular threshold, Garraf has an identified turn every 24m (37.76%). Although the percentages of identified turns are similar, the distance between two identified turns in Garraf is one order of magnitude smaller. Identifying turns that are too close together makes identifying intersections using spatial clusters an even more challenging task.

The angular thresholds for our data sets were selected taking into account the apparent visual density of the identified turns. Figure 12 compares the visual appearance of the identified turns between the original data set, the Garraf data set using the same angular threshold (15°) and Garraf using our chosen threshold (70°). Assuming that the identified turns are uniformly distributed, Garraf with a threshold of 70° has an identified turn every 316m along the trajectories (2.87% of the input trajectory points are identified as turns). Therefore, the density of the identified turns is comparable to that in the original data set.
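Under the uniform distribution assumption, the figures above are linked by a simple relation: the mean distance between identified turns equals the mean distance between consecutive trajectory points divided by the fraction of points identified as turns. The sketch below applies this relation to the reported numbers. The implied point spacings are derived here for illustration and are not values reported in the text; the function names are hypothetical.

```python
def implied_point_spacing(turn_spacing_m, turn_fraction):
    """Mean distance between consecutive points implied by the turn statistics."""
    return turn_spacing_m * turn_fraction


def mean_turn_spacing(point_spacing_m, turn_fraction):
    """Mean distance between identified turns, assuming uniformly distributed turns."""
    if not 0.0 < turn_fraction <= 1.0:
        raise ValueError("turn_fraction must be in (0, 1]")
    return point_spacing_m / turn_fraction


# Athens, threshold 15 degrees: a turn every 382 m, 45.11% of the points.
athens_spacing = implied_point_spacing(382.0, 0.4511)
# Garraf, threshold 15 degrees: a turn every 24 m, 37.76% of the points.
garraf_spacing = implied_point_spacing(24.0, 0.3776)
# Garraf, threshold 70 degrees: 2.87% of the points are turns.
garraf_turn_spacing_70 = mean_turn_spacing(garraf_spacing, 0.0287)
print(athens_spacing, garraf_spacing, garraf_turn_spacing_70)
```

The implied spacing is about 172m between points in Athens against about 9m in Garraf, which is the order-of-magnitude difference in sampling density mentioned above. Recomputing the Garraf turn spacing at 70° gives a value close to the reported 316m, with the small gap explained by the rounding of the 24m figure.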

Fig. 12 Example showing how using the same angular threshold as in the original data set (15°) produces a saturated map due to the density of trajectory points. Using our chosen angular threshold (70°), the results for Garraf are comparable with the original results. The pinpoints in red are the identified turns, trajectories are in blue. The three images are at a similar scale. Images from Google Earth

Appendix B: Parameters used for the urban data sets

To run the algorithms for the urban data sets we tried to stick to the values mentioned in the cross-comparison paper by Ahmed et al. [3]. We did so in most cases; in the few cases in which the parameters present in the code were different from those in [3], we used those in the code.

AW: ε = 180 (Athens Large), 90 (Athens Small), 170 (Berlin), 80 (Chicago); tgap = 120;

CK: d1 = 100; t = 10; d2 = 10; d3 = 30; α = 10; min_seg = 4; M = 1; σ1 = 5; k = 0.005; d4 = 20; β = 45; v = 3; h = 5;

ES: \(d_{\max}\) = 50; δ = 45; θ = 20; N = 80; dmed = 0.01;

DBH: cell_size = 2; mask_threshold = 100; σ = 17; voronoi_sampling_interval = 10;

KP: angular difference = 15; dist = 25; max_m = 1000; mean speed = 40;

Appendix C: Output generated by the different algorithms

Below we present the maps generated for each data set by each of the five algorithms, together with the input trajectories in the background.

C.1 Delta

Fig. 13 Maps generated (in black) for Delta, with the input trajectories (in gray)

C.2 Aiguamolls

Fig. 14 Maps generated (in black) for Aiguamolls, with the input trajectories (in gray)

C.3 Garraf

Fig. 15 Maps generated (in black) for Garraf, with the input trajectories (in gray)

C.4 Montseny

Fig. 16 Maps generated (in black) for Montseny, with the input trajectories (in gray)

Cite this article

Duran, D., Sacristán, V. & Silveira, R.I. Map construction algorithms: a local evaluation through hiking data. Geoinformatica 24, 633–681 (2020). https://doi.org/10.1007/s10707-019-00386-7

Keywords

  • Trajectory data
  • Trajectory analysis
  • Map construction
  • Algorithms