## Abstract

We study five existing map construction algorithms, designed and tested with urban vehicle data in mind, and apply them to hiking trajectories with different terrain characteristics. Our main goal is to better understand the existing strategies and their limitations, in order to shed new light on the current challenges for map construction algorithms. We carefully analyze the results obtained by each algorithm, focusing on the local details of the generated maps. Our analysis includes the characterization of 10 types of common artifacts, which occur in the results of more than one algorithm, and 7 algorithm-specific artifacts, which are consequences of different algorithmic strategies. This allows us to draw systematic conclusions about the main challenges in fully automating the construction of maps from trajectory data, to detect the strengths and weaknesses of the different strategies, and to suggest possible ways to design higher-quality map construction methods. We believe that this analysis will help in designing new and better methods that perform well in wider and more realistic contexts, not only for road or hiking map reconstruction, but also for other types of trajectory data.

## Notes

- 1.
See http://openstreetmap.org and http://wikimapia.org, respectively.

- 2.
- 3.
- 4.
- 5.
The reasons not to include two of the algorithms were: implementation not available for one of them, and technical issues when trying to reproduce previous experiments in the other case.

- 6.
In the original paper [5] a third step is described, where identified portions can be adjusted. However, the implementation made public ignores this third phase.

- 7.
We note, however, that this is what is done in the implementation, but the criterion described in the original paper [19] is temporal.

- 8.
The computer used was equipped with an Intel i5-2500K CPU and 8GB DDR3 Synchronous 1600 MHz memory.

## References

- 1.
Ahmed M, Fasy BT, Gibson M, Wenk C (2015) Choosing thresholds for density-based map construction algorithms. In: Proc 23rd Int Conf on Geographic Information Systems, ACM, pp 24

- 2.
Ahmed M, Fasy BT, Hickmann KS, Wenk C (2015) A path-based distance for street map comparison. ACM Trans Spatial Algorithms Syst 1(1):3:1–3:28

- 3.
Ahmed M, Karagiorgou S, Pfoser D, Wenk C (2015) A comparison and evaluation of map construction algorithms using vehicle tracking data. GeoInformatica 19(3):601–632

- 4.
Ahmed M, Karagiorgou S, Pfoser D, Wenk C (2015) Map Construction Algorithms. Springer

- 5.
Ahmed M, Wenk C (2012) Constructing street networks from GPS trajectories. In: Proc ESA, pp 60–71

- 6.
Alt H, Guibas LJ (2000) Chapter 3 - Discrete geometric shapes: Matching, interpolation, and approximation. In: Sack J-R, Urrutia J (eds) Handbook of Computational Geometry, North-Holland, Amsterdam, pp 121–153

- 7.
Biagioni J, Eriksson J (2012) Inferring road maps from GPS traces: Survey and comparative evaluation. In: Transportation Research Board, 91st Annual, pp 61–71

- 8.
Biagioni J, Eriksson J (2012) Map inference in the face of noise and disparity. In: Proc 20th Int Conf Advances in Geographic Information Systems, SIGSPATIAL ’12. ACM, New York, pp 79–88

- 9.
Buchin K, Buchin M, Duran D, Fasy BT, Jacobs R, Sacristán V, Silveira RI, Staals F, Wenk C (2017) Clustering trajectories for map construction. In: Proceedings of the 25th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, GIS 2017, Redondo Beach, November 7-10, 2017, pp 14:1–14:10

- 10.
Cao L, Krumm J (2009) From GPS traces to a routable road map. In: Proc 17th ACM SIGSPATIAL, pp 3–12

- 11.
Chen C, Cheng Y (2008) Roads digital map generation with multi-track GPS data. In: Proc Workshops on Education Technology and Training, and on Geoscience and Remote Sensing, IEEE, pp 508–511

- 12.
Davies JJ, Beresford AR, Hopper A (2006) Scalable, distributed, real-time map generation. IEEE Pervasive Computing 5(4):47–54

- 13.
Dey TK, Wang J, Wang Y (2017) Improved road network reconstruction using discrete morse theory. In: Proceedings of the 25th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, SIGSPATIAL ’17. ACM, New York, pp 58:1–58:4

- 14.
Duran D, Sacristán V, Silveira RI (2016) Map construction algorithms: an evaluation through hiking data. In: Proceedings of the 5th ACM SIGSPATIAL International Workshop on Mobile Geographic Information Systems, MobiGIS 2016, Burlingame, October 31, 2016, pp 74–83

- 15.
Edelkamp S, Schrödl S (2003) Route Planning and Map Inference with Global Positioning Traces. Springer, Berlin, pp 128–151

- 16.
Fathi A, Krumm J (2010) Detecting road intersections from GPS traces. In: Proc 6th Int Conf on Geographic Information Systems, pp 56–69

- 17.
Ge X, Safa I, Belkin M, Wang Y (2011) Data skeletonization via Reeb graphs. In: Shawe-Taylor J, Zemel RS, Bartlett PL, Pereira FCN, Weinberger KQ (eds) Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011. Proceedings of a meeting held 12-14 December 2011, Granada, Spain, pp 837–845

- 18.
Guo T, Iwamura K, Koga M (2007) Towards high accuracy road maps generation from massive GPS traces data. In: Proc IEEE Int Geoscience and Remote Sensing Symp, pp 667–670

- 19.
Karagiorgou S, Pfoser D (2012) On vehicle tracking data-based road network generation. In: Proc 20th Int Conf on Advances in Geographic Information Systems, pp 89–98

- 20.
Liu X, Biagioni J, Eriksson J, Wang Y, Forman G, Zhu Y (2012) Mining large-scale, sparse GPS traces for map inference: Comparison of approaches. In: Proc 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’12. ACM, New York, pp 669–677

- 21.
Mariescu-Istodor R, Fränti P (2018) Cellnet: Inferring road networks from GPS trajectories. ACM Trans Spatial Algorithms and Systems 4(3):8:1–8:22

- 22.
Niehofer B, Burda R, Wietfeld C, Bauer F, Lueert O (2009) GPS community map generation for enhanced routing methods based on trace-collection by mobile phones. In: Proc 1st Int Conf Advances in Satellite and Space Communications, SPACOMM ’09. IEEE Computer Society, Washington, pp 156–161

- 23.
Pfoser D, Wenk C Map construction portal. http://mapconstruction.org, 2016. [Acc. 17/6/2016]

- 24.
Quddus M, Ochieng W, Noland R (2007) Current map-matching algorithms for transport applications: State-of-the art and future research directions. Transportation Research Part C: Emerging Technologies, pp 312–328

- 25.
Shi W, Shen S, Liu Y (2009) Automatic generation of road network map from massive GPS vehicle trajectories. In: Proc 12th Int IEEE Conf on Intelligent Transportation Systems, pp 48–53

- 26.
Steiner A, Leonhardt A (2011) Map generation algorithm using low frequency vehicle position data. In: Proc 90th Ann Meeting of the Transportation Research Board, pp 1–17

- 27.
TopoGrafix. GPX: the GPS exchange format. http://www.topografix.com/gpx.asp, 2002. [Acc. 1/6/2015]

- 28.
Wang S, Wang Y, Li Y (2015) Efficient map reconstruction and augmentation via topological methods. In: Proc 23rd Int Conf on Advances in Geographic Information Systems, pp 10

- 29.
Wang S, Wang Y, Li Y (2015) Efficient map reconstruction and augmentation via topological methods. In: Proceedings of the 23rd SIGSPATIAL International Conference on Advances in Geographic Information Systems, Bellevue, WA, USA, November 3-6, 2015, pp 25:1–25:10

- 30.
Wang Y, Liu X, Wei H, Forman G, Zhu Y (2013) Crowdatlas: self-updating maps for cloud and personal use. In: The 11th Annual International Conference on Mobile Systems, Applications, and Services, Mobisys’13, Taipei, Taiwan, June 25-28, 2013, pp 469–470

- 31.
Worrall S, Nebot E (2007) Automated process for generating digitised maps through GPS data compression. In: Proc Australasian Conf on Robotics and Automation

- 32.
Zheng J, Wang Y, Nihan NL (2005) Quantitative evaluation of GPS performance under forest canopies. In: Proc IEEE Int Conf Networking, Sensing and Control, pp 777–782

## Acknowledgments

We are grateful to the anonymous reviewers for their many suggestions that have improved considerably the presentation of this paper. We also thank Kevin Buchin, Maike Buchin, Frank Staals, and Carola Wenk for useful discussions about the topics of this paper.

## Additional information

### Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

A preliminary version of this work appeared in [14].

Partially supported by projects MTM2015-63791-R (MINECO/FEDER), Gen. Cat. 2017SGR1640. R.I.S. was also supported by MINECO through the Ramón y Cajal program.

## Appendices

### Appendix A: Deduction of parameter values for hiking data sets

In this section, we carefully discuss how the parameters of the algorithms were set in order to apply them to the hiking data sets in the best possible way.

### A.1 AW

Recall that the main parameter for AW is *ε*, for which four assumptions are made. However, these assumptions are not satisfied by any of our four hiking data sets. It is clear that condition (iv) is often not met in hiking trails (or in car trajectories, for that matter), since it is common to have hikes that repeat certain parts. However, it is relatively simple to preprocess trajectories to break cycles, as suggested in [5].

A more delicate situation arises with the remaining assumptions. Most notably, conditions (i) and (iii) can easily contradict each other. For instance, in *Delta* the presence of parallel paths along both sides of irrigation canals forces a value of *ε* < 3m to satisfy (i). At the same time, the presence of some wider roads, in conjunction with condition (iii), requires *ε* > 16m, leading to a contradiction. Similar situations occur in the other three data sets. An example illustrating the effect of varying *ε* is shown in Fig. 9.

Given that no single value can satisfy all theoretical conditions on *ε*, for our experiments we tried several values for each data set, selected based on the road widths and road separation distances observed in the Google Earth aerial images for each region. The algorithm was run on each setting and data set, and the value qualitatively giving the best results with respect to the paths visible from Google Earth was chosen. These values are shown in Table 6.
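The conflict between conditions (i) and (iii) can be made concrete with a small sketch. The function name `feasible_epsilon_interval` and the reading of the two conditions as a strict upper and a strict lower bound on *ε* are assumptions made for illustration only; they are not part of the AW implementation.

```python
from typing import Optional, Tuple


def feasible_epsilon_interval(
    upper_bound_m: float, lower_bound_m: float
) -> Optional[Tuple[float, float]]:
    """Return the open interval (lower, upper) of admissible epsilon values,
    or None when the two bounds contradict each other.

    upper_bound_m: bound implied by condition (i), read here as epsilon < upper.
    lower_bound_m: bound implied by condition (iii), read here as epsilon > lower.
    """
    if lower_bound_m < upper_bound_m:
        return (lower_bound_m, upper_bound_m)
    return None


# Delta: condition (i) forces epsilon < 3 m, condition (iii) forces epsilon > 16 m.
delta_interval = feasible_epsilon_interval(upper_bound_m=3.0, lower_bound_m=16.0)
print(delta_interval)  # None: no value of epsilon satisfies both conditions
```

When the function returns `None`, as it does for the *Delta* bounds quoted above, the only remaining option is to choose *ε* empirically.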

### A.2 CK

Several parameters need to be adjusted to obtain good results for this algorithm. For the preprocessing, the thresholds used for segmenting based on the spatial or temporal discontinuities within trajectories (*d*_{1} and *t*) are obtained by extrapolating the original values using the ratio between the mean distance (or elapsed time) between trajectory points of the original data sets and each of our data sets. The step for reducing redundancy uses two spatial thresholds (*d*_{2} and *d*_{3}) and an angular one (*α*). The spatial thresholds were set to the original value multiplied by the ratio of the average speed between each of our data sets and the data set used in the original work. The angular threshold was kept unchanged.
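The ratio-based extrapolation of the preprocessing thresholds can be sketched as follows. This is a minimal illustration under stated assumptions: the helper `scale_threshold` is hypothetical, the ratio is taken as "our data set over original data set", the starting values are the urban settings listed in Appendix B (used here as stand-ins for the original values), and all data-set statistics are invented for the example.

```python
def scale_threshold(original_value: float, ours: float, original: float) -> float:
    """Scale a threshold by the ratio ours / original of a reference quantity
    (mean point spacing, mean sampling interval, or mean speed)."""
    if original <= 0:
        raise ValueError("reference quantity of the original data set must be positive")
    return original_value * (ours / original)


# Hypothetical statistics, chosen only to illustrate the computation.
ORIGINAL_MEAN_SPACING_M = 50.0   # mean distance between consecutive points
OUR_MEAN_SPACING_M = 10.0
ORIGINAL_MEAN_INTERVAL_S = 30.0  # mean elapsed time between consecutive points
OUR_MEAN_INTERVAL_S = 3.0
ORIGINAL_MEAN_SPEED_MS = 10.0    # average speed
OUR_MEAN_SPEED_MS = 1.25

# Segmentation thresholds d1 and t scale with point spacing and sampling interval.
d1 = scale_threshold(100.0, OUR_MEAN_SPACING_M, ORIGINAL_MEAN_SPACING_M)
t = scale_threshold(10.0, OUR_MEAN_INTERVAL_S, ORIGINAL_MEAN_INTERVAL_S)

# Redundancy thresholds d2 and d3 scale with the average speed.
d2 = scale_threshold(10.0, OUR_MEAN_SPEED_MS, ORIGINAL_MEAN_SPEED_MS)
d3 = scale_threshold(30.0, OUR_MEAN_SPEED_MS, ORIGINAL_MEAN_SPEED_MS)

print(d1, t, d2, d3)
```

The angular threshold *α* is deliberately absent from the sketch, since it was kept unchanged.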

The clarification step requires some information from the input data and the terrain. Recall that the attraction forces are parameterized by two values, *M* and *σ*_{1}, while the spring force depends on one parameter, *k*. According to Cao and Krumm [10], the values should be chosen so that the force of one edge attracts all vertices with similar direction within a certain distance, and not those further apart. This implies that there should exist a distance value *t*_{d} at which the attraction force drops considerably, in favor of the conservative spring force. This distance value should lie between the maximum width of a one-way road and the minimum distance between two roads. Once the target distance value is found, Cao and Krumm [10] provide an analysis of how trajectories are affected by the forces, taking into account the average number of trajectories on a path (*N*) and their dispersion due to the expected GPS error (*σ*_{2}). Using the values of *N* and *σ*_{2} as input, they derive the values of the force parameters so that the forces produce the desired change of behavior roughly after the target distance at which the attraction force must drop. In [10, Figure 8], this target distance seems to be implicitly set to 25m, and considering on average 20 trajectories per path with a standard deviation due to GPS error of 5m, the values of the three force parameters are set to *σ*_{1} = 5, *k* = 0.005 and *M* = 1, producing the desired effect at 25m.

Therefore, once the target distance *t*_{d}, together with *N* and *σ*_{2}, is known, the three parameters *M*, *σ*_{1} and *k* can be derived. Table 13 summarizes the values for each parameter as well as the required information. Figure 10 plots how the different attraction forces behave for the corresponding values.

Even though the parameters for the clarification phase have been adjusted using the method proposed by the authors, there is still an issue to be addressed. The plots for the *Garraf* and *Montseny* data sets (Fig. 10d-e) deviate too much from the ideal shape. Moreover, the value of *σ*_{1} in all data sets except *Aiguamolls* is 0 (Table 13), which results in a singularity in the attraction forces. In such cases, the resulting clarified trajectories no longer follow their original shape and the resulting map is indistinguishable from random noise.

Such situations occur when the expected GPS error (*σ*_{2}) is too close to the target distance *t*_{d}. In the *Garraf* and *Montseny* data sets, *σ*_{2} was higher than 50% of *t*_{d}, and in the *Delta* data set it was higher than 41%, whereas in the original and *Aiguamolls* data sets it was 20% and 18%, respectively. Hence, whenever the expected error (*σ*_{2}) is close to the midpoint between the maximum width of a single path and the minimum distance between two paths, which incidentally is the value of *t*_{d}, the method proposed to adjust the parameters cannot be applied. For the *Delta*, *Garraf* and *Montseny* data sets, the values have therefore been adjusted empirically. Table 14 summarizes the final values used.
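The applicability criterion discussed above amounts to comparing *σ*_{2} against *t*_{d}. The following sketch makes this explicit. The function names and the cutoff `max_ratio = 0.25` are assumptions: the text only indicates that ratios of about 18% to 20% worked while ratios above 41% did not, so any cutoff in between would be consistent with the observations.

```python
def gps_error_ratio(sigma2_m: float, target_distance_m: float) -> float:
    """Ratio between the expected GPS error and the target distance t_d."""
    if target_distance_m <= 0:
        raise ValueError("target distance must be positive")
    return sigma2_m / target_distance_m


def derivation_applicable(
    sigma2_m: float, target_distance_m: float, max_ratio: float = 0.25
) -> bool:
    """Heuristic check (assumed cutoff): the analytical derivation of the force
    parameters is only trusted when sigma_2 is small relative to t_d."""
    return gps_error_ratio(sigma2_m, target_distance_m) <= max_ratio


# Original setting: sigma_2 = 5 m and t_d = 25 m, i.e. a ratio of 20%.
print(gps_error_ratio(5.0, 25.0), derivation_applicable(5.0, 25.0))

# Hypothetical hilly data set where the GPS error exceeds half of t_d.
print(derivation_applicable(13.0, 25.0))
```

When the check fails, the parameters have to be tuned empirically, as was done for *Delta*, *Garraf* and *Montseny*.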

Finally, the incremental insertion algorithm has four parameters that need to be adjusted. As the trajectories have been clarified, adapting the original values is straightforward. The distance threshold is set to the maximum path width, as clarified trajectories are much closer to the center of the paths. The angular threshold, the minimum volume of trajectories and the maximum number of hops are set to 45°, 3 and 5, respectively, as in the original work.

Refer to Table 7 for the summary of all the parameter values.

### A.3 DBH

(i) The grid cell size that Davies et al. [12] propose is half the minimum path width. In our data sets, that is 1m for the flat terrains (*Delta*, *Aiguamolls*) and 0.5m for the hilly terrains (*Montseny*, *Garraf*). (ii) The value of *σ* was taken as the average of the maximum width of a path and half the minimum separation between two different paths. This value ensures that holes within a path are covered without interfering with the detection of different paths. Finally, (iii) the mask threshold was adjusted empirically. The value we present is a compromise between the coverage of the generated map and the algorithm’s sensitivity to noise. Table 8 summarizes the values used.
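Rules (i) and (ii) are simple arithmetic and can be written down directly. In the sketch below the function names are hypothetical, and the path widths and separation used for *σ* are invented for illustration; only the minimum path widths of 2m and 1m follow from the cell sizes quoted above.

```python
def dbh_cell_size(min_path_width_m: float) -> float:
    """Rule (i): the grid cell size is half the minimum path width."""
    return min_path_width_m / 2.0


def dbh_sigma(max_path_width_m: float, min_path_separation_m: float) -> float:
    """Rule (ii): sigma is the average of the maximum path width and
    half the minimum separation between two different paths."""
    return (max_path_width_m + min_path_separation_m / 2.0) / 2.0


# Cell sizes of 1 m (flat terrain) and 0.5 m (hilly terrain) correspond to
# minimum path widths of 2 m and 1 m, respectively.
print(dbh_cell_size(2.0), dbh_cell_size(1.0))

# Hypothetical terrain: widest path 6 m, closest pair of paths 10 m apart.
print(dbh_sigma(max_path_width_m=6.0, min_path_separation_m=10.0))
```

The mask threshold of rule (iii) has no closed form in the text and is therefore not covered by the sketch.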

### A.4 ES

To choose the value of the three parameters for ES we used as a guideline the explanations in the original work [15], which we reproduce here for completeness. The value of \(d_{\max}\) “should be in an order of magnitude such that we ensure not to miss any intersection”. For *δ*, “we found that the algorithm is not very sensitive to variations of *δ*”. Finally, “as a conservative lower bound, *θ* should be at least larger than the maximum lane width, […], plus a considerable fraction of an estimated standard deviation of the GPS error”.

For our data sets, a \(d_{\max}\) of 20m is sufficient to detect all intersections. We kept *δ* at the same value as in the original work (45°). Finally, to set the value of *θ* we took into account the maximum width and the estimated GPS error of each data set. Table 9 summarizes the values taken for each data set.
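The guideline quoted for *θ* can be read as a lower bound of the form "maximum lane width plus a fraction of the GPS error standard deviation". The sketch below encodes that reading. The function name, the default `error_fraction = 0.5` and the sample values are assumptions, since the original work only speaks of "a considerable fraction" and the per-data-set values are those of Table 9.

```python
def theta_lower_bound(
    max_lane_width_m: float, gps_error_std_m: float, error_fraction: float = 0.5
) -> float:
    """Conservative lower bound for theta: the maximum lane (path) width plus a
    fraction of the estimated standard deviation of the GPS error.

    error_fraction is an assumed value; the source does not quantify it.
    """
    if not 0.0 <= error_fraction <= 1.0:
        raise ValueError("error_fraction is expected to lie in [0, 1]")
    return max_lane_width_m + error_fraction * gps_error_std_m


# Hypothetical data set: widest path 4 m, estimated GPS error of 6 m.
print(theta_lower_bound(max_lane_width_m=4.0, gps_error_std_m=6.0))
```

Any value of *θ* actually used should be at least as large as this bound.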

### A.5 KP

Finding appropriate values for the six parameters of KP, which do not have a clear meaning, was a complex task. Indeed, Karagiorgou and Pfoser mention that the values used in their experiments were obtained “empirically by running a great number of experiments and assessing the quality of the respective results” [19]. We established relationships between the parameters, based on their work, so as to minimize the actual number of parameters to be empirically tested. As a result, we concluded that the most critical and independent parameters were (ii), the angular threshold to determine turns, and (iv), the distance threshold to group turning points. We ignored the speed-related parameter (iii), as pedestrians do not perform significant speed reductions in turns. Parameter (vi) was fixed at 45°, the value used in [19]. The other two parameters were set as specified in Table 10 after experimental testing.

It remains to explain how to find appropriate values for (ii) angular difference and (iv) the turn clustering threshold.

The angular difference threshold determines when a change in direction is considered a turn. Ideally, the value should be set so that all (and only) real intersections have at least one turning point associated. As expected, no such ideal value exists for any of our data sets (note that it does not exist for the original *Athens* data set used in [19] either).

We found that the pointwise angular differences in GPS data are not reliable enough to avoid false turn detections, both in urban and in hiking data sets. In our hiking data sets, the value of the angular difference threshold seems to be even more critical, since the density of trajectory nodes on the sampled paths is one order of magnitude higher than in the urban context. Therefore, the probability that multiple falsely identified turn samples lie close enough to each other to be considered an intersection by the algorithm is also much higher. This problem is especially apparent in *Garraf* and *Montseny*.
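To illustrate why the threshold matters, the following sketch flags a sample as a turn when the heading changes by more than a given angle between the incoming and outgoing segments. It is a simplified stand-in, not the KP implementation: the function names are hypothetical, coordinates are assumed to be planar (for example, projected meters), and the trajectory is invented.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def heading_deg(p: Point, q: Point) -> float:
    """Heading of the segment p -> q in degrees, measured from the x axis."""
    return math.degrees(math.atan2(q[1] - p[1], q[0] - p[0]))


def angular_difference_deg(a: float, b: float) -> float:
    """Smallest absolute difference between two headings, in [0, 180]."""
    diff = abs(a - b) % 360.0
    return min(diff, 360.0 - diff)


def detect_turns(points: Sequence[Point], threshold_deg: float) -> List[int]:
    """Indices of interior samples whose heading change exceeds the threshold."""
    turns = []
    for i in range(1, len(points) - 1):
        incoming = heading_deg(points[i - 1], points[i])
        outgoing = heading_deg(points[i], points[i + 1])
        if angular_difference_deg(incoming, outgoing) > threshold_deg:
            turns.append(i)
    return turns


# A mostly straight path with a 4 m sideways wobble, followed by a real 90° turn.
track = [(0.0, 0.0), (10.0, 0.0), (20.0, 4.0), (30.0, 0.0), (40.0, 0.0), (40.0, 10.0)]

print(detect_turns(track, 15.0))  # the wobble is reported as several turns
print(detect_turns(track, 70.0))  # only the real turn remains
```

With densely sampled trajectories, small lateral deviations produce heading changes well above 15°, which is consistent with the false detections described above.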

The second free parameter, the turn clustering distance threshold, has a less clear effect on the generated map, but it has a high impact on the final result. Essentially, the turn clustering threshold is used to decide when two detected turns represent the same turn in the ground truth. However, its implications extend further, because links between intersections created later on depend on the positions of the intersections, among other properties influenced by this parameter. Given all these implications, the final effect of varying this parameter is very hard to predict. Figure 11 shows an example.

Based on all these observations, our method to obtain the values for the two parameters consisted in first finding a suitable value for the angular difference, and with this value fixed, looking for a suitable value for the turn clustering threshold, also empirically.

The values that gave the best results can be seen in Table 15. Note that the best values of the angular difference found for the hiking data sets (up to 70°) are much larger than the 15° used in the urban setting.

The need for such a large angle bound can be explained by the density of trajectory sample points in our data sets, compared to that of the data used by Karagiorgou and Pfoser. Assuming that the identified turns are uniformly distributed, the *Athens* data set has an identified turn every 382m along a trajectory (45.11% of the input trajectory points are identified as turns). Using the same angular threshold, *Garraf* has an identified turn every 24m (37.76%). Although the percentages of identified turns are similar, the distance between two identified turns in *Garraf* is one order of magnitude smaller. Identifying turns that are too close together makes identifying intersections using spatial clusters an even more challenging task.

The angular thresholds for our data sets were selected taking into account the apparent visual density of the identified turns. Figure 12 compares the visual appearance of the identified turns between the original data set, the *Garraf* data set using the same angular threshold (15°) and *Garraf* using our chosen threshold (70°). Assuming that the identified turns are uniformly distributed, *Garraf* with a threshold of 70° has an identified turn every 316m along the trajectories (2.87% of the input trajectory points are identified as turns). Therefore, the density of the identified turns is comparable to that in the original data set.
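Under the uniform-distribution assumption, the distance between identified turns is the mean spacing between consecutive samples divided by the fraction of samples identified as turns. The sketch below uses this relation to check the reported figures for consistency. The mean point spacing is not reported in the text; it is inferred here from the 15° figures (24m and 37.76%), so the result is only approximate, and the function name is hypothetical.

```python
def mean_turn_spacing_m(mean_point_spacing_m: float, turn_fraction: float) -> float:
    """Average distance along a trajectory between two identified turns,
    assuming turns are uniformly distributed among the samples."""
    if not 0.0 < turn_fraction <= 1.0:
        raise ValueError("turn_fraction must lie in (0, 1]")
    return mean_point_spacing_m / turn_fraction


# Inferred (not reported): mean sample spacing in Garraf, from the 15° figures.
garraf_point_spacing_m = 24.0 * 0.3776  # roughly 9 m between samples

# With the 70° threshold, 2.87% of the samples are identified as turns.
garraf_spacing_70 = mean_turn_spacing_m(garraf_point_spacing_m, 0.0287)
print(round(garraf_spacing_70))  # close to the 316 m reported above
```

The same relation suggests a sample spacing of roughly 172m for *Athens* (382m × 45.11%), which matches the statement that the hiking data are sampled about one order of magnitude more densely.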

### Appendix B: Parameters used for the urban data sets

To run the algorithms on the urban data sets we tried to stick to the values mentioned in the cross-comparison paper by Ahmed et al. [3]. This was possible in most cases; in the few cases where the parameters present in the code differed from those in [3], we used those in the code.

AW: *ε* = 180 (*Athens Large*), 90 (*Athens Small*), 170 (*Berlin*), 80 (*Chicago*); *t*_{gap} = 120;

CK: *d*_{1} = 100; *t* = 10; *d*_{2} = 10; *d*_{3} = 30; *α* = 10; min_seg = 4; *M* = 1; *σ*_{1} = 5; *k* = 0.005; *d*_{4} = 20; *β* = 45; *v* = 3; *h* = 5;

ES: \(d_{\max}\) = 50; *δ* = 45; *θ* = 20; *N* = 80; *d*_{med} = 0.01;

DBH: cell_size = 2; mask_threshold = 100; *σ* = 17; voronoi_sampling_interval = 10;

KP: angular difference = 15; dist = 25; max_m = 1000; mean speed = 40;

### Appendix C: Output generated by the different algorithms

In the next pages we present the maps generated for each data set by each of the five algorithms, together with the input trajectories in the background.

### C.1 Delta

### C.2 Aiguamolls

### C.3 Garraf

### C.4 Montseny

## About this article

### Cite this article

Duran, D., Sacristán, V. & Silveira, R.I. Map construction algorithms: a local evaluation through hiking data. *Geoinformatica* **24**, 633–681 (2020). https://doi.org/10.1007/s10707-019-00386-7

### Keywords

- Trajectory data
- Trajectory analysis
- Map construction
- Algorithms