Commentary to: a cross-validation-based approach for delimiting reliable home range estimates
Abstract
Background
Continued exploration of the performance of the recently proposed cross-validation-based approach for delimiting home ranges using the Time Local Convex Hull (T-LoCoH) method has revealed a number of issues with the original formulation.
Main text
Here we replace the ad hoc cross-validation score with a new formulation based on the total log probability of out-of-sample predictions. To obtain these probabilities, we interpret the normalized LoCoH hulls as a probability density. The application of the approach described here results in optimal parameter sets that differ dramatically from those selected using the original formulation. The derived metrics of home range size, mean revisitation rate, and mean duration of visit are also altered using the corrected formulation.
Conclusion
Despite these differences, we encourage the use of the cross-validation-based approach, as it provides a unifying framework governed by the statistical properties of the home ranges rather than subjective selections by the user.
Keywords
Time local convex hulls, T-LoCoH, Home range, Visitation, Duration, Cross-validation, Etosha National Park
Background
1. Both cross-validation and information criterion approaches aim to avoid over-fitting. In the case of cross-validation, one attempts to estimate out-of-sample prediction error, so the score used should be a measure of prediction errors of the held-out points. If the model uses k too small or s too large, it is likely to overfit the training data and will predict the testing data poorly. On the other hand, if the model uses k too large or s too small, it will underfit the training data by missing the real variations in space use. Thus, cross-validation naturally penalizes model complexity because excessive complexity (small k) results in poor predictions. Information criteria approaches include a penalty term that increases with model complexity as measured by larger numbers of parameters. Using such an information criterion as a cross-validation score is not necessary since cross-validation should naturally penalize excessive model complexity.
2. The formulation of the information criterion score did not follow the rules of probability because probabilities of out-of-sample predictions were not properly normalized, and multiple probabilities were combined by summation. In this sense, it lacked a firm connection to the statistical theory underlying information criteria approaches.
Here we propose an alternative formulation in which we interpret a normalized version of LoCoH hulls as an estimated probability surface and recast the cross-validation score as the total log probability of out-of-sample predictions, a common choice in cross-validation schemes. The approach, explained in detail below, results in more appropriate behavior, but also has the effect of significantly altering the optimal parameter values selected by the algorithm. Thus, in addition to presenting the new cross-validation equation, we include tables and figures with the newly selected parameter values and newly calculated derived metric values (home range area, mean duration, and mean visitation rates). Finally, we offer an alternative R script that searches a much broader parameter space in a more efficient manner (Additional file 1).
Updated Cross-Validation Approach
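One way to write down the formulation described in the Background is sketched here. The notation is assumed for illustration and may not match the published equation: H_1, ..., H_n are the hulls built from the training points for a given (k, s), A(H_j) is the area of hull H_j, and x_1, ..., x_m are the held-out test points.

```latex
% Normalized hull coverage, read as a probability density over space
\hat{p}_{k,s}(x) = \frac{\sum_{j=1}^{n} \mathbf{1}\left[x \in H_j\right]}{\sum_{j=1}^{n} A(H_j)}

% The density integrates to one, because each indicator integrates to its hull area
\int \hat{p}_{k,s}(x)\,dx = \frac{\sum_{j=1}^{n} A(H_j)}{\sum_{j=1}^{n} A(H_j)} = 1

% Cross-validation score: total log probability of the out-of-sample points
\mathrm{CV}(k,s) = \sum_{i=1}^{m} \ln \hat{p}_{k,s}(x_i)

% Special case k = k_{max}: every hull equals the same hull H, so for x inside H
\hat{p}_{k_{max},s}(x) = \frac{n}{n\,A(H)} = \frac{1}{A(H)}
```

Under this reading, the probabilities of the test points are combined by multiplication (a sum of logarithms) rather than by summation, and the larger-scoring parameter set is the preferred one.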
Because the probability of each test point is normalized based on the total area contained within all of the training hulls, there exists a natural penalty for high k values. For example, a k value equal to the number of training points (k_{max}; regardless of the s value) will result in all hulls being identical and each test point overlapping all of the hulls. However, the large total area of the hullset when k=k_{max} will result in relatively small probability values for each test point (i.e., independent probability values equal to the inverse of the area of one of the hulls), effectively penalizing the parameter set containing k_{max}. The underlying cross-validation procedure could readily be extended to optimize the adaptive parameter in the a-method (as opposed to the k-method) because of its scaling with the total area of the hullset.
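The penalty on large k can be illustrated with a minimal Python sketch (the R script in Additional file 1 is not reproduced here). It makes several assumptions: hulls are supplied as lists of convex polygon vertices, the density at a test point is the number of training hulls covering it divided by the total area of all training hulls, and a test point covered by no hull gives a score of negative infinity. The function names `polygon_area`, `contains`, and `log_probability_score` are hypothetical and are not part of the T-LoCoH package.

```python
import math


def polygon_area(vertices):
    """Shoelace area of a simple polygon given as [(x, y), ...]."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        (x1, y1), (x2, y2) = vertices[i], vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def contains(vertices, point):
    """True if point lies inside or on the boundary of a convex polygon."""
    px, py = point
    sign = 0
    n = len(vertices)
    for i in range(n):
        (x1, y1), (x2, y2) = vertices[i], vertices[(i + 1) % n]
        cross = (x2 - x1) * (py - y1) - (y2 - y1) * (px - x1)
        if cross != 0:
            if sign == 0:
                sign = 1 if cross > 0 else -1
            elif (cross > 0) != (sign > 0):
                return False  # point is on different sides of two edges
    return True


def log_probability_score(hulls, test_points):
    """Total log probability of test points under the normalized hull density.

    Assumption: density = (number of covering hulls) / (total hull area).
    Assumption: an uncovered test point makes the score -inf.
    """
    total_area = sum(polygon_area(h) for h in hulls)
    score = 0.0
    for p in test_points:
        covering = sum(contains(h, p) for h in hulls)
        if covering == 0:
            return -math.inf
        score += math.log(covering / total_area)
    return score


# Toy comparison: five identical large hulls (the k = k_max situation)
# versus five identical small hulls that still cover the test points.
big = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]   # area 100
small = [(4.0, 4.0), (6.0, 4.0), (6.0, 6.0), (4.0, 6.0)]     # area 4
test_points = [(5.0, 5.0), (4.5, 5.5), (5.5, 4.5)]

score_kmax = log_probability_score([big] * 5, test_points)
score_tight = log_probability_score([small] * 5, test_points)
print(score_kmax, score_tight)
```

In this toy case each test point under the large hullset has density 1/100, the inverse of the area of one hull, so the smaller hullset scores higher even though both cover every test point.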
Results
Parameter values for analysis
ID | Species | Sample Points | s (Algo) | k (Algo) | s (Guide) | k Range (Guide) |
---|---|---|---|---|---|---|
AG063 | Zebra | 2111 | 0.003 | 355 | 0.023125 | 20-25 |
AG252 | Zebra | 3601 | 0.001 | 485 | 0.0140625 | 20-25 |
AG253 | Zebra | 3601 | 0 | 156 | 0.0140625 | 25-30 |
AG255 | Zebra | 3601 | 0.001 | 405 | 0.0184375 | 20-25 |
AG256 | Zebra | 3601 | 0.001 | 335 | 0.0171875 | 15-20 |
AG205 | Springbok | 2887 | 0.05 | 182 | 0.003125 | 25-30 |
AG206 | Springbok | 3601 | 0.023 | 187 | 0.00875 | 25-30 |
AG207 | Springbok | 3601 | 0.036 | 155 | 0.01140625 | 20-25 |
AG209 | Springbok | 2887 | 0.013 | 171 | 0.002421875 | 25-30 |
AG214 | Springbok | 2887 | 0.001 | 104 | 0.00265625 | 15-20 |
AG215 | Springbok | 2883 | 0 | 554 | 0.00328125 | 25-30 |
Home range areas (in square kilometers)
ID | HR Area (Algo) | HR Area (Guide Low) | HR Area (Guide High) |
---|---|---|---|
AG063 | 1093 | 571 | 603 |
AG252 | 1486 | 913 | 958 |
AG253 | 593 | 501 | 513 |
AG255 | 871 | 579 | 600 |
AG256 | 1363 | 740 | 798 |
AG205 | 370 | 256 | 268 |
AG206 | 973 | 558 | 588 |
AG207 | 430 | 299 | 318 |
AG209 | 347 | 207 | 216 |
AG214 | 32 | 23 | 25 |
AG215 | 258 | 165 | 177 |
Mean duration (MNLV) values. The derived metrics were obtained using the parameter sets recommended by the algorithm and by the guidelines set forth in the T-LoCoH documentation
ID | MNLV (Algo) | MNLV (Guide Low) | MNLV (Guide High) |
---|---|---|---|
AG063 | 48.9 | 10.0 | 11.3 |
AG252 | 77.3 | 10.4 | 11.7 |
AG253 | 2.6 | 10.7 | 12.5 |
AG255 | 75.1 | 9.5 | 10.3 |
AG256 | 42.0 | 8.0 | 9.7 |
AG205 | 92.6 | 24.4 | 27.1 |
AG206 | 80.8 | 14.3 | 16.4 |
AG207 | 67.9 | 12.3 | 14.5 |
AG209 | 78.9 | 23.4 | 26.0 |
AG214 | 24.7 | 16.5 | 19.4 |
AG215 | 2.6 | 37.9 | 42.6 |
Mean visitation (NSV) values
ID | NSV (Algo) | NSV (Guide Low) | NSV (Guide High) |
---|---|---|---|
AG063 | 13.8 | 5.8 | 6.6 |
AG252 | 9.1 | 5.6 | 6.3 |
AG253 | 61.5 | 15.0 | 16.0 |
AG255 | 19.7 | 8.1 | 9.5 |
AG256 | 14.0 | 7.4 | 8.6 |
AG205 | 7.1 | 4.2 | 4.5 |
AG206 | 8.2 | 6.5 | 6.9 |
AG207 | 17.8 | 14.9 | 15.7 |
AG209 | 5.7 | 3.6 | 3.8 |
AG214 | 20.2 | 14.6 | 16.3 |
AG215 | 218.1 | 6.6 | 6.8 |
Conclusion
Notes
Acknowledgements
The authors would also like to acknowledge Andy Lyons for creating, maintaining, and improving the T-LoCoH package.
Funding
The case study presented here used GPS movement data from zebra and springbok from Etosha National Park, Namibia, which were collected under a grant obtained by WMG (NIH GM083863). In addition, partial funding for this study was provided by NIH 1R01GM117617-01 to JKB and WMG. The funders had no role in study design, data collection and analysis, or manuscript writing.
Availability of data and materials
Please contact Wayne M. Getz (wgetz@berkeley.edu) for data requests.
Authors’ contributions
PDV and ERD developed the cross-validation approach. ERD ran analyses on empirical movement paths. All authors contributed to writing and editing the manuscript.
Ethics approval and consent to participate
All movement data were collected according to the animal handling protocol AUP R217-0509B (University of California, Berkeley).
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary material
References
1. Dougherty ER, Carlson CJ, Blackburn JK, Getz WM. A cross-validation-based approach for delimiting reliable home range estimates. Mov Ecol. 2017;5(1):19.
2. Lyons AJ, Turner WC, Getz WM. Home range plus: a space-time characterization of movement over real landscapes. Mov Ecol. 2013;1(1):2.
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.