The classification conundrum: species fidelity as leading criterion in search of a rigorous method to classify a complex forest data set

Abstract

We present a test involving a large number of data-analytical techniques to identify a rigorous numerical classification method optimising on statistically identified faithful species. The test follows a stepwise filtering process involving various numerical-classification tools. Five steps were involved in the testing: (1) evaluation of 322 classification tools using OptimClass 1; (2) comparison of the 20 best performing methods by standardising their performances across a range of fidelity values using OptimClass 1 and OptimClass 2, to assess the effectiveness of the agglomerative clustering techniques and one divisive technique; (3) calculation and comparison of Uniqueness values and ISAMIC (Indicator Species Analysis Minimising Intermediate Constancies) scores of the resulting classifications; (4) comparison of different classifications by analysing the similarities of the resulting synoptic tables using faithful species, assuming that clusters with similar faithful species represent corresponding vegetation types; and (5) final selection of the single best method based on an expert review of non-geometric internal evaluators, NMDS ordinations and mapped classification solutions. We tested a complex data set representing many forest vegetation types and consisting of 506 relevés of 20 m × 20 m sampled in the indigenous forests of Mpumalanga Province (South Africa). Analysis of Uniqueness provided insight into which methods produced classifications that did not share faithful species. The analysis of synoptic table similarity showed that the classification results were at most 88% similar, while in the most divergent case a similarity of only 50% was achieved. OptimClass eliminated poorly performing numerical-classification combinations and highlighted the best performing methods. Yet it was unable to reveal the single best performing method unequivocally across the range of fidelity values used. In such cases, we suggest resolving the choice by drawing on external data through expert opinion. Ordinal Clustering and TWINSPAN produced the most outlying classification results. Flexible beta clustering (β = -0.25) in combination with the Bray-Curtis coefficient, standardised by sample unit totals, produced the most informative result for our data set when judged against informal expert-defined ecological and biogeographical criteria. We recommend that the performance of a set of methods be tested prior to selecting the final classification approach.
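The method combination singled out above (standardisation by sample unit totals, Bray-Curtis coefficient, flexible beta clustering with β = -0.25) can be illustrated with a minimal pure-Python sketch. It is not the software used in the study. The function names and the toy plot data are hypothetical, and the linkage uses the standard Lance-Williams flexible update with α_i = α_j = (1 - β)/2, which is assumed here to match the flexible beta variant named in the abstract.

```python
from itertools import combinations


def standardise_by_totals(plots):
    """Divide each plot's abundances by that plot's total."""
    out = []
    for row in plots:
        total = sum(row)
        out.append([v / total if total else 0.0 for v in row])
    return out


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: sum|a - b| / sum(a + b)."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(x + y for x, y in zip(a, b))
    return num / den if den else 0.0


def flexible_beta(plots, n_clusters, beta=-0.25):
    """Agglomerative clustering with the Lance-Williams flexible update.

    Assumes alpha_i = alpha_j = (1 - beta) / 2, so the three
    coefficients sum to 1. Returns sorted lists of plot indices.
    """
    alpha = (1.0 - beta) / 2.0
    rel = standardise_by_totals(plots)
    clusters = {i: [i] for i in range(len(rel))}
    dist = {frozenset((i, j)): bray_curtis(rel[i], rel[j])
            for i, j in combinations(clusters, 2)}
    next_id = len(rel)
    while len(clusters) > n_clusters:
        # Merge the closest pair of clusters.
        pair = min(dist, key=dist.get)
        i, j = tuple(pair)
        d_ij = dist.pop(pair)
        merged = clusters.pop(i) + clusters.pop(j)
        # Update distances from every remaining cluster to the new one.
        for k in clusters:
            d_ki = dist.pop(frozenset((k, i)))
            d_kj = dist.pop(frozenset((k, j)))
            dist[frozenset((k, next_id))] = (
                alpha * d_ki + alpha * d_kj + beta * d_ij
            )
        clusters[next_id] = merged
        next_id += 1
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical toy data: 4 plots (rows) x 3 species (columns).
toy_plots = [[10, 0, 0], [8, 2, 0], [0, 0, 5], [0, 1, 9]]
groups = flexible_beta(toy_plots, n_clusters=2)
print(groups)
```

On the toy data the first two plots, dominated by the first species, end up in one cluster and the last two in the other. For real relevé tables, an established implementation would be preferable to this sketch.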

Abbreviations

ISAMIC: Indicator Species Analysis Minimising Intermediate Constancies

GIS: Geographical Information Systems

NMDS: Non-metric Multidimensional Scaling

PCoA: Principal Coordinates Analysis

TWINSPAN: Two-way Indicator Species Analysis

UPGMA: Unweighted Pair-Group Method using Arithmetic Averages

Author information

Corresponding author

Correspondence to M. C. Lötter.

Rights and permissions

This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Lötter, M.C., Mucina, L. & Witkowski, E.T.F. The classification conundrum: species fidelity as leading criterion in search of a rigorous method to classify a complex forest data set. Community Ecology 14, 121–132 (2013). https://doi.org/10.1556/ComEc.14.2013.1.13

Keywords

  • Cluster analysis
  • Fidelity
  • JUICE software
  • Resemblance
  • Vegetation classification