A model for correlation within clusters and its use in pollen analysis

Dale, M. B.; Allison, L.; Dale, P. E. R.

doi:10.1556/ComEc.11.2010.1.8


Open access
Published: 30 December 2010

Community Ecology, volume 11, pages 51–58 (2010)

M. B. Dale¹,
L. Allison² &
P. E. R. Dale¹


Abstract

Many methods of cluster analysis do not explicitly account for correlation between attributes. In this paper we explicitly model any correlation using a single factor within each cluster: i.e., the correlation of attributes within each cluster is adequately described by a single component axis. However, the use of a factor is not required in every cluster. Using a Minimum Message Length criterion, we can determine the number of clusters and also whether the model of any cluster is improved by introducing a factor. The technique allows us to seek clusters which reflect directional changes rather than imposing a zonation constrained by spatial (and implicitly temporal) position. Minimal message length is a means of applying Ockham’s Razor in inductive analysis. The ‘best’ model is that which allows most compression of the data, which results in a minimal message length for the description. Fit to the data is not a sufficient criterion for choosing models because more complicated models will almost always fit better. Minimum message length combines fit to the data with an encoding of the model and provides a Bayesian probability criterion as a means of choosing between models (and classes of model). Applying the analysis to a pollen diagram from Southern Chile, we find that the introduction of factors does not improve the overall quality of the mixture model. The solution without axes in any cluster provides the most parsimonious solution. Examining the cluster with the best case for a factor to be incorporated in its description shows that the attributes highly loaded on the axis represent a contrast of herbaceous vegetation and dominant forest types. This contrast is also found when fitting the entire population, and in this case the factor solution is the preferred model. Overall, the cluster solution without factors is much preferred. Thus, in this case classification is preferred to ordination, although more data are desirable to confirm such a conclusion.
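The trade-off described in the abstract, fit to the data against the cost of stating the model, can be illustrated with a small two-part message length calculation. The sketch below is not the authors' program and does not implement the Wallace and Freeman MML estimator. It compares two hypothetical models for a pair of attributes within one cluster: independent Gaussians (4 parameters) and a bivariate Gaussian with a correlation term (5 parameters). The function name `message_length_bits`, the toy data, and the crude parameter cost of (k/2) log2(n) bits are all assumptions made for illustration.

```python
import math


def message_length_bits(xs, ys, with_correlation):
    """Crude two-part message length (in bits) for paired attribute values.

    Part 1 states the model parameters; part 2 states the data given the model.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Maximum-likelihood (divide-by-n) variances and covariance.
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

    # The independent model fixes the correlation at zero.
    rho = cov_xy / math.sqrt(var_x * var_y) if with_correlation else 0.0
    det = var_x * var_y * (1.0 - rho ** 2)

    # Part 2: negative log-likelihood of a bivariate Gaussian at its ML fit.
    # ASSUMPTION: the constant for measurement precision is omitted, since it
    # is the same for both models; only differences in length are meaningful.
    data_bits = n * (math.log(2 * math.pi) + 1 + 0.5 * math.log(det)) / math.log(2)

    # Part 1: ASSUMPTION, a crude cost of (k/2) * log2(n) bits for k parameters.
    k = 5 if with_correlation else 4
    model_bits = 0.5 * k * math.log2(n)
    return model_bits + data_bits


if __name__ == "__main__":
    # Toy data (hypothetical, not taken from the pollen diagram).
    xs = [float(i) for i in range(20)]
    ys = [x + (0.1 if i % 2 else -0.1) for i, x in enumerate(xs)]
    for flag in (False, True):
        print("with correlation:", flag, round(message_length_bits(xs, ys, flag), 1))
```

For the strongly correlated toy data the model with a correlation term yields the shorter message, because the saving in encoding the data outweighs the cost of the extra parameter. When the sample correlation is zero the data cost is identical for both models, so the extra parameter only lengthens the message and the simpler model is preferred.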


Abbreviations

MDL: Minimum Description Length
MML: Minimal Message Length


Author information

Authors and Affiliations

School of Environment, Griffith University, Nathan, Queensland 4111, Australia
M. B. Dale & P. E. R. Dale
Dept. Computer Science and Software Engineering, Monash University, Clayton, Victoria, Australia
L. Allison


Corresponding author

Correspondence to M. B. Dale.

Electronic supplementary material

Supplementary material, approximately 63 KB.

Rights and permissions

This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.


About this article

Cite this article

Dale, M.B., Allison, L. & Dale, P.E.R. A model for correlation within clusters and its use in pollen analysis. Community Ecology 11, 51–58 (2010). https://doi.org/10.1556/ComEc.11.2010.1.8


Received: 28 September 2009
Revised: 15 January 2010
Accepted: 20 March 2010
Published: 30 December 2010
Issue Date: June 2010
DOI: https://doi.org/10.1556/ComEc.11.2010.1.8
