213:85

Application of a dendrogram seriation algorithm to extract pattern from plant breeding data

  • Vivi Noviati Arief
  • I. H. DeLacy
  • K. E. Basford
  • M. J. Dieters


A dendrogram is often used to display the results from hierarchical clustering; however, the order of objects in a standard dendrogram is arbitrary and so similarity cannot be readily interpreted. An optimized dendrogram, a dendrogram produced by re-ordering the objects using a seriation method, has a customized ordering that reflects the similarity among objects with most similar objects located closest together. Hierarchical clustering has been applied to the analysis of data from plant breeding programs to identify the patterns in breeding populations and to study genotype by environment interactions. In this paper we demonstrate the advantage of an optimized dendrogram for interpretation of plant breeding data and, given this advantage, argue that an optimized dendrogram should be used as the default whenever hierarchical clustering is used.
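The re-ordering idea can be illustrated with a minimal sketch. It assumes a binary dendrogram written as nested tuples and a symmetric distance table, and it picks, among all leaf orders reachable by flipping branches, the one with the smallest total distance between neighbouring leaves. The names `leaf_orders`, `path_length` and `seriate`, the toy data, and the exhaustive search are illustrative assumptions and are not taken from the paper; the search is only practical for very small trees.

```python
def leaf_orders(tree):
    """Yield every leaf order reachable by flipping the branches of a binary tree."""
    if not isinstance(tree, tuple):
        yield [tree]          # a single leaf has only one order
        return
    left, right = tree
    for l in leaf_orders(left):
        for r in leaf_orders(right):
            yield l + r       # branches as drawn
            yield r + l       # branches flipped at this node


def path_length(order, dist):
    """Sum of distances between neighbouring leaves in the given order."""
    return sum(dist[a][b] for a, b in zip(order, order[1:]))


def seriate(tree, dist):
    """Return a flip-compatible leaf order with minimal neighbour distance (brute force)."""
    return min(leaf_orders(tree), key=lambda order: path_length(order, dist))


# Hypothetical toy data: five objects measured on a single trait.
pos = {"A": 0, "B": 1, "C": 4, "D": 9, "E": 10}
dist = {a: {b: abs(pos[a] - pos[b]) for b in pos} for a in pos}

# One arbitrary drawing of the same dendrogram: leaves read C, A, B, E, D.
tree = (("C", ("A", "B")), ("E", "D"))

arbitrary = ["C", "A", "B", "E", "D"]
optimized = seriate(tree, dist)
print(arbitrary, path_length(arbitrary, dist))
print(optimized, path_length(optimized, dist))
```

Flipping branches never changes the cluster memberships, so both orders describe the same hierarchy; only the optimized order places the most similar objects next to each other.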


Dendrogram, Optimized dendrogram, Seriation, Plant breeding



Copyright information

© Springer Science+Business Media Dordrecht 2017

Authors and Affiliations

  • Vivi Noviati Arief (1)
  • I. H. DeLacy (1)
  • K. E. Basford (1)
  • M. J. Dieters (1)

  1. School of Agriculture and Food Sciences, The University of Queensland, Brisbane, Australia
