Given its extensive data sets and reliance on statistical techniques, animal breeding is a field in which machine learning has a steadily increasing impact. In this study, we assess the potential of machine learning and data mining for breed assignment in a large set of horses and breeds. Individual assignment methods and the factors influencing assignment success are compared at the scale of the Czech horse population. The per-locus fixation index (FST) values ranged from 0.057 (HMS1) to 0.144 (HTG6), and the overall genetic differentiation among the breeds amounted to 8.9%. The highest genetic divergence (FST = 0.378) was found between the Friesian and Equus przewalskii; the highest degree of gene migration was found between the Czech and Bavarian Warmblood (Nm = 14,302); and the global heterozygote deficit across the populations was 10.4%. Eight standard methods (Bayesian, frequency-based, and distance-based) implemented in the GeneClass software and twelve mainstream classification algorithms (Bayes Net, Naive Bayes, IB1, IB5, KStar, JRip, J48, Random Forest, Random Tree, PART, MLP, and SVM) from the WEKA machine learning workbench were compared using 314,874 real allelic data sets. The Bayesian method (GeneClass, 89.9%) and the Bayesian network algorithm (WEKA, 84.8%) outperformed the other techniques. The breed genomic prediction accuracy was highest in the cold-blooded horses. The overall proportion of individuals correctly assigned to a population depended mainly on the number of breeds and their genetic divergence. These statistical tools could be used to assess breed traceability systems, and they have the potential to assist managers in decisions on breeding and registration.
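To make the idea of frequency-based individual assignment concrete, the following is a minimal Python sketch, not the GeneClass implementation. It estimates allele frequencies per breed, scores a multilocus genotype under Hardy-Weinberg proportions, and assigns the animal to the breed with the highest log-likelihood. The marker names HMS1 and HTG6 come from the abstract; the breed labels, allele sizes, function names, and the fallback frequency for unseen alleles are all hypothetical.

```python
from collections import Counter, defaultdict
from math import log

# Hypothetical toy reference data: breed -> list of genotypes.
# Each genotype maps a locus name to a pair of allele sizes (invented values).
REFERENCE = {
    "breed_A": [
        {"HMS1": (170, 172), "HTG6": (84, 84)},
        {"HMS1": (170, 170), "HTG6": (84, 86)},
        {"HMS1": (172, 172), "HTG6": (84, 84)},
    ],
    "breed_B": [
        {"HMS1": (176, 178), "HTG6": (90, 92)},
        {"HMS1": (178, 178), "HTG6": (92, 92)},
        {"HMS1": (176, 176), "HTG6": (90, 90)},
    ],
}

# Assumed stand-in frequency for alleles never observed in a breed,
# so that the logarithm stays defined.
MISSING_FREQ = 0.01


def allele_frequencies(genotypes):
    """Return locus -> {allele: relative frequency} for one breed."""
    counts = defaultdict(Counter)
    for genotype in genotypes:
        for locus, alleles in genotype.items():
            counts[locus].update(alleles)  # counts both alleles of the pair
    return {
        locus: {allele: n / sum(c.values()) for allele, n in c.items()}
        for locus, c in counts.items()
    }


def log_likelihood(genotype, freqs):
    """Log-likelihood of a genotype under Hardy-Weinberg proportions."""
    total = 0.0
    for locus, (a, b) in genotype.items():
        p = freqs.get(locus, {}).get(a, MISSING_FREQ)
        q = freqs.get(locus, {}).get(b, MISSING_FREQ)
        # Homozygote: p^2; heterozygote: 2pq.
        total += log(p * p) if a == b else log(2 * p * q)
    return total


def assign(genotype, reference):
    """Return the most likely breed and the per-breed log-likelihoods."""
    scores = {
        breed: log_likelihood(genotype, allele_frequencies(samples))
        for breed, samples in reference.items()
    }
    return max(scores, key=scores.get), scores


best_breed, scores = assign({"HMS1": (170, 172), "HTG6": (84, 84)}, REFERENCE)
print(best_breed, scores)
```

In a real evaluation, an animal being tested would be excluded from the reference sample of its own breed before the frequencies are estimated; this sketch omits that step for brevity.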
The authors would like to thank Professor Petr Hořín (Department of Animal Genetics, VFU Brno) for providing samples of the Camargue, Murgese, and Icelandic horses. The authors also thank Irena Vrtková, PhD (Laboratory of Agrogenomics) for her unwavering support over the years.
The research was funded by project NAZV QH92277 of the National Agency for Agricultural Research of the Ministry of Agriculture of the Czech Republic, with institutional support for the development of Mendel University in Brno. The research was also supported by the Ministry of Education, Youth and Sports under project No. LO1210, carried out at the Centre for Research and Utilization of Renewable Energy.
Compliance with ethical standards
Conflict of interest
The authors declare that they have no conflict of interest.
All procedures performed in studies involving animals were in accordance with the ethical standards of the institution or practice at which the studies were conducted.
Table S2. The numbers of animals sampled per population and correctly assigned, and the individual assignment success rates for each population achieved using different assignment methods and numbers of microsatellite markers (GeneClass). (DOCX 35 kb)
Table S4. The numbers of animals sampled per population and correctly assigned, and the individual assignment success rates for each population achieved using different assignment methods and numbers of microsatellite markers (the WEKA software). (DOCX 36 kb)