Innovations in Statistical Analysis and Genetic Algorithms

Drezner, Taly Dawn

doi:10.1007/978-3-030-19111-5_9


Part of the book series: International Series in Operations Research & Management Science (ISOR, volume 281)


Abstract

We proposed new techniques in statistics and in genetic algorithms. We used biological principles to innovate new approaches in genetic algorithms that yielded improved solutions to optimization problems, finding improved best known results for multiple instances. These mimicked patterns in the natural world, including female choice of mates, as well as alpha male social structures. We also highlight inconsistencies between biological processes and their genetic algorithm counterparts.

Two other innovations in methodology and statistics include our development of the sequential correlated Bonferroni test which controls for false positive results that occur from running multiple statistical tests. It incorporates the correlation between significant p-values, thereby resulting in a less conservative filter. We also developed the statistical underpinnings of a new approach for estimating transition points (in species or any other defined population) between stages. Transition from one stage to the next is a natural part of life, yet it can be difficult to estimate, particularly in cases where only a few transitions occur in every measurement period. We confirmed the validity and applicability of this new approach demonstrating low standard errors and robust output.


9.1 Opening: Background and a Role Model

I am very lucky to have Zvi Drezner as a father. He is a warm, dedicated, and engaged dad, as well as a successful researcher and a true role model in both life and work. Curious about the world, I followed in his (and my mother’s) footsteps into academia, though with a focus in ecology (more precisely, the life science component of Geography, called biogeography). Through the years we have conversed about many topics, from natural history and science to astronomy, statistics, and genetic algorithms, among many others. These stimulating and fun conversations exposed us both to new ideas. I am very fortunate to have these very special father–daughter times, which I cherish.

Zvi’s family discussions about research don’t end with wife Tammy and daughter Taly. One day my 9-year-old son sat on Grandpa Zvi’s lap looking at an image on the computer of population and facility locations in Orange County, California, and asked about it. After Grandpa’s explanation, young Ryan said, “There should be more facilities in denser areas!” Grandpa was so impressed that he developed an answer to that comment, which we wrote up together and which was published (Drezner et al. 2019) shortly after Ryan’s 10th birthday.

9.2 Overview

Through the many conversations Zvi Drezner and I have had, several innovations have emerged. Two papers represent the intersection of a biologically trained scientist with genetic algorithm approaches that mimic biological principles, resulting in new approaches and the identification of weaknesses such as terminology that is used incorrectly in genetic algorithms relative to the biological counterparts that those concepts mimic.

Our other innovations involve improvement in methods, data collection, and statistics. I developed a new approach rather intuitively for collecting data to estimate transition points in populations (Drezner 2008). Because the approach was statistically unconfirmed, my father and I developed its statistical foundation with the help of an order statistics specialist. We also developed an improvement to the Bonferroni statistical correction, which adjusts output for false positive results that are generated as a byproduct of running multiple statistical tests, a common occurrence in ecology and other disciplines. We updated the Bonferroni approach, creating a far less conservative version for more practical use.

9.3 Innovations in Genetic Algorithms

Zvi Drezner has often focused on optimization problems that typically have too many solutions to check individually or solve by a branch and bound algorithm. Thus, heuristic approaches are needed to find good solutions so as to solve problems with a reasonable amount of computing time and resources. There are several commonly used heuristic approaches including tabu search (Glover and Laguna 1997), simulated annealing (Kirkpatrick et al. 1983), and variable neighborhood search (Hansen and Mladenović 2001). All of these approaches select a starting solution and then seek to improve that solution with a possibly better one in the next iteration. By comparison, the genetic algorithm heuristic approach includes multiple solutions at a time and seeks good solutions through a merging process of other solutions, and isolating better solutions for further improvement. In many cases, better results are obtained through a merging process.

Zvi Drezner has worked extensively with genetic algorithms (GAs). The GA approach mimics biological processes for algorithm development to solve optimization problems. GAs take a set of solutions and “mate” them to find improved solutions. Mating involves combining elements of each parent to create a new solution, the offspring. Each solution is an individual in the population. The process of mating, producing offspring, and adding them into the population is repeated a given number of times. Fitness in GAs is the value of the objective function and the goal is to find the solution with the best objective function value. The best population member is the GA outcome. All GAs include: (1) rules for parent selection, (2) procedures for merging or combining the two parents and generating offspring, and (3) a decision rule to determine which offspring are to be kept (at the expense of an existing member) and which should be removed (death or becoming non-reproductive). Any of these three components can be modified to improve the performance of the algorithm. Being biologically trained, I discussed biological phenomena and applications with Zvi Drezner and we developed new approaches for GAs based on biological principles. We also identify terms used in GAs that are inconsistent with their usage in biology.

In both of our GA papers, we pursue new approaches for parent selection. Most GAs select two parents at random, though in nature this is not a random process. My parents (Drezner and Drezner 2006) first designated about half of the population as males and half females. Parents are then selected randomly, one from each gender. Although very basic, this replication of nature yielded improved results. Drezner and Marcoulides (2003) suggested selecting one parent at random, and then selecting the second parent that is most dissimilar to the first parent from a random subset. In our work, we proposed two non-random parent selection rules, one mimicking the alpha male phenomenon in nature (one male to several females). The other was inspired by ideas of female choice, where one (the female) is randomly selected but the better of two randomly selected males is chosen for mating with a pre-specified probability of π.

9.3.1 Biological Background

A gene is a piece of DNA that influences a trait in that organism (Freeman et al. 2014). The same gene may have several forms, called alleles (Freeman et al. 2014). For example, in Mendel’s famous pea experiments, the gene for seed shape included an allele for round seeds and an allele for wrinkled seeds (Mendel 1866; Freeman et al. 2014).

Every individual possesses a unique combination of genetically determined traits, some of which may benefit it through its life. If those alleles translate into beneficial traits that result in the production of more offspring, then more individuals in the next generation will carry those traits (alleles). By comparison, other traits may confer disadvantages that reduce an individual’s fitness. Fitness is the number of offspring an individual can produce relative to other individuals (Freeman et al. 2014). If these disadvantages translate to reduced reproductive success, those traits will be represented in proportionally fewer individuals in the next generation. Thus the next generation will typically have more individuals that carry beneficial traits and fewer individuals with the disadvantageous traits. Through generations and time, the genetic make-up of a population changes. Natural selection occurs in populations through time. These principles of inheritance and the success of more fit individuals have been applied in GAs to solve optimization problems.

9.3.2 The Female Choice Approach

In nature, females may choose their mate in a variety of ways, including through visual cues such as coloring and appearance, as these can be signs of health and access to food, or females may choose mates through preference of a particular male’s territory (e.g., better resources) (Rosser 1992; McGraw and Ardia 2003). In our female choice-inspired study (Drezner and Drezner 2018), two random individuals (males) are selected. The better individual is selected as the first parent with a pre-specified probability 0 ≤ π ≤ 1. Otherwise, the other individual is selected as the first parent. When π = 1, the better population member is always selected as the first parent, and when π = 0 the inferior one is always selected. The other sampled individual is returned to the population, and then the second parent (the female) is selected by the Drezner and Marcoulides (2003) principle that selects a more dissimilar mate, also consistent with the biological principle of inbreeding depression (Edmands 2007; Fenster and Galloway 2000). The GA process then commences, mating these two parents to produce an offspring. We tested π values of 0, 0.25, 0.5, 0.75, and 1 to find the best value of π.
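The selection rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used in Drezner and Drezner (2018): the function names, the Hamming distance as the dissimilarity measure, and the `subset_size` parameter are all hypothetical, and lower cost is assumed to mean a better solution.

```python
import random

def hamming(a, b):
    """Count positions where two equal-length solutions differ (assumed dissimilarity measure)."""
    return sum(x != y for x, y in zip(a, b))

def select_parents(population, cost, pi, subset_size=3, rng=random):
    """Return (first, second) parent indices; cost[i] is the objective value of member i."""
    # Sample two distinct candidates for the first parent.
    i, j = rng.sample(range(len(population)), 2)
    better, worse = (i, j) if cost[i] <= cost[j] else (j, i)
    # With probability pi keep the better candidate, otherwise keep the other one.
    first = better if rng.random() < pi else worse
    # Second parent: the member most dissimilar to the first within a random subset
    # (the subset size is an assumption made for this sketch).
    others = [m for m in range(len(population)) if m != first]
    subset = rng.sample(others, min(subset_size, len(others)))
    second = max(subset, key=lambda m: hamming(population[first], population[m]))
    return first, second
```

Setting `pi=1.0` always keeps the better of the two sampled candidates and `pi=0.0` always keeps the worse one, which matches the two extreme cases described in the text.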

Extensive experiments were performed on the planar p-median problem (the multi-source Weber problem) (Brimberg et al. 2000), and the quadratic assignment problem (Drezner 2015). The planar p-median was tested on three problems with n = 654, 1060, and 3038 demand points (Reinelt 1991) for a total of 57 instances. Each instance was run 10 times and the best and average solutions were recorded. The best solutions were obtained for π = 0. For example, the best known solution was found in all 10 runs for all 17 of the n = 654 instances. Contrary to our original biological premise, the quality of the solution generally deteriorates as π increases. Four new best known solutions were obtained for the n = 3038 instances (for p = 250, 350, 450, 500). The quadratic assignment problem was tested on the 14 instances of de Carvalho and Rahmann (2006). The best results were obtained for π = 0.25. One new best known solution was obtained for quadratic assignment problem instance BL144 (Drezner and Drezner 2018).

9.3.3 The Alpha Male Approach

In nature, many species have social structures with a dominant male that sires many or all offspring in a group of females (Freeman et al. 2014). Species with dominant males include various mammals such as baboons (Galbany et al. 2015), rodents (Farentinos 1980), horses (Wolter et al. 2014), and seals (Hoelzel et al. 1999), as well as other animal groups including birds (Polak 2006), insects (McDermott et al. 2014), and fish (Solomon-Lane et al. 2014). Male to male competition or combat may determine which male gets to mate, sometimes with multiple females (Haley et al. 1994). In such situations the stronger, healthier male is most likely to be the victor. Initially, we followed this principle by selecting the best population members as alpha males in developing the approach. However, randomly selected alpha males yielded better results. In Drezner and Drezner (2019) we began with a population of 100 and randomly selected k individuals of the population as alpha males, with the rest of the individuals defined as female (100 − k). The value of k is a parameter of the algorithm. Each female was randomly paired with one of the alpha males for mating and then an offspring was generated. Thus, 100 − k new offspring were produced, resulting in 200 − k individuals (the original 100 plus 100 − k offspring). The 100 best individuals were carried forward to constitute the next population. We tested many values of k to find the best performing one. This process was repeated and each time k members were randomly selected as alpha males (they may have been, e.g., females, previously). This process was repeated a pre-specified number of times so that run times are comparable to previous experiments (Drezner and Misevičius 2013).

We tested the alpha male approach on the de Carvalho and Rahmann (2006) 14 instances of the quadratic assignment problem. We tested fixed values of k in every iteration and randomly generated the value of k in a range in every generation. Generating the value of k in a range provided better results than a fixed value of k. The best results were obtained around k = 25 in a range of 5, for example, 20 ≤ k ≤ 25. For k = 25 there are, on average, about 3 females associated with each alpha male. Values vary in the animal world, but are often one male to a single digit number of females (Lukas and Clutton-Brock 2014). Two new best known solutions were obtained for the quadratic assignment problem instances BL100 and BL121 (Drezner and Drezner 2019). Our results show that randomly selected alpha males rather than better fit ones yielded the best results for solving our quadratic assignment instances. We also observed that when the female mates with an alpha male that is more dissimilar to her, the results are better, consistent with biological observations that breeding with close relatives produces less fit offspring, termed inbreeding depression in populations (Freeman et al. 2014). We also observe that when the number of females per alpha male fluctuates (e.g., over time), the results are better than with a fixed number of females, which necessarily occurs in animal populations.
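One generation of the alpha male scheme described in this section might look like the following Python sketch. It is a toy illustration, not the code of Drezner and Drezner (2019): the bit-string problem (minimize the number of ones), the `uniform_crossover` merge, and all function names are assumptions chosen to keep the example self-contained, whereas the chapter's experiments used quadratic assignment instances.

```python
import random

def uniform_crossover(a, b, rng):
    """Toy merge rule: take each position from either parent with equal probability."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]

def alpha_male_generation(population, cost, merge, k, rng=random):
    """One generation: k random alpha males; every remaining member (female) mates once."""
    n = len(population)
    males = rng.sample(range(n), k)              # chosen at random, not by fitness
    male_set = set(males)
    females = [i for i in range(n) if i not in male_set]
    # Each female is paired with a randomly chosen alpha male, giving n - k offspring.
    offspring = [merge(population[f], population[rng.choice(males)], rng) for f in females]
    # Keep the n best of the n + (n - k) candidates (lower cost is better).
    combined = sorted(population + offspring, key=cost)
    return combined[:n]

rng = random.Random(1)
pop = [[rng.randint(0, 1) for _ in range(20)] for _ in range(100)]
start_best = min(map(sum, pop))
for _ in range(30):
    k = rng.randint(20, 25)                      # k redrawn from a range each generation
    pop = alpha_male_generation(pop, sum, uniform_crossover, k, rng)
```

Because the 100 best of the combined pool are carried forward, the best objective value in the population can never get worse from one generation to the next.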

9.3.4 Genetic Algorithm Terms and Parallel Biological Principles

In biology, the term fitness refers to how many viable offspring any given individual can produce relative to other members of the same species (Freeman et al. 2014). Fitness is related to the environment and to interactions with other species, both positively (e.g., mutualisms, facilitation) and negatively (e.g., competition). By comparison, in GAs fitness is not related to offspring production (foundational in biology) nor to the environment, as the environment is not a component of GAs. Rather, in GAs fitness represents the value of the objective function and the goal is to find the most fit population member (best solution), i.e., how good an individual is in its value of the objective function. Thus the term fitness strongly diverges from its biological origins.

The term invasion (or invasive species) relates to the establishment of a species in a new place beyond its native range, where it negatively impacts (sometimes dramatically) the native species and community (e.g., Kent et al. 2018). Introduced species fall under a similar definition but their effect on the native community is less destructive. Both of these refer to new species to an area. In contrast, immigration is when individuals of the same species move from one population to another already established population. Immigration is important for increasing gene flow and fitness in recipient populations, while the term invasion is inherently negative and involves new, competing species that do not contribute to the gene pool of another species, in this case the one of interest (Whiteley et al. 2015). In GAs, a few outsiders may be added to increase genetic diversity, akin to the process of immigration in biology, but the term invasion (perhaps derived from the human idea of an attack by a foreign army) has been misapplied to this process. Since GAs only involve one species, they do not include interspecific (between species) interactions.

Both in biology and in common usage, the term generation refers to a large segment of a population made up of similarly aged individuals, or a cohort. In GAs, however, the term is essentially used in place of the term “birth” or to describe a single offspring, which is “generated.” Thus a family with three children would be described as being composed of two generations biologically (parents, children), while in GAs, each child represents a different generation.

There are also examples of GA results that parallel biological phenomena more closely. For example, individuals that are too similar or too different yield poor offspring in both GA and in the natural world (inbreeding and outbreeding depression). For example, fitness can decline when reproduction occurs between genetically distant members of the same species (outbreeding depression) (Edmands 2007; Fenster and Galloway 2000). In GAs, the equivalents of both inbreeding and outbreeding depression also show reduced success. Drezner and Drezner (2018) review many of these applications and many more parallels between GAs and the biological processes that inspire them.

9.4 Innovative Statistical Analysis

Zvi Drezner and I also have two statistical innovations. (1) We developed the statistical underpinnings of a new field approach designed to estimate transition points in a population (of any species). The principle is not specific to the life sciences and can be used for numerous applications that require transition point (from one stage to another) assessment or quantification in a population where individual transition age is variable. (2) Inspired by our exposure to order statistics, we developed a new and far less conservative approach for dealing with type I (false positive) errors that result when multiple statistical tests are run.

9.4.1 Estimating Transition Between Stages

Life cycle transition points (such as the age when juveniles transition to adults) are foundational in ecology. I estimated the juvenile–adult transition age (when reproduction begins) in a long-lived species (e.g., 150–250 years). Estimating the mean age at which a population becomes reproductive is complicated; sampling for the youngest reproductive individual yields an outlier (whose value is related to sample size), thus not representing a measure of central tendency for the population. Sampling individuals over many years can be difficult in long-lived species. In a population of 400 individuals whose lifespans average 200 years, only two individuals per year will transition, requiring decades of observations. Even sample size, foundational in statistics, is itself difficult to establish as very young juveniles are irrelevant, as are older adults. It is unclear how to distinguish the individuals that are statistically meaningful from those that are not. I now briefly introduce the species I used to develop the new approach for context and then I discuss the methodology and its statistical foundations.

The saguaro (pronounced “swah-roh”) cactus (Carnegiea gigantea, Fig. 9.1) is a keystone species and a charismatic plant that symbolizes the desert, with branches (“arms”) that seem to reach up to the sky (Drezner 2014). The age of transition from juvenile to reproductive adult varies with environmental conditions. Several of the plants in Fig. 9.1 show the presence of fruits (irregular small features at the tops of the plants). As individuals get older, branches eventually develop, where each additional branch essentially doubles the number of seeds that can be produced each season by that plant (Steenbergh and Lowe 1983). Just as the age when reproduction starts varies over the species’ range, branching is also variable. For example, in dry areas, plants tend to be under-branched, while where more water is available, plants use that water to increase the number of seeds they produce by branching (Yeaton et al. 1980).

In order to establish the age at which these plants begin to produce offspring, the oldest juveniles (e.g., 5 oldest) with no reproductive structures (flowers, fruits), and the same number of the youngest adults that are reproductive, were sampled (Drezner 2008). These are the individuals that are near the transition point (Drezner 2008). Individual age was estimated from height using a site-specific model developed for this particular species (Drezner 2003). I sampled the oldest pre-transition and the youngest post-transition members in four environmentally distinct populations for comparison. Each population was extensively searched such that the number of plants examined to isolate the oldest pre- and youngest post-transition individuals was large. The average age of these (e.g., 10) plants yielded the estimate for the mean transition age in that population. This methodology was later employed in a second study to find the transition from columnar form (no branches) to the branched form (Drezner 2013b). In the juvenile–adult transition study, the five (k) shortest flowering individuals and the five tallest non-flowering individuals were extracted. For the follow-up study on the transition to branched form, I used k = 10 (Drezner 2013b). When originally published, this approach was presented as it was carried out, but it was developed only intuitively (Drezner 2008) and was lacking statistical justification. We developed the statistical underpinnings of the technique. We assume the following (Drezner et al. 2015):

1.
Once an individual has transitioned to the next stage, it does not revert back.
2.
The transition age is normally distributed. Other distributions can be analyzed in the same way.
3.
There are about the same number of individuals at a given age in the range covered by the 2k observations.

The total number of individuals observed in a population (typically identified as the “sample size”), those assessed, and then included or excluded from the final 2k list is not relevant to our analysis. For example, if a million very young juveniles or very old adults were also present in the population of interest, it would not change our analysis or our results. We found the transition distribution as follows:

Let the k expected smallest values of the standardized Normal distribution for a sample of n be $a_1(n) \le a_2(n) \le \ldots \le a_k(n)$. These values were found by extensive simulations. The unknown mean and standard deviation of the distribution of the transition age are μ and σ. The data consist of the k youngest post-transition individuals with ages $a_1 \le \ldots \le a_k$ and k oldest pre-transition individuals with ages $b_1 \ge \ldots \ge b_k$.
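As a rough illustration of how such expected order statistics can be obtained by simulation, the following Python sketch averages the k smallest values over many standard Normal samples of size n. The function name, the number of trials, and the seed are assumptions made for this example; the simulations in Drezner et al. (2015) may have been set up differently.

```python
import random

def expected_smallest_order_stats(n, k, trials=20000, seed=0):
    """Monte Carlo estimate of the expected k smallest values in a standard Normal sample of size n."""
    rng = random.Random(seed)
    totals = [0.0] * k
    for _ in range(trials):
        # Draw one sample of size n and sort it so that sample[0] is the minimum.
        sample = sorted(rng.gauss(0.0, 1.0) for _ in range(n))
        for j in range(k):
            totals[j] += sample[j]
    # Average each order statistic across all trials.
    return [t / trials for t in totals]
```

More trials reduce the Monte Carlo noise in the estimates at the cost of a longer run time.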

To fit the data for a specific sample size n to the order statistics requires finding the μ and σ that satisfy the following set of equations as closely as possible:

$$\displaystyle \begin{aligned} a_j(n)\sigma+\mu = a_j; -a_j(n)\sigma+\mu = b_j \mbox{ for } j = 1,\ldots,k. \end{aligned} $$

(9.1)

The solution that minimizes the sum of squares of errors in these equations can be obtained by solving a simple linear regression where μ is the y-intercept and σ is the slope. The solution to this simple linear regression is based on several values:

$$\displaystyle \begin{aligned} \mu&=\frac{\sum_{j=1}^ka_j+\sum_{j=1}^kb_j}{2k};~ S_x = 2\sum_{j=1}^k a_j^2(n);~ S_y = \sum_{j=1}^k\left\{(a_j-\mu)^2+(b_j-\mu)^2\right\};\\ S_{xy}&= \sum_{j=1}^k a_j(n)\left(a_j-b_j\right). \end{aligned} $$

(9.2)

In addition to the mean μ calculated in (9.2), the standard deviation σ is obtained as the slope of the regression:

$$\displaystyle \begin{aligned} \sigma=\frac{S_{xy}}{S_x}. \end{aligned} $$

(9.3)

The standard errors of μ and σ are

$$\displaystyle \begin{aligned} SE(\mu)=\sqrt{\frac{S_y-\sigma S_{xy}}{4k(k-1)}};~ SE(\sigma)=SE(\mu)\sqrt{\frac{2k}{S_x}}. \end{aligned} $$

(9.4)

We calculated the correlation coefficient r and found the p-value of the regression

$$\displaystyle \begin{aligned} r=\sigma\sqrt{\frac{S_x}{S_y}}. \end{aligned} $$

(9.5)

When the analysis is repeated for various values of n, the n that yields the largest value of r is selected for calculating the values in Eqs. (9.2) and (9.3). A spreadsheet that automatically calculates μ, σ, and their standard errors for the transition distribution for any k ≤ 10 is available at http://onlinelibrary.wiley.com/doi/10.1002/env.2351/suppinfo (Drezner et al. 2015). The spreadsheet calculates the values by Eqs. (9.2)–(9.5) for every 10 ≤ n ≤ 200, selects the sample size resulting in the largest value of the correlation coefficient r, and records the parameters of the transition distribution for the selected n. Researchers can insert their observations to obtain the results. For complete details see Drezner et al. (2015).
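The calculations in Eqs. (9.2)–(9.5), together with the scan over n, can be sketched in Python as follows. This is a hedged illustration rather than a reproduction of the spreadsheet: the function names are hypothetical, and Blom's well-known approximation is swapped in for the simulated expected order statistics $a_j(n)$ used in Drezner et al. (2015), so the numerical output may differ slightly from the spreadsheet's.

```python
from math import sqrt
from statistics import NormalDist

def blom_order_stats(n, k):
    """Approximate the k smallest expected standard Normal order statistics (Blom's formula).
    Assumption: a stand-in for the simulated values described in the text."""
    inv = NormalDist().inv_cdf
    return [inv((j - 0.375) / (n + 0.25)) for j in range(1, k + 1)]

def fit_transition(a, b, x):
    """a: k youngest post-transition ages (ascending); b: k oldest pre-transition ages
    (descending); x: the order statistics a_j(n). Requires k >= 2 and non-identical ages."""
    k = len(a)
    mu = (sum(a) + sum(b)) / (2 * k)                                     # Eq. (9.2)
    s_x = 2 * sum(v * v for v in x)
    s_y = sum((aj - mu) ** 2 + (bj - mu) ** 2 for aj, bj in zip(a, b))
    s_xy = sum(xj * (aj - bj) for xj, aj, bj in zip(x, a, b))
    sigma = s_xy / s_x                                                   # Eq. (9.3)
    resid = max(s_y - sigma * s_xy, 0.0)       # guard against tiny negative rounding error
    se_mu = sqrt(resid / (4 * k * (k - 1)))                              # Eq. (9.4)
    se_sigma = se_mu * sqrt(2 * k / s_x)
    r = sigma * sqrt(s_x / s_y)                                          # Eq. (9.5)
    return {"mu": mu, "sigma": sigma, "se_mu": se_mu, "se_sigma": se_sigma, "r": r}

def best_fit(a, b, n_values=range(10, 201)):
    """Scan the sample sizes and return (n, fit) for the n with the largest r."""
    k = len(a)
    fits = ((n, fit_transition(a, b, blom_order_stats(n, k))) for n in n_values)
    return max(fits, key=lambda item: item[1]["r"])
```

Note that μ in Eq. (9.2) is simply the mean of the 2k observed ages, so it does not depend on the chosen n; only σ, the standard errors, and r change as n is scanned.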

The main results (Drezner et al. 2015) include:

1.
The originally developed methodology is statistically sound and offers a new, practical, and robust approach for estimating transition points such as in years of age.
2.
We calculated the underlying statistics and confirm that the measure of central tendency originally estimated (Drezner 2008, 2013b) is indeed the mean (Drezner et al. 2015).
3.
While the original means were correct, the standard errors reported in the two original studies were not, and interestingly, all eight erroneous values originally reported (Drezner 2008, 2013b) were higher than the updated, correct values. Even with means as high as 139 years, the updated SE values in all 8 field-collected datasets were less than 1 year, including those based on only 5 pre- and 5 post-transition values (Drezner 2008; Drezner et al. 2015).
4.
Despite the small number of values used in the final calculations (recognizing that those are derived from a much larger number of measurements), the results are robust and insensitive to changes in k. The start of branching study used k = 10; we compared those results with k = 5, 6, 7, 8, and 9 individuals from each stage. The estimated mean ages for the branching transition in the four sites with k = 10 (k = 5 in parentheses following) were: 77.8 (79.1) years of age, 95.9 (97.2), 102.9 (102.8), and 139.2 (139.9) (Drezner et al. 2015).
5.
This technique makes assessing transitions fast and efficient, requiring only one field season, and can be easily calculated with our spreadsheet http://onlinelibrary.wiley.com/doi/10.1002/env.2351/suppinfo (Drezner et al. 2015).

9.4.2 The Correlated Bonferroni Technique

Running multiple statistical tests yields multiple p-values, creating a statistical challenge as the more tests that are run, the more likely a significant result will be obtained by chance. Twenty results would be expected to have one significant (p < 0.05) result by chance alone. In fact, there is a 64% chance that at least one result in 20 would be statistically significant. The Bonferroni technique (BT) was developed to correct for false positive (type I) errors. It approximately divides α (e.g., 0.05) by the number of significant results (k), and only results with p-values lower than the new threshold pass the test (Bonferroni 1936; Miller 1981). The BT was updated by Holm (1979) who proposed the less conservative sequential Bonferroni technique (SeqBT) (Rice 1989) where all significant p-values are placed in ascending order, recalculating $\frac {\alpha }{k}$ for each test anew in sequence. Thus, with ten significant results, the smallest p-value must be less than approximately $\frac {\alpha }{10}$ (see Table 9.1), the next smallest p-value less than $\frac {\alpha }{9}$, etc. The SeqBT has since been adopted in many studies (e.g., Drezner 2013a; Gittman et al. 2016; Snyder and Stepien 2017).
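The 64% figure and the sequential procedure can be checked with a short Python sketch. The function names are hypothetical, the tests are assumed to be independent, and `holm_rejections` follows the standard step-down form of Holm's procedure, in which the smallest of m p-values is compared with α/m, the next with α/(m − 1), and so on.

```python
def familywise_error(alpha, m):
    """Chance of at least one false positive among m independent tests at level alpha."""
    return 1 - (1 - alpha) ** m

def holm_rejections(p_values, alpha=0.05):
    """Return flags in the original order: True where the step-down procedure rejects."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):           # rank 0 is the smallest p-value
        if p_values[i] < alpha / (m - rank):
            reject[i] = True
        else:
            break                              # all larger p-values fail as well
    return reject
```

For example, `familywise_error(0.05, 20)` is approximately 0.64, and five p-values of 0.04 among ten tests are all left unrejected because the smallest already exceeds 0.05/10.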

Table 9.1 Critical values for 3 ≤ s ≤ 10 significant results compared with BT


Concerns have been expressed about even the less conservative SeqBT. Not only is it used inconsistently, but the decision-making process for applying it is uncertain (Cabin and Mitchell 2000). For example, should two tables, each with 10 significant results, be pooled (thus k = 20), or does each set of analyses stand on its own (Cabin and Mitchell 2000)? In a survey of editors of three highly respected life science journals, there was no consensus on the usage and application of the SeqBT (Cabin and Mitchell 2000). Also troubling is that more in-depth analysis is discouraged with the SeqBT (Moran 2003). As more significant results are obtained, the cut-off significance level declines, potentially eliminating results that had been previously viable with a smaller k (Moran 2003). This is even more problematic in cases where results are consistently significant, but in all cases they are very close to α (e.g., 0.05) (Moran 2003). In such cases, the consistency across tests, rather than demonstrating a reliable pattern worth reporting, instead becomes a liability as many or all results are rejected by the SeqBT. If 5 of 10 tests are significant (p < 0.05) but only marginally, the SeqBT will fail to reject any of the null hypotheses (i.e., none would remain significant) (Moran 2003). However, the likelihood that 5 of 10 tests yield p < 0.05 by random chance is less than 1 in 10,000! Thus, some of these 5 results must be significant, yet their significance would be reversed with the BT or SeqBT corrections. The Bonferroni technique and its modifications have been used to reduce false positive results, but at the cost of rejecting viable results. The SeqBT remains rather conservative (Cabin and Mitchell 2000). The reduction of potential false positive results should not lead to excessive false negative results from BT-type corrections that may be too stringent.
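The "less than 1 in 10,000" figure can be verified with a binomial tail calculation. The sketch below assumes that the 10 tests are independent, that every null hypothesis is true, and that "5 of 10" means at least 5; the function name is hypothetical.

```python
from math import comb

def prob_at_least(successes, tests, alpha=0.05):
    """Probability that at least `successes` of `tests` independent tests give p < alpha by chance."""
    return sum(
        comb(tests, i) * alpha**i * (1 - alpha) ** (tests - i)
        for i in range(successes, tests + 1)
    )

p_five_of_ten = prob_at_least(5, 10)
```

Under these assumptions `p_five_of_ten` is roughly 6 in 100,000, which is indeed below 1 in 10,000.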

The Bonferroni tests assume that these ordered statistics (p-values in ascending order) are essentially independent, yet they are not. Even using randomly generated data, the correlation coefficient ρ may be surprisingly high (e.g., for k = 10, ρ is greater than 0.9, Table 9.1) (Arnold et al. 1992). We developed the correlated BT (CorBT) (Drezner and Drezner 2016) that incorporates the inherent correlation that exists among ordered data. The CorBT can be used in place of the BT, or it can be used sequentially in place of the SeqBT, as our proposed sequential correlated BT (SeqCorBT). When ρ = 0, the CorBT is equivalent to the BT, and the SeqCorBT is equivalent to the SeqBT. The correlation values ρ and the cut-off p-values for 3–10 significant results are given in Table 9.1. Note that for a different value of α, the critical values can be approximated by multiplying by $\frac {\alpha }{0.05}$. By adjusting for the natural correlation among p-values, much less conservative cut-offs result, but researchers can soundly correct for false positive results associated with multiple tests. For 10 significant (p < 0.05) results, the smallest p-value must be lower than 0.0212 to be significant using our SeqCorBT, compared to < 0.0051 with the BT or SeqBT tests. We provide a user-friendly spreadsheet at http://onlinelibrary.wiley.com/doi/10.1002/bes2.1214/suppinfo which is available for readers who wish to apply this technique to their own research.

Drezner and Drezner (2016) found that for s significant results

$$\displaystyle \begin{aligned} \rho\approx 1-\frac{1.329}s+\frac{6.396}{s\sqrt{s}}-\frac{12.646}{s^2}, \end{aligned} $$

(9.6)

with significance 3.5 × 10⁻¹⁶⁹ for the regression analysis.

Let θ(ρ, k) be the critical value for the kth significant result. If the smallest p-value of k significant results is less than θ(ρ, k), the kth null hypothesis can be rejected with significance α. It is shown in Drezner and Drezner (2016) that

$$\displaystyle \begin{aligned} \theta(\rho,k)=\frac \alpha k +\left(\alpha-\frac \alpha k \right)\rho^{\lambda(\rho,k)}, \end{aligned} $$

(9.7)

where λ(ρ, k) is

$$\displaystyle \begin{aligned} \lambda(\rho,k)\approx 3.928+\frac{1}{1-\rho}\left(1.101-\frac{3.811}{k}+\frac{4.765}{k^2}\right)-(1-\rho)^2\left(3.009+\frac{1.783}{k}\right). \end{aligned} $$

(9.8)
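
Equations (9.6) to (9.8) can be chained to reproduce a critical value numerically. The following is a minimal sketch rather than the authors' spreadsheet: the function names `rho_approx`, `lam`, and `theta` are ours, and the loop assumes that s in Eq. (9.6) equals the number of significant results k used in Eqs. (9.7) and (9.8).

```python
from math import sqrt


def rho_approx(s: int) -> float:
    """Eq. (9.6): fitted correlation for s significant results."""
    return 1 - 1.329 / s + 6.396 / (s * sqrt(s)) - 12.646 / s**2


def lam(rho: float, k: int) -> float:
    """Eq. (9.8): exponent lambda(rho, k); requires rho < 1."""
    return (
        3.928
        + (1.101 - 3.811 / k + 4.765 / k**2) / (1 - rho)
        - (1 - rho) ** 2 * (3.009 + 1.783 / k)
    )


def theta(rho: float, k: int, alpha: float = 0.05) -> float:
    """Eq. (9.7): critical value for the kth significant result."""
    base = alpha / k  # the plain Bonferroni cut-off
    return base + (alpha - base) * rho ** lam(rho, k)


for k in range(3, 11):
    rho = rho_approx(k)  # assumption: s in Eq. (9.6) equals k
    print(k, round(rho, 4), round(theta(rho, k), 4))
```

With α = 0.05 and k = 10, this sketch gives ρ ≈ 0.94 and θ ≈ 0.021, in line with the 0.0212 cut-off quoted above. Setting ρ = 0 returns α/k, the ordinary Bonferroni cut-off.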

9.5 Summary

We proposed new techniques in statistics and in genetic algorithms. We used biological principles to develop new approaches in genetic algorithms that yielded improved solutions to optimization problems, improving on the best-known results for multiple instances. These approaches mimic patterns in the natural world, including female choice of mates and alpha male social structures. We also highlighted inconsistencies between biological processes and their genetic algorithm counterparts.

Two other innovations in methodology and statistics include our development of the sequential correlated Bonferroni test, which controls for false positive results that occur from running multiple statistical tests. It incorporates the correlation between significant p-values, thereby resulting in a less conservative filter. We also developed the statistical underpinnings of a new approach for estimating transition points (in species or any other defined population) between stages. Transition from one stage to the next is a natural part of life, yet it can be difficult to estimate, particularly in cases where only a few transitions occur in every measurement period. We confirmed the validity and applicability of this new approach, demonstrating low standard errors and robust output.

Interacting with a researcher in a very different field resulted in unique problem-solving opportunities. The intersection of a genetic algorithm researcher with a life scientist helped to expose inconsistencies and fuel new avenues of investigation, while the mathematical and statistical foundations offered by a mathematician helped to solidify novel approaches for scientists. Such free-form and synergistic collaborations are too few in research, but offer great potential for new directions and perspectives.

References

Arnold, B. C., Balakrishnan, N., & Nagaraja, H. N. (1992). A first course in order statistics. New York: Wiley.
Bonferroni, C. E. (1936). Teoria statistica delle classi e calcolo delle probabilità. Libreria internazionale Seeber.
Brimberg, J., Hansen, P., Mladenović, N., & Taillard, E. (2000). Improvements and comparison of heuristics for solving the uncapacitated multisource Weber problem. Operations Research, 48, 444–460.
Cabin, R. J., & Mitchell, R. J. (2000). To Bonferroni or not to Bonferroni: When and how are the questions. Bulletin of the Ecological Society of America, 81, 246–248.
de Carvalho, S. A. Jr., & Rahmann, S. (2006). Microarray layout as a quadratic assignment problem. In D. Huson, O. Kohlbacher, A. Lupas, K. Nieselt, & A. Zell (Eds.), Proceedings of the German Conference on Bioinformatics (Vol. 83, pp. 11–20). Bonn: Gesellschaft für Informatik.
Drezner, T., & Drezner, Z. (2006). Gender specific genetic algorithms. Information Systems and Operations Research, 44, 117–127.
Drezner, T. D. (2003). Saguaro (Carnegiea gigantea, Cactaceae) age–height relationships and growth: The development of a general growth curve. American Journal of Botany, 90, 911–914.
Drezner, T. D. (2008). Variation in age and height of onset of reproduction in the saguaro cactus (Carnegiea gigantea) in the Sonoran Desert. Plant Ecology, 194, 223–229.
Drezner, T. D. (2013a). The paradoxical distribution of a shallow-rooted keystone species away from surface water, near the water-limited edge of its range in the Sonoran Desert: Seed-seedling conflicts. Acta Oecologica, 47, 81–84.
Drezner, T. D. (2013b). Variability in reproductive effort of a keystone species: Age and height of branch establishment. Physical Geography, 34, 136–148.
Drezner, T. D. (2014). The keystone saguaro (Carnegiea gigantea, Cactaceae): A review of its ecology, associations, reproduction, limits, and demographics. Plant Ecology, 215, 581–595.
Drezner, T. D., Drezner, Z., & Balakrishnan, N. (2015). Estimating the transition of individuals between life stages. Environmetrics, 26, 526–533.
Drezner, Z. (2015). The quadratic assignment problem. In G. Laporte, S. Nickel, & F. S. da Gama (Eds.), Location science (pp. 345–363). Cham: Springer.
Drezner, Z., & Drezner, T. D. (2016). A remedy for the overzealous Bonferroni technique for multiple statistical tests. Bulletin of the Ecological Society of America, 97, 91–98.
Drezner, Z., & Drezner, T. D. (2018). Biologically inspired parent selection in genetic algorithms (in review).
Drezner, Z., & Drezner, T. D. (2019). The alpha male genetic algorithm. IMA Journal of Management Mathematics, 30, 37–50.
Drezner, Z., & Marcoulides, G. A. (2003). A distance-based selection of parents in genetic algorithms. In M. G. C. Resende & J. P. de Sousa (Eds.), Metaheuristics: Computer decision-making (pp. 257–278). Boston: Kluwer Academic Publishers.
Drezner, Z., & Misevičius, A. (2013). Enhancing the performance of hybrid genetic algorithms by differential improvement. Computers & Operations Research, 40, 1038–1046.
Drezner, Z., Gelfand, R. J., & Drezner, T. D. (2019). Sensitivity of large scale facility location solutions. Journal of Supply Chain and Operations Management, 17(2).
Edmands, S. (2007). Between a rock and a hard place: Evaluating the relative risks of inbreeding and outbreeding for conservation and management. Molecular Ecology, 16, 463–475.
Farentinos, R. (1980). Sexual solicitation of subordinate males by female tassel-eared squirrels (Sciurus aberti). Journal of Mammalogy, 61, 337–341.
Fenster, C. B., & Galloway, L. F. (2000). Inbreeding and outbreeding depression in natural populations of Chamaecrista fasciculata (Fabaceae). Conservation Biology, 14, 1406–1412.
Freeman, S., Harrington, M., & Sharp, J. C. (2014). Biological science (2nd Canadian ed.). Toronto: Pearson.
Galbany, J., Tung, J., Altmann, J., & Alberts, S. C. (2015). Canine length in wild male baboons: Maturation, aging and social dominance rank. PLoS ONE, 10, e0126415.
Gittman, R. K., Peterson, C. H., Currin, C. A., Fodrie, F. J., Piehler, M. F., & Bruno, J. F. (2016). Living shorelines can enhance the nursery role of threatened estuarine habitats. Ecological Applications, 26, 249–263.
Glover, F., & Laguna, M. (1997). Tabu search. Boston: Kluwer Academic Publishers.
Haley, M. P., Deutsch, C. J., & Le Boeuf, B. J. (1994). Size, dominance and copulatory success in male northern elephant seals, Mirounga angustirostris. Animal Behaviour, 48, 1249–1260.
Hansen, P., & Mladenović, N. (2001). Variable neighborhood search: Principles and applications. European Journal of Operational Research, 130, 449–467.
Hoelzel, A. R., Le Boeuf, B. J., Reiter, J., & Campagna, C. (1999). Alpha-male paternity in elephant seals. Behavioral Ecology and Sociobiology, 46, 298–306.
Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6, 65–70.
Kent, A., Drezner, T. D., & Bello, R. (2018). Climate warming and the arrival of potentially invasive species into boreal forest and tundra in the Hudson Bay Lowlands, Canada. Polar Biology, 41, 2007–2022.
Kirkpatrick, S., Gelatt, C. D., & Vecchi, M. P. (1983). Optimization by simulated annealing. Science, 220, 671–680.
Lukas, D., & Clutton-Brock, T. (2014). Costs of mating competition limit male lifetime breeding success in polygynous mammals. Proceedings of the Royal Society of London B: Biological Sciences, 281(1786), 20140418.
McDermott, D. R., Chips, M. J., McGuirk, M., Armagost, F., DiRienzo, N., & Pruitt, J. N. (2014). Boldness is influenced by sublethal interactions with predators and is associated with successful harem infiltration in Madagascar hissing cockroaches. Behavioral Ecology and Sociobiology, 68, 425–435.
McGraw, K. J., & Ardia, D. R. (2003). Carotenoids, immunocompetence, and the information content of sexual colors: An experimental test. The American Naturalist, 162(6), 704–712.
Mendel, G. (1866). Versuche über Pflanzenhybriden. Verhandlungen des naturforschenden Vereines in Brünn 43, 44.
Miller, R. G. (1981). Simultaneous statistical inference. New York: McGraw Hill.
Moran, M. D. (2003). Arguments for rejecting the sequential Bonferroni in ecological studies. Oikos, 100, 403–405.
Polak, M. (2006). Booming activity of male Bitterns Botaurus stellaris in relation to reproductive cycle and harem size. Ornis Fennica, 83, 27–33.
Reinelt, G. (1991). TSPLIB: A traveling salesman problem library. ORSA Journal on Computing, 3, 376–384.
Rice, W. R. (1989). Analyzing tables of statistical tests. Evolution, 43, 223–225.
Rosser, A. M. (1992). Resource distribution, density, and determinants of mate access in puku. Behavioral Ecology, 3, 13–24.
Snyder, M. R., & Stepien, C. A. (2017). Genetic patterns across an invasion’s history: A test of change versus stasis for the Eurasian round goby in North America. Molecular Ecology, 26, 1075–1090.
Solomon-Lane, T. K., Willis, M. C., Pradhan, D. S., & Grober, M. S. (2014). Female, but not male, agonistic behaviour is associated with male reproductive success in stable bluebanded goby (Lythrypnus dalli) hierarchies. Behaviour, 151, 1367–1387.
Steenbergh, W. F., & Lowe, C. H. (1983). Ecology of the saguaro III: Growth and demography. Washington: National Park Service.
Whiteley, A. R., Fitzpatrick, S. W., Funk, W. C., & Tallmon, D. A. (2015). Genetic rescue to the rescue. Trends in Ecology & Evolution, 30(1), 42–49.
Wolter, R., Pantel, N., Stefanski, V., Möstl, E., & Krueger, K. (2014). The role of an alpha animal in changing environmental conditions. Physiology & Behavior, 133, 236–243.
Yeaton, R. I., Karban, R., & Wagner, H. B. (1980). Morphological growth patterns of saguaro (Carnegiea gigantea: Cactaceae) on flats and slopes in Organ Pipe Cactus National Monument, Arizona. The Southwestern Naturalist, 25, 339–349.

Author information

Authors and Affiliations

Department of Geography, York University, Toronto, ON, Canada
Taly Dawn Drezner

Corresponding author

Correspondence to Taly Dawn Drezner.

Editor information

Editors and Affiliations

Faculty of Management, University of New Brunswick, Fredericton, NB, Canada
H. A. Eiselt
Department of Electrical Engineering, Pontificia Universidad Católica de Chile Complex Engineering Systems Institute, Santiago, Chile
Vladimir Marianov

About this chapter

Cite this chapter

Drezner, T.D. (2019). Innovations in Statistical Analysis and Genetic Algorithms. In: Eiselt, H., Marianov, V. (eds) Contributions to Location Analysis. International Series in Operations Research & Management Science, vol 281. Springer, Cham. https://doi.org/10.1007/978-3-030-19111-5_9

DOI: https://doi.org/10.1007/978-3-030-19111-5_9
Published: 08 October 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-19110-8
Online ISBN: 978-3-030-19111-5
eBook Packages: Economics and Finance, Economics and Finance (R0)
