Schema Analysis in Tree-Based Genetic Programming

Burlacu, Bogdan; Affenzeller, Michael; Kommenda, Michael; Kronberger, Gabriel; Winkler, Stephan

doi:10.1007/978-3-319-90512-9_2

Conference paper
First Online: 06 July 2018

Part of the book series: Genetic and Evolutionary Computation (GEVO)

Abstract

In this chapter we adopt the concept of schemata from schema theory and use it to analyze population dynamics in genetic programming for symbolic regression. We define schemata as tree-based wildcard patterns and we empirically measure their frequencies in the population at each generation. Our methodology consists of two steps. In the first step, we generate schemata based on genealogical information about crossover parents and their offspring, according to several possible schema definitions inspired by existing literature. In the second step, we calculate the matching individuals for each schema using a tree pattern matching algorithm. We test our approach on different problem instances and algorithmic flavors and we investigate the effects of different selection mechanisms on the identified schemata and their frequencies.

2.1 Introduction

2.1.1 Diversity and Evolutionary Dynamics

“Evolutionary dynamics” is an often-encountered expression in genetic programming (GP) research. It refers to changes within the population, such as quality and size distribution [15], genotype-phenotype maps and neutral networks [4, 9], diversity [5, 6], modularity and building blocks [11, 25], bloat [18], evolvability [2, 22] or emergent phenomena [3].

The dynamics of the population are uniquely influenced by the interplay between selection and recombination operators (crossover, mutation), as well as specific parameterizations and problem instances. As a biologically-inspired process, GP is able to deal with noisy data, multiple local optima, non-smooth objective functions, while also critically depending on genetic diversity in order to evolve the solution candidates towards the given goal. Population diversity at both the genotypic and phenotypic level remains one of the main focus points for GP research.

In this work we analyze population diversity looking at the distribution of solution candidates into subsets that belong to the same schema or structural template. We define such templates as rooted trees containing wildcard node symbols in their structure. Additionally, we describe population convergence via schema frequency curves over the evolutionary run.

2.1.2 Genetic Programming Schemata

The study of schema theorems began with John Holland’s work on providing a mathematical justification for the performance of genetic algorithms. The canonical version of a genetic algorithm (Holland [8]) used a fixed-length binary string encoding where each bit took a value from the set {0, 1}. Holland then defined schemas (or schemata) as binary string templates with symbols from the set {0, 1, ∗}, where ∗ represents a wildcard symbol that can be matched by either a 0 or a 1.

The fixed-length schemata, each equivalent to a hyperplane in the search space, represented suitable theoretical instruments for the analysis of genetic operators and their effects on the distribution of solution candidates along the hyperplanes, in relation to the average fitness of the population. Holland’s schema theory states that the number of low-order, low-defining-length schemata with above-average fitness increases exponentially between successive generations, where:

A schema’s order is given by the number of fixed positions in the binary string
The defining length is given by the distance between the first and last fixed positions in the binary string
Schema average fitness is the average fitness of its matching individuals

In this context, low-order, low-defining-length schemata are seen as building blocks, structural patterns increasingly sampled by selection and used by the genetic algorithm to assemble better and better solutions.
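The three quantities defined above are easy to compute for binary string schemata. The following is a minimal sketch, assuming schemata and individuals are plain Python strings over {0, 1, *}; the function names are illustrative and not taken from the chapter.

```python
def order(schema: str) -> int:
    """Number of fixed (non-wildcard) positions in the schema."""
    return sum(1 for symbol in schema if symbol != "*")


def defining_length(schema: str) -> int:
    """Distance between the first and the last fixed position."""
    fixed = [i for i, symbol in enumerate(schema) if symbol != "*"]
    return fixed[-1] - fixed[0] if fixed else 0


def matches(schema: str, individual: str) -> bool:
    """True if every fixed position of the schema agrees with the individual."""
    return len(schema) == len(individual) and all(
        s == "*" or s == bit for s, bit in zip(schema, individual)
    )


schema = "1**0*"
population = ["10101", "11000", "00000", "10001"]
print(order(schema), defining_length(schema))                 # 2 3
print([ind for ind in population if matches(schema, ind)])    # ['10101', '11000', '10001']
```

The schema 1**0* has two fixed positions (order 2) that are three positions apart (defining length 3), so it counts as neither particularly short nor particularly long for a string of length five.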

It was later shown by Poli [13] that Holland’s findings are also valid in the context of GP, with some small provisions: “building blocks in GP and variable-size GAs with one-point crossover exist, but they are not necessarily all short, low-order or highly fit”. Schema theorems for GP are complicated by the variable-length tree encoding, requiring mathematical formulations for the expected schema frequencies to also account for the size variation of individuals under the action of selection, crossover and mutation. Several schema definitions dealing with these issues were proposed in the literature [24].

Despite significant progress in the last couple of decades, leading to exact formulations of the expected number of individuals sampling a schema at the next generation [16, 17], “large gaps remain between GP theory and practice”, due to the large number of schema equations in typical GP populations, and the “large number of terms growing proportionally to the square of the number of program shapes times the square of the number of possible crossover points” [19]. Thus, from a practical perspective, the application of schema theory on concrete algorithms and problem instances remains problematic.

In this work we attempt to close the gap between schemata as theoretical instruments for the analysis of population dynamics and their role in empirical investigations and we introduce a practical methodology to identify GP schemata and compute their frequencies. We consider the hyperschema definition by Poli et al. [12], where a schema is a rooted tree template that may include two types of wildcard symbols:

The ‘=’ symbol matches any valid node of the same type and arity
The ‘#’ symbol matches any valid subtree

Figure 2.1 shows an example of a hyperschema and matching trees. We notice that the # symbol can match both leaf and function nodes, while the = symbol only matches nodes of the same type (a function node and a leaf node, respectively, for the two occurrences below).

Originally, the set of wildcards in Poli’s hyperschema was chosen in such a way as to make it easier to evaluate the effects of the genetic operators on schema frequencies and to enable a more concise mathematical representation of the schema equations. Our proposed methodology is not bound by such considerations and supports different schema structures, containing either or both wildcard symbols.
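To make the semantics of the two wildcards concrete, the sketch below matches a hyperschema against a tree position by position. It assumes trees are nested tuples of the form (label, child, ...), with leaves written as one-element tuples; this encoding and the function name are illustrative choices, not the representation used in HeuristicLab.

```python
def matches(schema, tree):
    """Positional hyperschema matching on nested-tuple trees.

    '#' matches any subtree; '=' matches any single node whose arity equals
    the number of children the '=' node has in the schema.
    """
    label, *schema_children = schema
    if label == "#":
        return True
    tree_label, *tree_children = tree
    if len(schema_children) != len(tree_children):
        return False  # arity (and hence function/leaf type) must agree
    if label != "=" and label != tree_label:
        return False
    return all(matches(s, t) for s, t in zip(schema_children, tree_children))


# Schema: + ( =(x1, #), = ), i.e. any binary function over x1 and anything, plus any leaf.
schema = ("+", ("=", ("x1",), ("#",)), ("=",))
tree_a = ("+", ("*", ("x1",), ("-", ("x2",), ("x3",))), ("x4",))
tree_b = ("+", ("*", ("x1",), ("x2",)), ("sin", ("x4",)))
print(matches(schema, tree_a))  # True
print(matches(schema, tree_b))  # False: the second '=' is a leaf, sin(x4) is not
```

In this encoding the arity check doubles as the type check, since a leaf has arity zero and a function node does not.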

The remainder of the chapter is organized as follows: Sect. 2.2 describes our methodology for schema generation and matching, Sect. 2.3 gives details about our empirical experiments, Sect. 2.4 shows the obtained results and Sect. 2.5 discusses some final conclusions.

2.2 Methodology

We construct relevant schemata using hereditary relationships between crossover parents and their offspring. The schemata may include wildcard symbols from the set {=, #} and are matched against the population of solution candidates using a pattern matching algorithm adapted from the field of XML query matching. Since GP schemata represent a more restricted instance of wildcard query matching, we adapt the algorithm’s implementation with additional constraints. The two steps, schema generation and schema matching, are described in more detail below. The methodology was implemented in HeuristicLab [23].

2.2.1 Schema Generation

Conceptually, we expand on the idea by Stephens and Waelbroeck [20] that “at the level of the microscopic degrees of freedom, the strings, the action of crossover by its very nature introduces the notion of a schema.”

The schema generation algorithm tries to exploit the fact that structural similarity is passed on (to various degrees) from parents to their offspring via the crossover operation. Additionally, it is assumed that successful individuals selected for reproduction will participate as root parents (see Note 1) in multiple crossover operations. In these circumstances, we can generate schemata from crossover root parents by considering crossover cutpoints as potential candidates for wildcard placement. We arrive at the following heuristic:

1. Group individuals based on their common root parent.
2. Identify all genetic fragments and their respective positions in the root parent.
3. For each fragment f with preorder index f_i in the root parent, replace the node at position f_i with a wildcard.

The heuristic is controlled by a minimum schema length parameter which limits wildcard placement in order to avoid the creation of ‘match-all’ schemata (schemata that contain wildcards in the tree root or in its close proximity). The method is listed as pseudocode in Algorithm 2.1.
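The three steps of the heuristic can be sketched as follows. This is an illustration under stated assumptions, not a transcription of Algorithm 2.1: trees are nested tuples (label, child, ...), each crossover event is given as a hypothetical (root_parent, cutpoint_index) pair, and the minimum schema length is interpreted as a lower bound on the number of nodes left in the schema.

```python
from collections import defaultdict


def size(tree):
    """Number of nodes in a nested-tuple tree."""
    return 1 + sum(size(child) for child in tree[1:])


def replace_at(tree, index, wildcard="#"):
    """Copy of `tree` with the node at preorder `index` replaced by a wildcard.

    '#' replaces the whole subtree, '=' replaces only the node label.
    """
    def walk(node, i):
        # Rebuild `node`, whose preorder index is `i`; return (copy, next index).
        if i == index and wildcard == "#":
            return ("#",), i + size(node)
        label = "=" if i == index else node[0]
        nxt, children = i + 1, []
        for child in node[1:]:
            new_child, nxt = walk(child, nxt)
            children.append(new_child)
        return (label, *children), nxt

    return walk(tree, 0)[0]


def generate_schemata(crossovers, min_length=3, wildcard="#"):
    """Build one schema per root parent from its recorded cutpoints."""
    cutpoints = defaultdict(set)
    for root_parent, cut in crossovers:      # steps 1 and 2: group, collect positions
        cutpoints[root_parent].add(cut)
    schemata = []
    for parent, cuts in cutpoints.items():
        schema = parent
        # Step 3: descending order keeps the smaller preorder indices valid.
        for cut in sorted(cuts, reverse=True):
            candidate = replace_at(schema, cut, wildcard)
            if cut != 0 and size(candidate) >= min_length:  # avoid match-all schemata
                schema = candidate
        if schema != parent:
            schemata.append(schema)
    return schemata


parent = ("+", ("*", ("x1",), ("x2",)), ("-", ("x3",), ("x4",)))
print(generate_schemata([(parent, 2), (parent, 4), (parent, 2)]))
# [('+', ('*', ('#',), ('x2',)), ('#',))]
```

A root parent that takes part in many crossovers accumulates more cutpoints and therefore yields a schema with more wildcards.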

Since wildcards are inserted at cutpoint locations, the structure of the generated schemata is influenced indirectly by the selection pressure applied on the population, which determines the multiplicity of root parent individuals (how many times each individual participates in crossover as a root parent) and therefore the number of wildcards. Intuitively, the method will generate more general schemata under high selection pressure and more specific schemata (containing fewer wildcards) under lower selection pressure. The algorithm can generate different kinds of schemata (according to the schema definitions in the literature, for an excellent summary see [24]), depending on the kind of wildcard symbols used for replacement.

2.2.2 Schema Matching

The schema matching part of our methodology is based on the algorithm for the tree homeomorphism decision problem by Götz et al. [7], which tries to find a non-injective mapping between every parent-child pair in a query tree Q (the schema) and corresponding ancestor-descendant pairs in data tree D (the matched individual). Such a situation is shown in Fig. 2.2 where, according to the algorithm, the query tree Q is matched by the data tree D. The algorithm runs in O(|D|⋅|Q|⋅ depth(Q)) time using a stack of depth bounded by O(depth(D) ⋅ branch(D)).
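For intuition, the decision problem itself can be stated in a few lines of code. The sketch below is a naive recursive check, not the stack-based algorithm of Götz et al. [7], and it does not achieve the complexity bound quoted above. It assumes nested-tuple trees (label, child, ...) and, as a simplification, that the query root must be mapped to the data root.

```python
def descendants(tree):
    """Yield all proper descendants of `tree` in preorder."""
    for child in tree[1:]:
        yield child
        yield from descendants(child)


def embeds(query, data):
    """Naive tree homeomorphism test.

    The query node is mapped to the data node (labels must agree), and every
    child of the query node must be mapped to some proper descendant of the
    data node. The mapping need not be injective.
    """
    if query[0] != data[0]:
        return False
    return all(
        any(embeds(q_child, d) for d in descendants(data))
        for q_child in query[1:]
    )


query = ("a", ("b",), ("c",))
data = ("a", ("x", ("b", ("c",))))
print(embeds(query, data))         # True: b and c are found below a, at any depth
print(embeds(query, ("a", ("b",))))  # False: no node labelled c
```

In the first call, b and c are siblings in the query but lie on a single path in the data tree, which shows how permissive ancestor-descendant matching is compared with positional matching.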

We notice that the algorithm in its default implementation does not enforce matching rules that are strict enough for schema matching, since tree nodes are matched from the bottom up if they have the same label, without additional considerations for their depth in the tree (relative to the root node). We therefore added rules in our implementation to make sure that two nodes are only matched if they are on the same level in the tree and their parent and child nodes are matched as well. Another important detail is the matching of commutative symbols, in which case the algorithm does not consider the order of the child subtrees (internally, a sorting is performed). For example, a schema such as # x + (in postfix notation) will be matched by an individual x y + because the + symbol is commutative, despite the fact that the x symbol is found at different positions in the argument order.
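The stricter rules can be illustrated with a top-down matcher that only pairs nodes on the same level and ignores argument order under commutative symbols. This is a sketch under assumptions: trees are nested tuples (label, child, ...), the set of commutative symbols is hypothetical, and the order-insensitive comparison is done by trying permutations of the children rather than by the internal sorting mentioned above.

```python
from itertools import permutations

COMMUTATIVE = {"+", "*"}  # assumption: which primitives are treated as commutative


def matches(schema, tree):
    """Level-strict schema matching that ignores child order under commutative symbols."""
    label, *schema_children = schema
    if label == "#":
        return True  # '#' matches any subtree
    tree_label, *tree_children = tree
    if len(schema_children) != len(tree_children):
        return False
    if label not in ("=", tree_label):
        return False
    if tree_label in COMMUTATIVE:
        orders = permutations(tree_children)   # try every argument order
    else:
        orders = [tuple(tree_children)]        # order is significant
    return any(
        all(matches(s, t) for s, t in zip(schema_children, order))
        for order in orders
    )


# Postfix "# x +" against "x y +", and the same pattern under a non-commutative symbol.
print(matches(("+", ("#",), ("x",)), ("+", ("x",), ("y",))))  # True
print(matches(("-", ("#",), ("x",)), ("-", ("x",), ("y",))))  # False
```

Trying permutations is exponential in the arity of a node, which is acceptable for binary operators but would need a smarter assignment step for primitives with many arguments.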

2.3 Experimental Setup

We compared the evolution of schema frequencies between two algorithmic variants: standard genetic programming (SGP) [10] and genetic programming with strict offspring selection (OSGP) [1]. The difference between the two algorithms consists of an extra selection step enforced by OSGP on the generated offspring, such that offspring get rejected if they do not fulfil certain performance criteria. In effect, the extra selection step concentrates the algorithm’s efforts on generating adaptive changes (that do not decrease fitness), making it possible for less fit individuals to participate as parents if they can produce children fitter than themselves, while high fitness individuals might not contribute if they cannot be improved.
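The extra selection step can be sketched as an acceptance test inside the reproduction loop. The code below is an illustration only: it assumes that higher fitness is better and that the strict criterion means a child must outperform both of its parents; the function and parameter names are hypothetical and do not reflect the HeuristicLab implementation.

```python
import random


def offspring_selection(population, fitness, breed, size, max_attempts=1000, rng=random):
    """Fill the next generation only with children that beat both parents.

    Returns the accepted children and the number of breeding attempts made.
    """
    accepted, attempts = [], 0
    while len(accepted) < size and attempts < max_attempts:
        attempts += 1
        parent1, parent2 = rng.sample(population, 2)
        child = breed(parent1, parent2)
        # Strict criterion (assumption): child must be fitter than the better parent.
        if fitness(child) > max(fitness(parent1), fitness(parent2)):
            accepted.append(child)
    return accepted, attempts


# Toy usage: individuals are numbers and fitness is the number itself.
rng = random.Random(0)
children, tries = offspring_selection(
    [1, 2, 3, 4], fitness=lambda x: x, breed=lambda a, b: max(a, b) + 1, size=3, rng=rng
)
print(len(children), tries)  # 3 3
```

Under this criterion a weak pair of parents can still contribute a child, while a strong pair contributes nothing unless it can be improved upon.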

Each problem and algorithm configuration was repeated for 20 runs, from which a single representative run was selected based on best performance on the training data. This final run selection step was necessary for clarity and space reasons, as the slight differences between runs (particularly at the genotypic level) make it impossible for schemata generated from one population genealogy to be applied to another population genealogy.

2.3.1 Algorithm Parameters

We applied our schema generation and matching methodology at each generation on the whole population of solution candidates (see Note 2). The parameterizations for the two algorithms are presented in Tables 2.1 and 2.2.

Table 2.1 SGP configuration

Table 2.2 OSGP configuration

The OSGP algorithm uses the same primitive set, tree depth and size limits and crossover and mutation operators as SGP, with differences in population size, stopping criteria and selection mechanism.

2.3.2 Problem Instances

For this experiment, we selected one symbolic regression benchmark problem that facilitates discernible genotypic representations of solutions, in order to more easily observe solution fragments or building blocks contained by the schemata. We used the Poly-10 [14] synthetic symbolic regression benchmark, where the goal is to find the target function:

$$f(\mathbf{x}) = x_1 x_2 + x_3 x_4 + x_5 x_6 + x_1 x_7 x_9 + x_3 x_6 x_{10} \tag{2.1}$$
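The target function in Eq. (2.1) translates directly into code. The sketch below also draws a small synthetic data set; the sampling range [-1, 1], the number of samples and the helper name `sample_dataset` are assumptions made for illustration, since the chapter does not state how the data were generated.

```python
import random


def poly10(x):
    """Poly-10 target from Eq. (2.1); `x` holds the ten inputs x1..x10 in order."""
    x1, x2, x3, x4, x5, x6, x7, x8, x9, x10 = x
    # Note that x8 does not appear in the target expression.
    return x1 * x2 + x3 * x4 + x5 * x6 + x1 * x7 * x9 + x3 * x6 * x10


def sample_dataset(n, rng):
    """Draw n input vectors uniformly from [-1, 1]^10 (assumed range) with targets."""
    rows = [[rng.uniform(-1.0, 1.0) for _ in range(10)] for _ in range(n)]
    return [(row, poly10(row)) for row in rows]


print(poly10([1] * 10))          # 5
print(poly10(list(range(1, 11))))  # 287
data = sample_dataset(5, random.Random(42))
```

The product terms of the formula are the fragments one would hope to see reappearing inside frequent schemata.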

For the second test problem we used the Tower dataset [21], containing real-world data in the form of gas chromatography measurements of the composition of a distillation tower.

2.3.3 Analysis Methods

We perform our analysis a posteriori with the help of a complete genealogical record of the algorithmic run. We generate a set of potential schemata from the population at each generation, and match it against the whole genealogy in order to determine the evolution of schema frequencies over time.
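Given a matching predicate, the frequency curve of a schema is simply the fraction of matching individuals in each generation. The following sketch assumes the genealogy is available as a list of populations and takes the matcher as a parameter; the substring test used in the example is only a stand-in for a real tree pattern matcher.

```python
def frequency_curves(schemata, generations, matches):
    """Map each schema to its relative frequency in every generation."""
    return {
        schema: [
            sum(1 for ind in population if matches(schema, ind)) / len(population)
            if population else 0.0
            for population in generations
        ]
        for schema in schemata
    }


# Toy genealogy with individuals written in postfix notation.
generations = [
    ["x1 x2 *", "x3 x4 *", "x1 x2 * x5 +", "x5"],
    ["x1 x2 * x5 +", "x1 x2 * x3 +", "x1 x2 *", "x3 x4 *"],
]
contains = lambda schema, individual: schema in individual  # stand-in matcher
print(frequency_curves(["x1 x2 *", "x3 x4 *"], generations, contains))
# {'x1 x2 *': [0.5, 0.75], 'x3 x4 *': [0.25, 0.25]}
```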

As diversity loss in the course of the evolutionary process reflects itself in the set of schemata obtained each generation (which can contain duplicates or can repeat structures obtained in previous generations), we additionally perform filtering based on the schema frequency curves. If two schemata have highly correlated frequency curves (with a Pearson’s R² correlation coefficient value > 0.99), one of them is removed from the set of all schemata.
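The filtering step can be sketched as a greedy pass over the frequency curves. The text does not say which schema of a correlated pair is dropped, so the code below keeps the one encountered first; that choice, the treatment of constant curves and the function names are assumptions.

```python
def r_squared(a, b):
    """Squared Pearson correlation of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # correlation is undefined for a constant curve; treat as uncorrelated
    return cov * cov / (var_a * var_b)


def filter_correlated(curves, threshold=0.99):
    """Keep a schema only if its curve is not highly correlated with a kept one."""
    kept = {}
    for schema, curve in curves.items():
        if all(r_squared(curve, other) <= threshold for other in kept.values()):
            kept[schema] = curve
    return kept


curves = {
    "S1": [0.0, 0.1, 0.2, 0.4],
    "S2": [0.0, 0.2, 0.4, 0.8],   # same shape as S1, so it is filtered out
    "S3": [0.5, 0.1, 0.4, 0.0],
}
print(list(filter_correlated(curves)))  # ['S1', 'S3']
```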

2.4 Empirical Results

We prefix each tested configuration with the name of the algorithm, followed by distinctive parameters such as the selection mechanism and maximum tree length, and then followed by the problem name. For example, the name SGP-P-25-Poly10 denotes a standard GP run with proportional selection and a maximum tree length of 25, while SGP-T-25-Poly10 denotes the same configuration with tournament selection instead.

When discussing schema frequencies, we use the notation S_{1,P} … S_{10,P} for schemata generated by SGP with proportional selection, and S_{1,T} … S_{10,T} for SGP with tournament selection. For OSGP, we use the notation S_{1,G} … S_{10,G} to denote the 10 most common schemata. To keep the notation concise, we reuse it in each section corresponding to each tested problem.

2.4.1 Standard GP

2.4.1.1 Poly-10 Problem

We first look at the convergence of SGP-P-25-Poly10. At the structural level, convergence should manifest itself as an increased occurrence count of repeated patterns in the population. Table 2.3 shows the most frequent schemata found in the last generation, represented in postfix notation. The notation S_{1,P} … S_{10,P} in the first column of the table is used to designate the schemata obtained in the SGP run with proportional selection.

Table 2.3 SGP-P-25-Poly10: most common schemata in the last generation

We notice that some schemata (for example, S_{1,P} and S_{2,P}, as well as S_{3,P} and S_{4,P}) share a degree of structural similarity. A closer look at their respective frequency curves (not detailed here for space reasons) reveals that:

The frequency curves for S_{1,P} and S_{2,P} are highly correlated (R² = 0.962); however, S_{2,P} represents a more specific template which matches fewer individuals at each generation.
The frequency curves for S_{3,P} and S_{4,P} are correlated (R² = 0.916). In this case S_{4,P} represents the slightly more specific template, matching fewer individuals than S_{3,P}.
The frequency curves for S_{7,P} and S_{10,P} are correlated (R² = 0.907), with S_{10,P} being the slightly more specific schema.

The fact that we obtained similar and frequency-correlated schemata via our crossover-based generation procedure indicates the presence of similar parent individuals in the population, suggesting loss of diversity. We focus on the most relevant schemata (S_{1,P}, S_{3,P} and S_{7,P}) and show their frequency evolution in Fig. 2.3.

The frequency curves show the moments when the algorithm was able to discover parts of the formula such as x₁x₂, x₃x₄ and x₅x₆. The schemata sharply increase their frequency in the population at the beginning of the run and then vary according to the internal dynamics of the evolutionary search (competition between schemata, stagnation in the later stages).

From a diversity perspective, the schema frequency approach has the ability to identify high level similarities in the population (e.g., when 30% of the population share the same genetic template) that would otherwise be hard to notice with conventional metrics like tree distances.

The results so far confirm that schemata identified by our method correspond to what could be considered as building blocks for this problem, including in their structure the terms of the formula and showing an exponential increase in frequency from the moment of their occurrence.

We calculate schema average quality (as the average quality of their matching individuals) and show the results in Fig. 2.4. The quality curves suggest that the identified schemata are of above-average quality.

A similar situation can be observed for SGP with tournament selection, where several frequent schemata are present in the last generation and shown in Table 2.4. We notice that the top four schemata are matching relatively high proportions of the population and show their detailed frequency evolution in Fig. 2.5.

Table 2.4 SGP-T-25-Poly10: most common schemata in the last generation

The generated schemata correspond to solution building blocks, containing terms of the target formula. Compared to proportional selection, the extra selection pressure applied on the population by the tournament selection (with a group size of 5) leads to larger schemata.

The observed schema frequency evolutions for SGP with proportional and tournament selection support the idea that relevant schemata increase in frequency over the generations.

Quality measurements in Fig. 2.6 show a significant difference between the average quality of the population and the average schema qualities. The discontinuous line segments in this figure correspond to generations in which the schema frequency dropped to zero, so that an average quality could not be calculated. The results suggest that tournament selection (applying higher pressure on the population) promotes higher quality schemata.
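The gaps in the quality curves follow directly from how the schema average quality is computed. In the sketch below, a generation without matching individuals yields None instead of a number; the data layout (pairs of expression and quality) and the substring matcher are stand-ins chosen for illustration.

```python
def quality_curve(schema, generations, quality, matches):
    """Average quality of the individuals matching `schema` in each generation.

    A generation with no matching individual contributes None, which a
    plotting routine would render as a gap in the curve.
    """
    curve = []
    for population in generations:
        matched = [quality(ind) for ind in population if matches(schema, ind)]
        curve.append(sum(matched) / len(matched) if matched else None)
    return curve


# Toy data: each individual is a (postfix expression, quality) pair.
generations = [
    [("x1 x2 *", 0.5), ("x5", 0.25)],
    [("x3", 0.25), ("x5", 0.5)],
    [("x1 x2 * x3 +", 0.75), ("x1 x2 *", 0.5), ("x4", 0.25)],
]
curve = quality_curve(
    "x1 x2 *",
    generations,
    quality=lambda ind: ind[1],
    matches=lambda schema, ind: schema in ind[0],
)
print(curve)  # [0.5, None, 0.625]
```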

2.4.1.2 Tower Problem

We compare the two standard GP configurations using proportional and tournament selection, denoted SGP-P-25-Tower and SGP-T-25-Tower. The most common schemata for the SGP variant with proportional selection are given in Table 2.5.

Table 2.5 SGP-P-25-Tower: most common schemata in the last generation

The obtained templates are short and only include 2 out of 25 input variables, with the most common schema matching 15% of the population in the last generation. This result suggests that the two variables x₁ and x₆ are more relevant (in terms of the implicit variable ranking performed by GP) for the modeling of the target. In terms of quality, the produced symbolic regression solution achieved a Pearson’s R² correlation with the target variable of 0.8. As with the previous problem, we plot the evolution of schema frequencies, using a correlation-based filtering step to eliminate similar curves. The Pearson’s R² correlation values for S_{1,P} … S_{10,P} show that:

S_{1,P} is highly correlated with S_{4,P} and S_{5,P}
S_{2,P} is highly correlated with S_{6,P}
S_{8,P} is highly correlated with S_{9,P}

The frequencies of the remaining schemata are shown in Fig. 2.7, while their qualities, along with the average quality of the population, are shown in Fig. 2.8. We see that S_{2,P} becomes frequent rather early and is overall more frequent than S_{1,P}, while the latter has a marginally higher frequency in the last generation. Quality-wise, S_{1,P} and S_{2,P} are clearly above the average of the population, while S_{7,P} and S_{8,P} occasionally dip below the average.

Tournament selection leads to the evolution of more complex schemata. The ten most frequent schemata in the last generation, shown in Table 2.6, are larger in size, match more individuals and contain more variables from the dataset.

Table 2.6 SGP-T-25-Tower: most common schemata in the last generation

Correlation analysis of the frequency curves reveals that:

S_{1,T} is highly correlated with S_{2,T}, S_{3,T} and S_{4,T}, with an R² value of 0.96.
S_{7,T} is highly correlated with S_{10,T}, with an R² value of 0.95.

Filtering correlated schemata, we display the remaining schema frequency curves in Fig. 2.9. Interestingly, S_{1,T}, the most common schema in the last generation, has a noticeably lower average quality compared to S_{5,T}, S_{6,T} and S_{7,T}, although it still manages to rise above the average population quality, as seen in Fig. 2.10.

2.4.2 Offspring Selection GP

As previously mentioned, OSGP implements an additional selection step which decides if the offspring produced by mutation and crossover are accepted into the next generation. We analyze the influence of offspring selection on the generated schemata and their frequencies.

2.4.2.1 Poly-10 Problem

Surprisingly, schema frequencies in the last generation show that only two out of all the generated schemata managed to survive. Furthermore, these two schemata represent a very similar genotypic template which managed to propagate itself to all of the individuals in the population. The two schemata are displayed in Table 2.7. This result shows that it is entirely possible under strict offspring selection for the algorithm to converge to a single genetic template.

Table 2.7 OSGP-25-Poly10: most common schemata in the last generation

Since we only have two schemata in the last generation, we investigate the evolution of schema frequencies using a different strategy: we rank the schemata based on their overall frequency, that is, the average of their individual frequencies in each generation. The new ranking is shown in Table 2.8, where the frequency represents an average of the schema frequency over all generations.
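The alternative ranking amounts to averaging each frequency curve over all generations and sorting by that average. A minimal sketch, assuming the curves are stored as a dictionary from schema name to a list of per-generation frequencies (the names and values below are made up):

```python
def rank_by_overall_frequency(curves, top=10):
    """Rank schemata by their frequency averaged over all generations."""
    overall = {
        schema: sum(curve) / len(curve) for schema, curve in curves.items() if curve
    }
    return sorted(overall.items(), key=lambda item: item[1], reverse=True)[:top]


curves = {
    "A": [1.0, 0.0, 0.0, 0.0],   # frequent early, extinct later
    "B": [0.5, 0.5, 0.5, 0.5],   # moderately frequent throughout
    "C": [0.0, 0.0, 0.5, 1.0],   # dominates the last generation
}
print(rank_by_overall_frequency(curves))
# [('B', 0.5), ('C', 0.375), ('A', 0.25)]
```

Ranking by the last generation alone would put C first and drop A altogether, which is why the overall average is the more informative criterion when only a few schemata survive to the end of the run.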

Table 2.8 OSGP-25-Poly10: most common schemata overall

Several of the schemata from Table 2.8 match the same individuals and have highly correlated frequency curves. These schemata were filtered from Fig. 2.11 to eliminate clutter. The figure shows multiple schemata (S_{3,G}, S_{6,G} and S_{8,G}) proliferating in the population in the earlier generations of the run, only to become extinct later.

After generation 38, the two most frequent schemata in the last generation, S_{1,G} and S_{2,G}, have overlapping frequency curves, suggesting that S_{2,G} has a higher degree of specificity, presumably due to the lack of ‘#’ wildcard symbols in its structure.

2.4.2.2 Tower Problem

We notice a similar behavior for the Tower problem, where a single schema denoted as S_{1,G} matches all the individuals in the last generation:

C X1 C X12 X6 X23 X22 # C X5 X1 C

Like before, we consider in this situation the most frequent schemata overall, shown in Table 2.9.

Table 2.9 OSGP-25-Tower: most common schemata overall

Figure 2.12 shows the evolution of schema frequencies for the top three most frequent schemata from Table 2.9. We see schema S_{1,G} rising in frequency after generation 20 and driving other schemata to extinction.

Compared to SGP, the schemata obtained by OSGP and their frequency evolution suggest a more pronounced loss of diversity as the population becomes dominated by a single schema.

2.5 Conclusion

We described in this chapter a practical approach for performing schema analysis on GP populations, considering a well-known schema definition (Poli’s hyperschema) that uses two types of wildcard symbols for function and leaf nodes, respectively. The methodology can be easily extended to include different schema definitions or stricter matching rules.

Hyperschemata are generated algorithmically by taking into account genealogical information about crossover offspring and their respective parents. A pattern matching algorithm is then used to match schemata against the GP population at each generation.

We tested our methodology using two test problems (Poly-10 and Tower) and two algorithmic variants: Standard GP and Offspring Selection GP. The results validate our approach: the identified schemata for each test problem are of increasing frequency in the population and above-average quality. Compared to other methods for measuring genotypic diversity, our schema-based approach offers a detailed picture of the propagation of repeated patterns, while also being able to identify these patterns.

The evolution of schema frequencies suggests that diversity loss starts to occur early in the evolutionary run and tends to homogenize the genotypic structure of the population. As expected, this phenomenon is highly influenced by the selection mechanism. For both problems, the SGP runs using tournament selection displayed lengthier, more frequent and more specific schemata. Offspring selection has even more drastic effects, as the population shares a single (and rather specific) genetic template.

Future research in this direction will focus on a more detailed analysis of population dynamics where we also consider schema disruption events. The approach can also be employed online to guide the evolutionary process, for example by avoiding loss of diversity via localized mutation rates within frequent schemata.

Notes

1. The terms root parent and non-root parent refer to the two parents involved in a crossover operation: the root parent passes on to the child its entire rooted tree structure, with the exception of the subtree swapped by crossover at an arbitrary location (called a cutpoint) from the non-root parent.
2. To maintain low computational times, certain compromises had to be made in terms of population size and number of generations.

References

1. Affenzeller, M., Winkler, S., Wagner, S., Beham, A.: Genetic Algorithms and Genetic Programming: Modern Concepts and Practical Applications. Numerical Insights. CRC Press, Singapore (2009)
2. Altenberg, L., et al.: The evolution of evolvability in genetic programming. Advances in Genetic Programming 3, 47–74 (1994)
3. Banzhaf, W.: Genetic programming and emergence. Genetic Programming and Evolvable Machines 15(1), 63–73 (2014). https://doi.org/10.1007/s10710-013-9196-7
4. Banzhaf, W., Leier, A.: Evolution on neutral networks in genetic programming. In: Genetic Programming Theory and Practice III, pp. 207–221. Springer (2006)
5. Burke, E., Gustafson, S., Kendall, G.: A survey and analysis of diversity measures in genetic programming. In: Proceedings of the 4th Annual Conference on Genetic and Evolutionary Computation, pp. 716–723. Morgan Kaufmann Publishers Inc. (2002)
6. Burke, E.K., Gustafson, S., Kendall, G.: Diversity in genetic programming: An analysis of measures and correlation with fitness. IEEE Transactions on Evolutionary Computation 8(1), 47–62 (2004)
7. Götz, M., Koch, C., Martens, W.: Efficient algorithms for descendant-only tree pattern queries. Inf. Syst. 34(7), 602–623 (2009). https://doi.org/10.1016/j.is.2009.03.010
8. Holland, J.H.: Adaptation in Natural and Artificial Systems. The University of Michigan Press (1975)
9. Hu, T., Banzhaf, W., Moore, J.H.: Population Exploration on Genotype Networks in Genetic Programming. In: Proceedings of the 13th International Conference on Parallel Problem Solving from Nature – PPSN XIII, 2014, pp. 424–433. Springer International Publishing, Cham (2014)
10. Koza, J.R.: Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, Cambridge, MA, USA (1992)
11. Krawiec, K., Wieloch, B.: Functional modularity for genetic programming. In: Proceedings of the 11th Annual Conference on Genetic and Evolutionary Computation, GECCO ’09, pp. 995–1002. ACM, New York, NY, USA (2009). http://doi.acm.org/10.1145/1569901.1570037
12. Poli, R.: Hyperschema theory for GP with one-point crossover, building blocks, and some new results in GA theory. In: Genetic Programming, Proceedings of EuroGP 2000, pp. 15–16. Springer-Verlag (2000)
13. Poli, R.: Exact schema theory for genetic programming and variable-length genetic algorithms with one-point crossover. Genetic Programming and Evolvable Machines 2(2), 123–163 (2001). https://doi.org/10.1023/A:1011552313821
14. Poli, R.: A simple but theoretically-motivated method to control bloat in genetic programming. In: Proceedings of the 6th European Conference on Genetic Programming, EuroGP’03, pp. 204–217. Springer-Verlag, Berlin, Heidelberg (2003). http://dl.acm.org/citation.cfm?id=1762668.1762688
15. Poli, R., Langdon, W.B., Dignum, S.: Generalisation of the limiting distribution of program sizes in tree-based genetic programming and analysis of its effects on bloat. In: GECCO 2007: Proceedings of the 9th Annual Conference on Genetic and Evolutionary, pp. 1588–1595. ACM Press (2007)
16. Poli, R., McPhee, N.F.: General schema theory for genetic programming with subtree-swapping crossover: Part I. Evolutionary Computation 11(1), 53–66 (2003)
17. Poli, R., McPhee, N.F.: General schema theory for genetic programming with subtree-swapping crossover: Part II. Evolutionary Computation 11(2), 169–206 (2003). https://doi.org/10.1162/106365603766646825
18. Poli, R., McPhee, N.F.: Covariant parsimony pressure for genetic programming. In: GECCO 2008: Proceedings of the 10th annual conference on Genetic and Evolutionary Computation, pp. 1267–1274. ACM Press (2008)
19. Poli, R., Vanneschi, L., Langdon, W.B., McPhee, N.F.: Theoretical results in genetic programming: The next ten years? Genetic Programming and Evolvable Machines 11(3–4), 285–320 (2010). http://dx.doi.org/10.1007/s10710-010-9110-5
20. Stephens, C.R., Waelbroeck, H.: Effective degrees of freedom in genetic algorithms. Physical Review E 57(3), 3251–3264 (1998)
21. Vladislavleva, E.J., Smits, G.F., Den Hertog, D.: Order of nonlinearity as a complexity measure for models generated by symbolic regression via pareto genetic programming. Evolutionary Computation, IEEE Transactions on 13(2), 333–349 (2009)
22. Wagner, G.P., Altenberg, L.: Perspective: complex adaptations and the evolution of evolvability. Evolution 50, 967–976 (1996)
23. Wagner, S., Kronberger, G., Beham, A., Kommenda, M., Scheibenpflug, A., Pitzer, E., Vonolfen, S., Kofler, M., Winkler, S.M., Dorfer, V., Affenzeller, M.: Architecture and design of the HeuristicLab optimization environment. Advanced Methods and Applications in Computational Intelligence, Topics in Intelligent Engineering and Informatics 6, 197–261 (2013)
24. White, D.: An overview of schema theory. Computing Research Repository CoRR abs/1401.2651 (2014). http://arxiv.org/abs/1401.2651
25. Woodward, J.R.: Modularity in Genetic Programming. Proc. of Genetic Programming: 6th European Conference, EuroGP 2003 Essex, pp. 254–263. Springer (2003). http://dx.doi.org/10.1007/3-540-36599-0_23

Acknowledgements

The work described in this paper was done within the COMET Project Heuristic Optimization in Production and Logistics (HOPL), #843532 funded by the Austrian Research Promotion Agency (FFG).

Author information

Authors and Affiliations

Heuristic and Evolutionary Algorithms Laboratory, University of Applied Sciences Upper Austria, Hagenberg, Austria
Bogdan Burlacu, Michael Affenzeller, Michael Kommenda & Stephan Winkler
Institute for Formal Models and Verification, Johannes Kepler University, Linz, Austria
Bogdan Burlacu, Michael Affenzeller, Michael Kommenda & Stephan Winkler
Heuristic and Evolutionary Algorithms Laboratory, University of Applied Sciences Upper Austria, Hagenberg, Austria
Gabriel Kronberger

Corresponding author

Correspondence to Bogdan Burlacu.

Editor information

Editors and Affiliations

BEACON Center for the Study of Evolution in Action and Department of Computer Science, Michigan State University, East Lansing, Michigan, USA
Wolfgang Banzhaf
Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
Randal S. Olson
Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
William Tozier
Center for the Study of Complex Systems, University of Michigan, Ann Arbor, Michigan, USA
Rick Riolo

Cite this paper

Burlacu, B., Affenzeller, M., Kommenda, M., Kronberger, G., Winkler, S. (2018). Schema Analysis in Tree-Based Genetic Programming. In: Banzhaf, W., Olson, R., Tozier, W., Riolo, R. (eds) Genetic Programming Theory and Practice XV. Genetic and Evolutionary Computation. Springer, Cham. https://doi.org/10.1007/978-3-319-90512-9_2

DOI: https://doi.org/10.1007/978-3-319-90512-9_2
Published: 06 July 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-90511-2
Online ISBN: 978-3-319-90512-9
eBook Packages: Computer Science, Computer Science (R0)
