In this paper, we suggest treestructureaware GP (Genetic Programming) operators that heed tree distributions in structure space and their possible structural difficulties. The main idea of the proposed GP operators is to place the generated offspring of crossover and/or mutation in a specified region of tree structure space insofar as possible by biasing the tree structures of the altered subtrees, taking into account the observation that most solutions are found in that region.
To demonstrate the effectiveness of the proposed approach, experiments on the binomial3 regression, multiplexor and even parity problems are performed. The results show that the results using the proposed treestructureaware operators are superior to the results of standard GP for all three test problems in both success rate and number of evaluations.
1. Introduction
GP (Genetic Programming)
[1
,
2]
is an extension of the GA (Genetic Algorithm)
[3]
, using evolution to optimize actual computer programs or algorithms to solve some task, typically involving a treetype representation. The tree representation of GP chromosomes, as compared with the string representation typically used in GA, gives GP more flexibility to encode solution representations for many realworld design and optimization applications
[4

6]
.
The recombination operator in GP is regarded as a main driving force for the success of a GP run. Many recombination operators have been proposed to improve its performance. These approaches include context preserving
[7]
, contextaware crossover
[7]
, GP uniform crossover
[8]
and, depthdependent crossover
[9]
. These approaches are mainly to minimize destructive effects of standard crossover, or preserve the position of genetic material in the genotype, and/or to accumulate building blocks. Although it is very important to preserve or minimize destruction of genetic material, there is no known implicit direction to favor.
The regions of a tree structure’s space by a distribution of tree shapes has been investigated
[10]
. It is shown that tree structure can have a substantial impact on determining problem difficulty in standard GP, and most solutions in standard GP are found in a specific region of the space of tree structures
[10]
.
We suggest new recombination operators using the above idea of tree structure space. The proposed GP operators are to place the generated offspring of crossover and/or mutation in a specified region of tree structure space insofar as possible, based on the observation that most solutions are found in that region. To enable that, the proposed operators are designed to utilize information about the region to which the parents belong and node/ depth statistics of the subtree selected for modification.
2. Structural Difficulty in GP
It was shown that structure alone can pose great difficulty for a standard GP search
[10

12]
. In particular, they classified four regions of the search space of tree structures, as shown in
Fig. 1
.
Node and depth regions  Reprinted with permission from [10]
The regions of tree structures are summarized as follows, classified according to number of nodes and tree depth. There exist at least four distinct regions for depths 026. Region I contains most solutions in standard GP. Far fewer individuals than in Region I appear in Regions II (IIa and IIb). Only partial mixing of size / shape subtrees occurs here, with mixing becoming nonexistent towards the boundaries furthest away from Region I. Region III (IIIa and IIIb) is a place where even fewer individuals typically appear. Regions IVa and IVb are regions that are structurally precluded from binary trees.
3. Treestructure Based Aware Operators
The main idea of the proposed treestructurebased aware GP operator is to place generated offspring(s) resulting from crossover and/or mutation into region I as much as possible, using the observation of tree structure as guidance in specifying the shape of genetic material to be introduced
[10
,
11]
. Because most solutions of test problems found by standard GP search occur in region I, it is natural to say that it is easy for GP to find solutions to problems which exist in region I. Furthermore, biasing the operators to put generated offspring into region I seems to be plausible strategy to improve search capability in standard treebased GP.
First, from the parent’s region information to which a parent belongs, we can determine whether the offspring should move in some particular direction in shape space or stay in the current region, based on the idea that most solutions are found in region I. Second, the node/depth ratio of the chosen subtree is used to generate an offspring as close as desired to a specified structure by identifying a structurally appropriate subtree of parent2 for subtree swapping.
The treestructurebased aware crossover algorithm is described in
Fig. 2
. Two parents are selected by roulette wheel selection and the region of each parent is examined. A subtree of parent1 is chosen at random, and a subtree of parent2 is chosen for substitution according to the node/depth ratio of the subtree of parent1 and the regions of each parent.
Treestructureaware crossover
Treestructureaware mutation
For example, if the region of parent1 is lower than the region of parent2, a subtree of parent2 is chosen among possible candidates which have a smaller node/depth ratio than that of the subtree of parent1. This operation generates a child which has high probability of being in region I. In the other case, a subtree of parent2 is chosen among candidates which have larger node/depth ratios than that of the subtree of parent1. Unlike standard crossover, offspring are obtained from crossover of two parents, because it is difficult to control the shape of both offspring by the subtree swapping operation in the proposed crossover. Therefore, offspring from the proposed crossover is generated one at a time.
The proposed mutation operator has a similar concept as of crossover. The major feature of the treestructurebased aware mutation is to control the replaced subtree by generation methods (grow, full, or half & half) considering the current shape of the parent tree.
4. Experiments & Results
 4.1 Experimental setup
The treestructurebased aware GP operators have been applied to three standard benchmark problems — binomial3 regression, multiplexor and even parity problem — and compared with standard GP operators and their combinations. In all experiments, 20 runs were executed and the number of evaluations and hit ratio or success rate of solutions found are recorded for standard crossover & mutation and the proposed crossover & mutation. We used lilGP
[13]
for the standard GP runs, and modified it heavily to implement the new crossover and mutation operators. The GP parameters were as shown below.

• Number of generations : 500 for binomial3, 2000 to 10000 for multiplexor, and 100 to 7000 for evenparity

• Population sizes : 500

• Initial population: half_and_half

• Initial depth : 26

• Max depth : 17, (25 for 11multiplexor)

• Selection : Tournament (size=7) for standard roulette wheel for proposed

• Crossover : 0.9

• Mutation : 0.1
 4.2 Binomial3 regression
The binomial3
[11]
is a symbolic regression problem which is used for many applications. The problem involves seeking a function expressible as:
The binomial3 problem was defined as fitness cases for 50 equidistant points generated from Eq. (1), over the interval [1,0). The raw fitness is the sum of the absolute errors. The ephemeral random constants (ERCs) are uniformly distributed over a specified interval of the form [α, α], where α is a real number that defines the range for ERCs. Four values for α were usednamely: {0, 1, 3, 10}.
The experimental results for the binomial3 problem are summarized in
Table 1
. The proposed methods produced better results than standard crossover & mutation, in terms of number of evaluations, and success rate. Because the binomial3 problem is easier than the other two problems, it shows only slight differences in the success rate comparison. However, big improvements are shown for the number of evaluations, which is one of the most important indexes. The proposed crossover & mutation method reduces evaluations required approximately 70% compared to standard GP, except for the very easy case of no ERCs.
Results of binomial3 regression problem
Results of binomial3 regression problem
To investigate the distribution of offspring in tree structure space produced by the proposed crossover & mutation, data from a typical run for ERC[100,100] are plotted in
Fig. 3
. During the initial random generation, most of the 500 individuals are placed within the lefthand portion of region I, and a few trees are located at the boundary with region IIb and on the border line in the lower left corner. As the generations are increased, the generated trees are moved to right and distributed evenly. From the results, we can confirm that most of proposed operations was executed according to the intention of treestructure based aware operation. One of the most interesting phenomena is the distribution of individuals shows much high diversity than the result of standard GP’s in
Fig. 4
.
Distribution of tree structures generated by the proposed crossover & mutation operators for 500 individuals by generation (binomial3 regression with ERC[100,100])
Distribution of tree structures generated by standard crossover & mutation for 500 individuals by generation (binomial3 regression with ERC[100, 100])
 4.3 Multiplexor
The objective of the multiplexor problem
[1]
is to find a boolean function which performs multiplexing using an nbit address. The experimental results for 6 and 11bit multiplexor problems are summarized in
Table 2
.
Results of binomial3 regression problem
Results of binomial3 regression problem
For the 6multiplexor, the proposed method is superior to standard GP for success rate and evaluations. Notably, the evaluations needed using the proposed crossover & mutation operators are only 17% of standard GP’s. For the 11multiplexor, the success rate of the proposed crossover & mutation is outstanding, at 60%; the other is 0%.
The distribution of trees in tree structure space using the proposed crossover & mutation for a typical run on the 6 multiplexor is very similar in the case of binomial3 problem (not in shown in the paper because of space limitation).
 4.4 Evenparity
The third experiment was on the evennparity problem
[4]
. It has been recognized as difficult for genetic programming to induce if no bias favorable to their induction is introduced in the function set, the input representation, or in any other part of the algorithm
[14]
.
Table 3
summarizes the experimental results for 3, 4 and 5even parity problems. For the even3 parity problem, the results of the proposed method is better than those of standard GP in terms of number of evaluations. The success rates of the proposed method is better than standard GP’s for the even4 parity problem. Moreover, the number of evaluations of the proposed crossover & mutation is only 12% of standard GP’s.
Results of even parity problem
Results of even parity problem
For the even5 parity problem, the success rate of the proposed crossover & mutation (75%) is quite superior to standard GP’s (never found). The distribution graph of tree structures for a typical run on the even3 parity problem displays similar results to those seen for the binomial3 regression and multiplexor problems
 4.5 Investigation of the depth distribution of individuals
Fig. 5
shows the depth distributions of parents that participated in GP crossover and mutation in the regression problem with ERC[100,100]. The distribution using the newly proposed operators is more balanced than the results with standard operators. With standard operators, the parents’ tree sizes of 16 and 17 are a larger proportion than with the proposed operators. The case of the 6multiplexor problem is similar to that of the regression problem, as shown in
Fig. 6
.
Depth distribution of parents which participated in GP crossover & mutation, on the regression problem with ERC[100,100]
Depth distribution of Parents which participated in GP crossover & mutation, on the 6multiplexor problem
The distribution of participating parents in the proposed GP operation is very widely spread in the 5parity problem, as shown in
Fig. 7
. That seems to be closely related to the diversity of the population, because with the standard operators, most of individuals rapidly grew to large trees.
Depth distribution of parents which participated in GP crossover & mutation, on the 5parity problem
5. Conclusion
We have suggested new recombination operators based on tree distributions in structure space and structural difficulties. To demonstrate the effectiveness of our proposed approach, experiments on the binomial3 regression, multiplexor and even parity problem were performed. The experimental results showed that the results using the proposed treestructure based aware operators were superior to the results of standard GP for all three test problems in both success rate and number of evaluations.
Further study will aim at analysis, extension and refinement of the treestructure based aware GP operators to validate their effectiveness more theoretically and to apply them to more complex and practical realworld problems.
Acknowledgements
This Research was supported by Seokyeong University in 2013.
BIO
Kisung Seo received the BS, MS, and Ph.D degrees in Electrical Engineering from Yonsei University, Seoul, Korea, in 1986, 1988, and 1993 respectively. He became Full Time Lecturer and Assistant Professor of Industrial Engineering in 1993 and 1995 at Seokyeong University, Seoul, Korea. He joined Genetic Algorithms Research and Applications Group (GARAGe) and Case Center for ComputerAided Engineering & Manufacturing, Michigan State University from 1999 to 2002 as a Research Associate. He was also appointed Visiting Assistant Professor in Electrical & Computer Engineering, Michigan State University from 2002 to 2003. He was a Visiting Scholar at BEACON (Bio/computational Evolution in Action CONsortium) Center, Michigan State University from 2011 to 2012. He is currently Associate Professor of Electronics Engineering, Seokyeong University. His research interests include genetic programming, evolutionary design of gait generation for quadruped and humanoid robots, and evolutionary prediction for weather system.
Cheolhyuk Bhang He received B.S. and M.S. degrees from Seokyong University, Seoul, Korea, in 2007 and 2009, respectively. Currently, he is a R&D team manager with the AP Systems, Hwaseong, Korea. His research interests are genetic programming and control systems in power electronics.
Koza J. R.
1992
Genetic Programming: On the Programming of Computers by Natural Selection
MIT Press
Cambridge, MA, USA
Luke S.
2000
Issues in Scaling Genetic Programming Breeding Strategies, Tree Generation and Code Bloat, PhD thesis
University of Maryland
Goldberg D.
1989
Genetic algorithms in search, optimization, and machine learning
Davidson J.W.
,
Savic D.A.
,
Walters G.A.
2003
“Symbolic and numerical regression: experiments and applications,”
Information Sciences
150
(12)
95 
117
DOI : 10.1016/S00200255(02)003717
Seo K.
,
Hyun S.
,
Goodman E. D.
2010
“Genetic Programming Based Automatic Gait Generation in Joint Space for a Quadruped Robot,”
Advanced Robotics
24
(15)
2199 
2214
DOI : 10.1163/016918610X534312
Seo K.
,
Hyeon B.
,
Hyun S.
,
Lee Y.
2013
“Genetic Programming Based Model Output Statistics for Short Range Temperature Prediction,”
SpringerVerlag
Lecture Notes in Computer Science
7835
122 
131
Majeed H.
,
Ryan C.
2014
“On the Constructiveness of Context Aware Crossover,”
Proceedings of the Genetic and Evolutionary Computation Conference (GECCO’07)
London, England, United Kingdom
July 711, 2007
1659 
1666
Poli R.
,
Page J.
2000
“Solving High Order Boolean Parity Problems with Smooth Uniform Crossover,”
Sub Machine Code GP and Demes, Genetic Programming and Evolvable Machines
1
(1/2)
37 
56
DOI : 10.1023/A:1010068314282
Ito T.
,
Iba H.
,
Sato S.
“Depth Dependent Crossover for Genetic Programming,”
in Proceedings of the World Congress on Computational Intelligence
Anchorage, AK. USA
May 49, 1998
775 
780
Daida J. M.
,
Hilss A. M.
“Identifying Structural Mechanisms in Standard Genetic Programming,”
Proceedings of the Genetic and Evolutionary Computation Conference (GECCO2003), LNCS
Chicago, IL, USA
July 1216, 2003
2724
1639 
1651
Daida J. M.
,
Polito J. A.
,
Stanhope S. A.
,
Berttam R. R.
,
Khoo J. C.
2001
“What Makes a Problem GP Hard? Analysis of a Tunably Difficult Problem in Genetic Programming,”
Genetic Programming and Evolvable Machines
2
(2)
165 
191
DOI : 10.1023/A:1011504414730
Ngyen X.
,
Hoai B.
,
McKay D. Essam
2006
“Representation and structural Difficulty in Genetic Programming,”
IEEE Transactions on Evolutionary Computation
10
(2)
157 
166
DOI : 10.1109/TEVC.2006.871252
Zongker D.
,
Punch B.
1995
LilGP User’s Manual
Michigan State University
Sliva S.
,
Costa E.
“Resource Limited Genetic Programming: The Dynamic Approach,”
Proceedings of the Genetic and Evolutionary Computation Conference (GECCO’05)
Washington, DC, USA
June 2529, 2005
1673 
1680