October 1994 Report No. STAN-CS-TR-94- 1528 Architecture-Altering Operations for Evolving the Architecture of a Multi-Part Program in Genetic Programming by John R. Koza Department of Computer Science Stanford University Stanford, California 94305 ~ZEO ~! Computer Science Department October 21, 1994 ARCHITECTURE-ALTERING OPERATIONS FOR EVOLVING THE ARCHITECTURE OF A MULTI-PART PROGRAM IN GENETIC PROGRAMMING .lohn R. Koza Computer Science Department Stanford University Margaret Jacks Hall Stanford, California 94305-2140 USA ARCHITECTURE-ALTERING OPERATIONS FOR EVOLVING THE ARl~H~TECTURE OF A MULTI-PART PROGRAM IN GENETIC PROGRAMMING John R. Koza Computer Science Department Stanford University Margaret Jacks Hall Stanford, California 94305-2140 USA Koza@ CS . S tanford.Edu Phone: 415-941-0336 FAX- 415-941-9430 AB STRACT Previous work described a way to evolutionarily select the architecture of a multi-part computer program from among preexisting alternatives in the population while concurrently solving a problem during a run of genetic programming. This report describes six new architecture-altering operations that provide a way to evolve the architecture of a multi-part program in the sense of actually changing the architecture of programs dynamically during the run. The new architecture-altering operations are motivated by the naturally occurring operation of gene duplication as described in Susumu Ohno's provocative 1970 book Evolution by Means of Gene D u p I i c a t i o n as well as the naturally occurring operation of gene d e l e t i o n . The six new architecture-alterin~ operations are branch duplication, argument duplication, branch creation, argument creation, branch deletion and argument deletion. A connection is made between genetic programming and other techniques of automated problem solving by interpreting the architecture-altering operations as providing an automated way to specialize and generalize programs. The report demonstrates that a hierarchical architecture can be evolved to solve an illustrative symbolic regression problem using the architecture-altering operations. Future work will study the amount of additional computational effort required to employ the architecture-altering operations. TABLE OF CONTENTS 1. Introduction 2 ... Gene Duplication and Deletion in Nature 3 ... Gene Duplication and Deletion in Evolutionary Algorithms 1 0 4 ... Background on Genetic Programming 12 4.1....... Background on Automatically Defined Functions 12 4.2....... Steps for Executing Genetic Programming 14 4.3....... Methods of Determining the Architecture of a Multi-Part Program 15 4.3.1..... Method of Prospective Analysis 16 4.3.2..... Method of Seemingly Sufficient Capacity 16 4.3.3..... Method of Using Affordable Capacity 17 4.3.4..... Method of Retrospective Analysis 17 4.3.5..... Evolutionary Selection of the Architecture 17 4.4....... Creation of the Initial Random Population in an Architecturally Diverse Population 18 4.5....... Structure-Preserving Crossover in an Architecturally Diverse Popul ati on 5.... The Six New Architecture-Altering Genetic Operations 5. l....... Branch Duplication 5.2........ Argument Duplication 5.3........ Branch Deletion 5.3.1..... Branch Deletion by Consolidation 5.3.2..... Branch Deletion with Random Regeneration 5.3.3..... Branch Deletion by Macro Expansion 5.4....... Argument Deletion 5.4.1..... Argument Deletion by Consolidation 5.4.2..... Argument Deletion with Random Regeneration 5.4.3..... Argument Deletion by Macro Expansion 5.5....... Branch Creation 5.6....... Argument Creation 6 ...... Rotating the Tires on an Automobile 6.1....... Approach Without Automatically Defined Functions 6.2....... Approach With Automatically Defined Functions 6.3....... Approach With Evolutionary Selection of the Architecture 6.4....... Approach With Architecture-Altering Operations 6.4. l.... Generalization by Means of Branch Deletion and Argument Deletion 6.4.2..... Specialization by Means of Branch Duplication and Argument Duplication 6.4.3..... Specialization by Means of Branch Creation and Argument Creation 7 ...... Examples of Actual Runs 7. l...... Example 7.2....... Example 2 8 ...... Conclusions 9 ...... Future Work 10...... Acknowledgements 1 1..... Bibliography 1 9 2 2 22 25 27 27 28 28 29 30 30 30 3 1 36 3 9 39 40 4 1 42 42 43 44 5 5 5 6 4 1. Introduction ln nature, crossover ordinarily recombines a part of the chromosome of one parent with a corresponding (homologous) part of the second parent's chromosome. However, in certain very rare and unpredictable instances, this recombination does not occur in the usual way. A gene duplication is an illegitimate recombination event that results in the duplication of a lengthy subsequence of a chromosome. Susumu Ohno's seminal 1970 book Evolution by Gene Duplication proposed the provocative thesis that the creation of new proteins (and hence new structures and behaviors in living things) begins with a gene duplication and that gene duplication is "the major force of evolution." This report describes six new architecture-altering genetic operations for genetic programming that are suggested by the mechanism of gene duplication (and the complementary mechanism of gene deletion) in chromosome strings. This report proposes that these new operations be added to the toolkit of genetic programming when the user desires to evolve the architecture of a multi-part program containing automatically defined functions (ADFs) during the run of genetic programming. Thc six new architecture-altering operations can be viewed from five p e rspe ct i v e s . First, the new architecture-altering operations provide a new way to solve the problem of determining the architecture of the overall program in the context of genetic programming with automatically defined functions. Second, the new architecture-altering operations provide an automatic implementation of the ability to specialize and generalize in the context of automated problem-solving. Third, the new architecture-altering operations automatically and dynamically change the representation of the problem while simultaneously and automatically solving the problem . Fourth, the new architecture-altering operations automatically and dynamically decompose problems into subproblems and then automatically solve the overall problem by assembling the solutions of the subproblems into a solution of the overall p ro b l em . Fifth, the new architecture-altering operations automatically and dynamically discover useful subspaces (usually of lower dimensionality than that of the overall problem) and then automatically assemble a solution of the overall problem from solutions applicable to the individual subspaces. Section 2 of this report describes the naturally occurring processes of gene duplication and gene deletion. Section 3 describes analogs of gene duplication and gene deletion that have appeared in previous work with character strings in the field of genetic algorithms and other evolutionary algorithms. Section 4.1 provides basic background information on genetic programming and automatically defined functions. Section 4.2 lists the steps for executing genetic programming. Section 4.3 describes the five existing methods for determining the architecture of multi-part programs in the context of genetic programming with automatically defined functions. Section 4.4 describes different methods of creating the initial random population when these new operations are being used. Section 4.5 describes structure-preserving crossover with point typing in an architecturally diverse population. Section 5 describes the six new architecture-altering operations. Section 6 illustrates the architecture-altering operations using a g e d a n k e n experiment involving the problem of rotating the tires on an automobile. Section 7 contains some examples of actual runs of genetic programming with the new architecture- alterin~ operations. Section ~ is the conclusion and section 9 outlines future work. 2 2. Gene Duplication and Deletion in Nature In nature, deoxyribonucleic acid (DNA) is a long thread-like biological molecule that has the ability to carry hereditary information. DNA also has the ability to serve as a model for the production of replicas of itself. All known life forms on this planet (including bacteria, fungi, plants, animals, and humans) are based on the DNA molecule. The DNA molecule (called the chromosome) consists of a backbone that encases a long sequence of four nucleiotide bases: adenine (R ), guanine (G ), cytosine (C ), and thymine (T) . Proteins are responsible for such a wide variety of biological structures and functions that it can be said that the structure and functions of living organisms are primarily determined by proteins (Stryer 1988). For example, some proteins transport particles such as electrons, atoms, or large macromolecules within living organisms (e.g., hemoglobin transports oxygen in blood). Some proteins store particular particles for later use (e.g., myoglobin stores oxygen in muscles). Some proteins generate nerve impulses (e.g., rhodopsin is the photoreceptor protein in retinal rod cells) while other proteins enable signals to be communicated in the nervous system. Some proteins provide physical structure (e.g., collagen gives skin and bone their high tensile strength). Other proteins create physical contractile motion (e.g., actin and myosin). Proteins are the basis of the immune system (e.g., antibodies recognize and combine in highly specific ways with foreign entities such as bacteria). Hormonal proteins transmit chemical instructions throughout the living organism. Other proteins control the expression of the genetic information contained in the nucleic acids that are responsible for the reproduction of the organism. Growth-factor proteins control growth and differentiation. Protein molecules are polypeptides that are composed of sequences of between about 50 and several thousand amino acids. The sequence of amino acids appearing in a protein are specified by the sequence of nucleiotide bases appearing in the DNA. Sub-sequences consisting of three nucleiotide bases of DNA (called a codon) are translated, using the genetic code, into one of 20 amino acids. Then, organisms consisting of proteins created in this manner spend their lives attempting to grapple with their environment. Some organisms in a given population do better than others in this pursuit. In particular, some organisms survive to the age of reproduction, reproduce a certain number of offspring, and thereby pass on all or part of their genetic make-up (their DNA) to the next generation of the population. Over a period of time and many generations, the population as a whole evolves so as to give increasing representation to traits that contribute to survival of the organism in its environment, that facilitate the survival of individual organisms to the age of reproduction, and that facilitate larger numbers of offspring. As Charles Darwin stated in On the Origin of Species by Mean.~ of Natural Selection ( 1 859), "I think it would be a most extraordinary fact if no variation ever had occurred useful to each being's own welfare .... But if variations useful to any organic being do occur, assuredly individuals thus characterised will have the best chance of being preserved in the struggle for life; and from the strong principle of inheritance they will tend to produce offspring similarly characterised. This principle of preservation, I have called, for the sake of brevity, Natural Selection." 3 Biological populations display the ability to adapt, survive, and reproduce in their natural environments, to rapidly and robustly adapt in response to changes in the environment. Nature's methods for adapting biological populations to their environment and nature's method of adapting these populations to successive changes in their environments provides a potentially useful model for creating automated problem-solving techniques to problems that are generally thought to require "intelligence" to solve. In nature, the naturally occurring genetic operations of mutation and crossover (sexual recombination) provide one way to alter the linear string of nucleiotide b as es . Mutation, for example, alters the chromosomal string by changing one nucleiotide base of the string. When the changed DNA is then translated into work-performing proteins in the living cell, the mutation may lead either to the manufacture of a variant of the original protein or, as is often the case, to no viable protein being manufactured from the altered portion of the DNA. The variant of the protein (or absence of the protein) may then affect the structure and behavior of the living thing in some advantageous or disadvantageous way. If the change is advantageous, natural selection will tend to perpetuate the change. In the more common situation where the random mutant is disadvantageous, natural selection will tend to cause the extinction of the mutant. When crossover is performed, the linear string of nucleiotide bases of an offspring is created by recombining portions of the DNA of one parent with portions of the DNA of the second parent. The offspring produced by crossover usually differ from each parent by a substantial number of nucleiotide bases whereas the offspring produced by mutation differ from the single parent by only one base. In addition to frequent changes introduced by mutation and crossover, chromosomes are occasionally also modified by other naturally occurring genetic operations, such as gene duplication and gene deletion. Gene duplications are rare and unpredictable events in the evolution of genomic sequences. In gene duplication, there is a duplication of a portion of the linear string of nucleiotide bases of the DNA in the living cell. When a gene duplication occurs, there is no immediate change in the proteins that are manufactured by the living cell. The effect of a gene duplication is merely to create two identical ways of manufacturing the same protein. In the terminology of computer science, gene duplication is a semantics-preserving operation. Then, over a period of time, some other genetic operation, such as mutation or crossover, may change one or the other of the two identical genes. Over short periods of time, the changes accumulating in the changing gene may be of no practical effect or value. In fact, the changed part of the linear string of nucleiotide bases of the DNA will often not even produce a viable protein. However, as long as one of the two genes remains unchanged, the original protein manufactured from the unchanged gene continues to be manufactured and the structure and behavior of the organism involved continues as before. The changed gene is simply carried along in the DNA from generation to generation. Natural selection exerts considerable force in favor of maintaining a gene that manufactures a protein that is important for the successful performance and survival of a living organism. However, after a gene duplication has occurred, there is no disadvantage associated with the loss of the second way of manufacturing the original protein. Consequently, natural selection usually exerts little or no pressure to maintain a second way of manufacturing the same protein. Over a period of time, the second gene may accumulate additional changes and diverge more and more from the original gene. Eventually the changed gene may lead to the manufacture of a distinctly new and different protein that actually does affect the structure and behavior of the living thing in some advantageous or disadvantageous way. When a 4 changed gene leads to the- manufacture of a viable and advantageous new protein, natural selection again starts to work to preserve that new gene. Ohno's Evolution by Gene Duplication (1970) corrects the mistaken notion that natural selection is a mechanism for promoting change. Instead, Ohno emphasizes the essentially conservative role of natural selection in the evolutionary process: "...the true character of natural selection ... is not so much an advocator or mediator of heritable changes, but rather it is an extremely efficient policeman which conserves the vital base sequence of each gene contained in the genome. As long as one vital function is assigned to a single gene locus within the genome, natural selection effectively forbids the perpetuation of mutation affecting the active sites of a molecule." (Emphasis in original). Ohno further points out that simple point mutation and crossover are insufficient to explain major evolutionary changes. "...while allelic changes at already existing gene loci suffice for racial differentiation within species as well as for adaptive radi ation from an immediate ancestor, they cannot account for large changes in evolution, because large changes are made possible by the acquisition of new gene loci with previously non-existent functions." Ohno continues, "Only by the accumulation of forbidden mutations at the active sites can the gene locus change its basic character and become a new gene locus. An escape from the ruthless pressure of natural selection is provided by the mechanism of gene duplication. By duplication, a redundant copy of a locus is created. Natural selection often ignores such a redundant copy, and, while being ignored, it accumulates formerly forbidden mutations and is reborn as a new gene locus with a hitherto non-existent function." (Emphasis in original). Ohno concludes, "Thus, gene duplication emerges as the major force of evolution." Ohno's provocative thesis is supported by the discovery of pairs of proteins with similar sequences of DNA and similar sequences of an~ino acids, but different functions. Examples include trypsin and chymotrypsin; the protein of microtubules and actin of the skeletal muscle; myoglobin and the monomeric hemoglobin of hagfish and lamprey; myoglobin used for storing oxygen in muscle cells and the subunits of hemoglobin in red blood cells of vertebrates; and the light and heavy immunoglobin chains (Nei 1987, Maeda and Smithies 1986, Dyson and Sherratt 1985, Brooks Low 1988, Patthy 1991, Go 1991, Hood and Hunkapiller 1991). The moth Chironomus tentans provides an additional example of gene duplication in nature (Galli and Wislander 1993). In particular, we focus our attention on the particular contiguous sequence containing 3,959 nucleiotide bases of the DNA of this moth that is archived under accession number X70063 in the European Molecular Biology Laboratory (EMBL) database and the Gen Bank database. The 732 nucleiotide bases located at positions 918-1,649 of the 3,959 bases of the DNA sequence involved become expressed as a protein containing 244 (i.e., one third of 732) amino acid residues. The 759 nucleiotide bases at positions 2,513-3,271 become expressed as a protein containing 253 residues. The 732-base subsequence is called the "C. tentans Sp38-40.A" gene and the 759-base subsequence is called "C. tentans Sp38-40.B." The bases of DNA before position 918, the bases between positions 1,650 and 2,612, and the bases after position 3,371 of this sequence of length 3,959 do not become expressed as any protein. Both the "A" and the "B" proteins are secreted from the moth's salivary gland to form two similar, but different, kinds of water-insoluble fibers. The two kinds of fibers are, in turn, spun into one of two similar, but different, kinds of tubes. One tube is for larval protection and feeding while the other tube is for pupation. Table 1 shows the bases of DNA in positions 900 through 3,399 of the 3,959 nucleiotide bases of X70063. In the DNA sequence, R represents the nucleiotide base adenine, G represents cytosine, G represents guanine, and T represents thymine. Each group of three consecutive bases (a codon) of DNA becomes expressed as one of the 20 amino acid residues of the protein. The letters R, T, and G appearing at positions 918, 919, and 920, respectively in this reading frame, of the DNA sequence are translated into the amino acid residue methionine (denoted by the single letter M using the 20-letter coding for amino acid residues in proteins). Thus, methionine is the first amino acid residue (i.e., N -terminal) of protein "A. " Positions 921, 922, and 923 of the DNA contain the bases R, G, and R, respectively, and these three bases, in this reading frame, are translated into arginine (an amino acid residue denoted by the letter R ). Thus, arginine is the second amino acid residue of protein " A" and the protein sequence begins with the residues M and R. The DNA up to position 1,649 encodes the first protein. Positions 1,647, 1648, and 1,649 code for the amino acid resident lysine (denoted by the letter K ). Thus, lysine is the last (244th) residue (i.e., C-terminal) of protein "A." Table 1 Portion of a DNA sequence containing the two expressed p r o t e i n s . TGAAGTAATA TTAAGCTATG AGAATTAAGT TCCTAGTAGT ATTAGCAGTT 950 M RIK F L V V L A V ATCTGCTTGT TTGCACATTA TGCCTCAGCT AGTGGTATGG GGGGTGATAA 1000 C L F A H Y A S A S G M G G DK AAAACCCAAA GATGCCCCAA AACCCAAAGA TGCCCCAAAA CCCAAAGAAG 1050 KPK D A P K PKD A P K PKEV TGAAGCCTGT CAAAGCTGAG TCATCAGAGT ATGAGATAGA AGTCATTAAA 1100 KPV K A E SSEY EIE VIK CACCAGAAAG AAAAGACCGA GAAGAAGGAG AAGGAGAAGA AGACTCACGT 1150 HQKE KTE KKE KEKK THV TGAAACCAAG AAAGAAGTTA AAAAGAAGGA GAAGAAGCAA ATCCCTTGTT 1200 ETK KEVK KKE KKQ IPCS CTGAAAAACT CAAGGATGAA AAACTTGATT GTGAGACCAA GGGCGTCCCT 1250 EK L KDE KLDC ETK G V P GCAGGCTACA AAGCAATCTT CAAATTCACA GAAAACGAGG AGTGCGATTG 1300 A G Y K A I F K F T ENEE CDW GACGTGCGAT TATGAAGCAC TTCCACCACC TCCAGGAGCA AAGAAAGACG 1350 TCD YE A L P P P P G A KKDD ACAAGAAAGA AAAGAAGACA GTTAAAGTCG TTAAGCCACC AAAGGAGAAA 1400 KKE KKT VKVV KPP KEK CCACCAAAGA AGCTTAGAAA GGAATGCTCT GGCGAAAAAG TGATCAAATT 1450 PPKK L RK ECS G EKV IK F CCAAAACTGT CTCGTTAAGA TTAGAGGACT TATTGCCTTT GGTGATAAGA 1500 QNC LVKI R G L I A F G DKT CAAAGAACTT TGATAAGAAG TTCGCAAAGC TTGTCCAAGG AAAGCAGAAG 1550 KN F DKK F A K L V Q G KQK AAGGGCGCAA AAAAAGCTAA AGGCGGTAAG AAGGCAGCAC CAAAACCAGG 1600 K G A K K A K G G K K A A P K P G ACCAAAACCA GGGCCAAAAC AAGCTGATAA ACCAAAAGAT GCAAAAAAAT 1650 PKP G P KQ A DK PKD A KK AAACTGACAT AGTAAGAATA ATAAAATAAA CATTATTTGA GCAACATCAC 1700 AACACAAGAA AAAAATCATA TCAACATAAT TAAGACCTAA AAATTCTCGC 1750 TATTCACTTT TTTTCAAATG AATATCCAAA ACAACATCAT TAAGGGATCT 1800 TACACAATTT TATCCCAAAT TAGTTTTAAG TCTATTTTTT AGTTTTAAGT 1850 AAAACATTAG TTAGAGAAAT TTCAAATGCG AAAAAAAGAC AAAATCAAAA 1900 TTAACTCCAA CTAATTGTCT AGATCTAATC ACCACTGAAA AACAATATTT 1950 TTTTCAATAA TATCTGAGAT GAAAATTTTG TAAGATACGA TTCAAAAAAA 2000 AAAAAACAAA AACTTAAATA TTTTCTTTAT AAGAAAGTAA AAAACTTACA 2050 TGAACAACAA GTAGACTAAG GGCTTAAAAA TACTAAGGAA TTTAAAGAAA 2100 7 CTGAACCAAT AACATCCAAT - AAATATAAGC GTGTATTTAA CATCCATTCA 2150 TGCAAAATTT GACTTGTTTT ATTCTAAACT TTTGAATTGT GAATATTTTT 2200 GATGATTATT GAATATTTTA CAGCATTTTT CGACAAAATC CAAGGAAACT 2250 GTTTTGTTTA ATATATACTA CAGCTCAGTA TCTATGCACA CGAAAAACTG 2300 TAACAGACCA GACCATAAAA CCTACACATC ACCAAGATAC GTATTTTAAA 2350 TTCATGTGAC TGACAAAAGC TGGAAACACT TGTGTCACGT CATGAAAACC 2400 TCGTTGAAAT AAAACTTCTA GAAAGGTTAT CATGAAAGAG TATAAAAGAG 2450 ATCTCAAACG AGGCTCAGTC AGTTCAGTTT AGCTTGGACT TCATATGAAG 2500 TAATATTTAG CTATGAGAAT TAAGTTCCTA GTAGTATTAG CAGTTATCTG 2550 M R I K F L V V L A V I C CTTGCTTGCA CATTATGCCT CAGCTAGTGG TATGGGGGGT GATAAAAAAC 2600 L L A H Y A S A S G M G G D K K P CCAAAGATGC CCCAAAACCC AAAGATGCCC CAAAACCCAA AGAAGTGAAG 2650 K D A P K P K D A P K P K EVK CCTGTCAAAG CTGACTCATC AGAGTATGAG ATAGAAGTCA TTAAACACCA 2700 PVK A DSS EYE IEVI KHQ GAAAGAAAAG ACCGAGAAGA AGGAGAAGGA GAAGAAAGCT CACGTCGAAA 2750 KE K TEKK EKE KK A HVEI TCAAGAAAAA GATTAAAAAT AAGGAGAAGA AGTTTGTCCC ATGTTCTGAA 2800 KKK IKN KEKK F V P C S E ATTCTCAAGG ATGAAAAACT TGAATGTGAG AAAAATGCTA CTCCAGGCTA 2850 L KD EK L ECE KN A T P G Y TAAAGCACTC TTCGAATTCA AAGAAAGCGA AAGTTTTTGC GAATGGGAGT 2900 K A L F E F K E S E S F C EWEC GCGATTATGA AGCAATTCCA GGAGCAAAGA AAGACGAAAA AAAGGAGAAG 2950 DY E A I P G A KK D E K KEK AAGGTAGTTA AAGTCATTAA GCCACCAAAG GAAAAACCAC CAAAGAAGCC 3000 KVVK VIK PPK EKPP KKP TAGAAAGGAA TGCTCTGGCG AAAAAGTGAT CAAATTCCAA AACTGTCTCG 3050 R K E C S G E KVI K F Q NC L V TTAAGATTAG AGGACTTATT GCCTTTGGTG ATAAGACAAA GAACTTTGAT 3100 KI R G L I A F G D KTK N F D AAGAAGTTTG CAAAGCTTGT CCAAGGAAAG CAAAAGAAGG GCGCAAAAAA 3150 KK F A K LV Q G K QKK G A KK AGCTAAAGGC GGTAAGAAGG CAGAACCAAA ACCAGGACCA AAACCAGCAC 3200 A K G G K K A E P K P G P K P A P CAAAACCAGG ACCAAAACCA GCACCAAAAC CAGTACCAAA ACCAGCTGAT 3250 K P G P K P A P K P V P K P A D AAACCAAAAG ATGCAAAAAA ATAAACTGAC ATAGTGAGAA TAATAAAATA 3300 K P K ~ A K K Table 2 shows the 244 amino acid residues of the C. tentans Sp38-40.A protein. Table 2 Protein sequence of " A" protein. MRIKFL W LA VICLFAHYAS ASGMGGDKKP KDAPKPKDAP KPKEVKPVKA 50 ESSEYEIEVI KHQKEKTEKK EKEKKTHVET KKEVKKKEKK QIPCSEKLKD 100 EKLDCETKGV PAGYKAIFKF TENEECDWTC DYEALPPPPG AKKDDKKEKK 150 TVK W KPPKE KPPKKLRKEC SGEKVIKFQN CLVKIRGLIA FGDKTKNFDK 200 KFAKLVQGKQ KKGAKKAKGG KKAAPKPGPK PGPKQADKPK DAKK 244 Table 3 shows the 253 amino acid residues of the C. tentans Sp38-40.A protein. Table 3 Protein sequence of "B" protein. MRIKFLW LA VICLLAHYAS ASGMGGDKKP KDAPKPKDAP KPKEVKPVKA 50 DSSEYEIEVI KHQKEKTEKK EKEKKAHVEI KKKIKNKEKK FVPCSEILKD 100 EKLECEKNAT PGYKALFEFK ESESFCEWEC DYEAIPGAKK DEKKEKKVVK 150 VIKPPKEKPP KKPRKECSGE KVIKFQNCLV KIRGLIAFGD KTKNFDKKFA 200 KLVQGKQKKG AKKAKGGKKA EPKPGPKPAP KPGPKPAPKP VPKPADKPKD 250 AKK 253 The two proteins are similar, but different. For example, the first 14 amino acid residues are identical. Residue 15 of the "A" protein is phenylalanine (F), while the residue 15 of the "B" protein is leucine (L), a chemically similar amino acid. Residues 16-50 are identical. Residue 51 of the "A" protein is glutamic acid (E), while residue 51 of the "B" protein is Aspartic acid (D ). Both D and E are similar in that both are electrically negatively charged residues at normal pH values. However, for some positions, such as 76, the amino acid residues (T and A ) are not chemically or electrically similar. If we now read from the end of each protein, we see that the last five residues of each protein are identical. That is, positions 240-244 of protein "A" are identical to positions 249-253 of protein "B." Since the proteins are of different length, identification of this particular similarity between the two proteins requires aligning the two proteins in some way. Protein alignment algorithms, such as the Smith-Waterman algorithm (Smith and Waterman 1981), provide a way to align two proteins and to measure the degree of similarity or dissimilarity between two proteins. The Smith-Waterman algorithm is a progressive alignment method employing dynamic programming based on a scoring algorithm. Since the proteins being aligned are typically of different lengths, gaps may be introduced (and then lengthened) in an attempt to best align the residues making up the proteins. A penalty is assessed to open a gap (5 here) and another penalty is assessed to lengthen a gap (25 here). An additional penalty is assessed when one residue disagrees with another. This penalty is smaller for substitutions involving similar amino acid residues. The PAM-250 ("Percentage of Accepted point Mutations") matrix is used to reflect the likelihood of one amino acid residue being mutated into another. The overall scoring algorithm performs a tradeoff employing dynamic programming between the penalties assessed by the PAM-250 matrix, the gap-opening penalty, and the gap-lengthening penalty. The Smith-Waterman algorithm has been implemented in GeneWorks, a software package available from Intelligenetics Inc. of Mountain View, California. Table 4 shows the alignment of the C. tentans Sp38-40.A protein and the C. tentans Sp38-40.B protein. Identical residues are boxed. The alignment shows that there is 81% identity between the two protein sequences. As can be seen, the first 9 disagreement between the two aligned sequences occurs at position 15 and the second occurs at residue 51. The first gap is introduced at residue 112 where the "A" protein has an alanine (A ) residue. - A - gap of length 3 is introduced at positions 147, 148, and 149 where the "A" protein has three proline (P ) residues. Note that this alignment recognizes the identity between the last five residues of the two proteins. This alignment has a total cost of 265. Galli and Wislander (1993) point out that these two similar proteins arise as a consequence of a gene duplication. Immediately after the gene duplication occurred at some time in the distant past, there were two identical copies of the duplicated sequence of DNA. Over a period of millions of years since the initial gene duplication, additional mutations accumulated so that the two proteins are now only 81 % identical (after alignment). More importantly, the two proteins now perform different (but similar) functions in the moth. Table 4 Protein ali~nment of the "A" and "B" proteins. First.protein Second.protein First.protein Second.protein First.protein Second.protein First.protein Second.protein RIKFL W LA VIC RIKFL W LA VIC HYAS ASGMGGDKKP KDAPKPKDAP KPKEVKPVK HYAS ASGMGGDKKP KDAPKPKDAP KPKEVKPVK SSEYEIEVI KHQKEKTEKK EKEK SSEYEIEVI KHQKEKTEKK EKEK KGV NAT CSE (' .S E LKD LKD P GAKKD KKE P GAKKD KKE CDYE~ LPP~P DYE~ I-- KPPK EKPPKK ~ RKE CSGEKVIKFQ NCLVKIRGLI AFGDKTKNFD KPPK EKPPKK e RKE CSGEKVIKFQ NCLVKIRGLI AFGDKTKNFD First.protein |KKFAKLVQGK QKKGAKKAKG GKK Second.protein |KKFAKLVQGK QKKGAKKAKG GKK First.protein -~KDAK~ Second.protein K~KDAK~ ___Q ~PKP PKPAD 5 0 5 0 1 0 0 1 0 0 1 4 9 1 4 6 1 9 9 1 9 6 2 3 9 2 4 6 2 4 4 2 '~ Gene deletion also occurs in nature. In gene deletion, there is a deletion of a portion of the linear string of nucleiotide bases that would otherwise be translated and manufactured into work-performing proteins in the living cell. After a gene deletion occurs, some particular protein that was formerly manufactured will no longer be manufactured and there may be some change in the structure or behavior of the biological entity. The absence of the protein may then affect the structure and behavior of the living thing in some advantageous or disadvantageous way. If the deletion is advantageous, natural selection will tend to perpetuate the change, but if the deletion is disadvantageous, natural selection will tend to lead to the extinction of the change. 1 0 3. Gene Duplication and Deletion in Evolutionary Algorithms Analogs of the naturally occurring operation of gene duplication have been used in connection with the genetic algorithm and other evolutionary algorithms for some time . Cavicchio (1970) used intrachromosomal gene duplication in early work on pattern recognition using the genetic algorithm (Goldberg 1989). Holland (1975) suggested that intrachromosomal gene duplication might provide a mean of adaptively modifying the effective mutation rate by making two or more copies of a substring of adj acent alleles. If there are k copies of an allele, the probability of a mutation of a particular allele is k times greater than if there were but one occurrence of the allele. Gene duplication is implicitly used in the messy genetic algorithm (Goldberg, Korb, and Deb 1989). Lindgren (1991) analyzed the prisoner's dilemma game using an evolutionary algorithm that employed an operation analogous to naturally occurring gene dup l ic ati on . The prisoner's dilemma is a problem in game theory with numerous psychological, sociological, and geopolitical interpretations. In this game, two players can either cooperate or not cooperate. The players make their moves simultaneously and without communication. Each player then receives a payoff that depends on his move and the simultaneous move of the other player. The payoffs in the prisoner's dilemma game are arranged so that a non-cooperative choice by one player always yields a greater payoff to that player than a cooperative choice (regardless of what the other player does). However, if both players are selfishly non-cooperative, they are both worse off than if they had both cooperated. The game is not a zero-sum game because, among other things, both players are better off if they both cooperate. For a single encounter, the best strategy for each player is to be non-cooperative even though both are worse off than if they had both cooperated. However, the situation becomes considerably more interesting and complex if the two players engage in this game over a series of plays. In this so-called iterated version of the prisoner's dilemma, it becomes advantageous for cooperation to evolve (Axelrod 1984, 1987). In Lindgren's work, strategies for playing the game over a series of plays are expressed as fixed-length binary character strings of length 2, 4, 8, 16, or 32. Strings of length 2 represent game-playing strategies that take account of only the one previous action by the opponent. For example, the string 01 instructs the player to make a non-cooperative move (indicated by a 0) if the opponent made an uncooperative move on his previous move and to make a cooperative move (indicated by 1) if the opponent just made a cooperative move. This particular strategy is called "tit-for-tat" since the player mimics his opponent's previous move. The string 10 is called "anti-tit-for-tat" because it instructs the player to do the opposite of what the opponent did on the previous move. String 11 is "Mr. Nice Guy" and 00 is "Darth Vader. " The 16 strategies represented by strings of length 4 take account of the player's own previous action as well as the opponent's previous action. Strings of length 8 look back even farther and take account of the opponent's action two moves ago in addition to both players' actions one move ago. Similarly, strings of length 16 and 32 take account of additional previous moves of the opponent and/or the player. Lindgren used an evolutionary algorithm to evolve a population of game-playing strategies with varying degrees of look-back. Lindgren started with a population of 1 1 1,000 consisting of 250 copies o-f each of the 4 possible strings of length 2. The fitness of a string is measured according to the average score achieved by the strategy when that strategy is played interactively against all other strategies in the population. At each generational step of the process, a string is copied (reproduced) in proportion to its fitness. The strings in the population are occasionally modified by Lindgren's evolutionary algorithm using three operations. First, a mutation operation randomly alters a single bit in a single string. Second, a gene duplication operation doubles a given character string. For example, the gene duplication operation transforms the string 01 into 0101. This operation has no immediate effect on the play because this lengthened string takes the previous move of the player himself into account, but then causes the very same action to be taken. Third, a gene deletion operation (that Lindgren calls "split mutation") cuts the length of a string in half by randomly deleting either the first or second half of the string. For example, when this operation is applied to the string 1100, the result is either the string 11 or 00 (with equal probability). In general, this operation does have an immediate effect on the play. Lindgren's evolutionary algorithm did not contain the crossover (recombination) op eration . Over a period of many generations, Lindgren (1991) found that the dynamics of this population of game-playing strategies exhibited many interesting evolutionary phenomena. Strategies with varying degrees of look-back spontaneously emerged, prospered, and became extinct. The phenomena of mass extinction and punctuated equilibrium were seen in the population as a whole while the evolutionary process progressed . 1 2 4. Background on Genetic Programming John Holland's pioneering 1975 Adaptation in Natural and Artificial Systems described how the evolutionary process in nature can be applied to artificial systems using the genetic algorithm operating on fixed length character strings (Holland 1975). Additional information on current work in genetic algorithms can be found in Goldberg (1989), Forrest (1993), Davis (1987, 1991), and Michalewicz (1992). Genetic programming is an extension of the genetic algorithm in which the genetic population consists of computer programs (that is, compositions of primitive functions, terminals, and possibly automatically defined functions). Genetic programming starts with a primordial ooze of randomly generated computer programs composed of available programmatic ingredients and then genetically breeds the population of programs using the Darwinian principle of survival of the fittest and an analog of the naturally occurring genetic operation of crossover (sexual recombination). Genetic programming is described in the book Genetic Programming: On the Programming of Computers by Means of Natural Selection (Koza 1992). A videotape description of genetic programming can be found in Koza and Rice 1992. Genetic programming is capable of evolving computer programs that solve, or approximately solve, various problems from various fields. Many examples of recent work in genetic programming can be found in Kinnear (1994) . 4.1. Background on Automatically Defined Functions Many problem environments have regularities, symmetries, homogeneities, similarities, patterns, and modularities that can be exploited in solving the problem. An automatically defined function is a function (i.e., subroutine, procedure, module, D E F U N ) that is dynamically evolved during a run of genetic programming in association with a particular individual program in the population and which may be invoked by a calling program (e.g., a main program) that is simultaneously being evolved . Automatically defined functions can be implemented within the context of genetic programming by establishing a constrained syntactic structure for the overall programs in the population. ADFs are described briefly in Koza 1992 and more extensively in the book Genetic Programming Il: Automatic Discovery of Reusable Programs (Koza 1994a). A videotape description of automatically defined functions can be found in Koza 1994b. When automatically defined functions are being used, each program in the population contains one or more function-defining branches (each defining one automatically defined function) and one main result-producing branch. The automatically defined functions can perform arithmetic, conditional, and other types of operations, define constants, define subsets, and so forth. In addition, for certain problems, there may be other problem-specific types of branches (such as iteration- performing branches and iteration-terminating branches). Figure 1 shows an overall program consisting of one two-argument automatically defined function and one result-producing branch. ADF0, 411 410 LIST~ 412 ARG0, 413 ARGI 414 421 400 progn 470 419 (A~D~ 480 o~ 420 481 NAND~ ~5 ~0) ~A~F~ 487 ARGl~ ~A~D~ 422 482 LB) ~NOR~ 489 424 488 13 490 491 Figure 1: A program with one two-argument automatically defined function (AD F O ) and one result-producing branch with an argument map of {2}. The argument map describes the architecture of a multi-part program in terms of the number of its function-defining branches and the number of arguments that they each possess. The argument map of the set of automatically defined functions belonging to an overall program is the list containing the number of arguments possessed by each automatically defined function in the program. The argument map for the overall program in figure 1 is { 2 } because there is one function-defining branch that takes two arguments. The program in figure 1 contains architecture-defining points (also sometimes called "invariant points" because they are not altered by crossover or mutation) of the following types: (1) the P ROGN (labeled 400) appearing as the top-most point of the overall program, (2) a DEFUN (labeled 410) as the top-most point of the function-defining branch, (3) a name (i.e. ADF0 labeled 411) appearing as the first argument below the DEFUN, (4) the function LIST (labeled 412) appearing as the second argument of the DEFUN, (5) dummy arguments (such as ARG0 and ARGl labeled as 413 and 414, respectively) appearing below LIST, (6) the VALUES (labeled 419) of the function-defining branch appearing as the third argument of the DEFUN, and (7) the VALUES (labeled 470) of the result-producing branch appearing as the final argument of P ROGN. If the program in figure 1 were to have more than one automatically defined functions, there would be additional occurrences of items (2), (3), (4), (5), and (6) for each additional function-defining branch. The program in figure 1 also contains work-performing points (also sometimes called "noninvariant points" because these points are almost always different from branch to branch within a program and from program to program within the population). These work-performing points are the bodies of the result-producing branch and the function-defining branch(es). The work-performing points of figure 1 include (1) the five points labeled 420, 421, 422, 423, and 424 that are found below the VALUES (labeled 419) in the function-defining branch, and (2) the 11 points starting with the AND (labeled 480) that are found below the VALUES (labeled 470). 1 4 The result-producing branch may invoke all, some, or none of the automatically defined functions that are present within the overall program. The result-producing branch does not contain ~ du-mmy arguments (formal parameters). The result- producing branch typically contains the actual variables of the problem (e.g., D0, Dl, D 2, etc. here). The value returned by the overall program consists of the value returned by the result-producing branch. The automatically defined functions of a particular overall program are usually named sequentially as ADF0,ADFl, etc. The automatically defined functions typically each possess a certain number of dummy arguments (formal parameters). Here, AD F 0 possesses two dummy arguments, ARG0 and ARGl. Typically, the actual variables do not appear in the function-defining b r an c h e s . If the overall program has more than one automatically defined function, there may (or may not) be hierarchical references between function-defining branches. For example, the function-defining of an overall program may be allowed to refer (non-recursively) to all other previously-defined (i.e., lower numbered) function- defining branches. References within a particular program to an automatically defined function are to the automatically defined function belonging to that particular program. Actions (with side effects) may be performed within the function-defining branches, the result-producing branches, or both. When automatically defined functions are being used, the initial random generation of the population must be created so that each individual overall program in the population has the intended constrained syntactic structure. In figure 1, the constrained syntactic structure calls for one result-producing branch and one function-defining branch. The function-defining branch for A D F 0 iS a random composition of functions from the function set, ~adf, and terminals from the terminal set, ~adf. Here the function set, ~adf, consists of the two-argument Boolean functions AND, OR, NAND, and NOR. The terminal set, 'Irpb, of the function-defining branch consists of the two dummy arguments (formal parameters), A R G 0 and A R G 1 . The result- producing branch is a random composition of functions from the function set, ~rpb, and terminals from the terminal set, 'Irpb. In figure 1, the function set, ~rpb, of the result-producing branch consists of the two-argument Boolean functions AND, OR, NAND, and NOR as well as the now-defined automatically defined function, ADF0. The terminal set, 'rrpb~ of the result-producing branch consists of the five actual variables of the problem (i.e., D0, D1, D2, etc.). 4. 2 . Steps for Executing Genetic Programming Genetic programming is a domain-independent method that genetically breeds populations of computer programs to solve problems. Execution of genetic programming consists of the following steps. The six operations appearing as items (2)(c)(iii) through (2)(c)(ix) are the new architecture-altering operations described in detail in a later section below. The steps for executing genetic programming are as follows: (1) Generate an initial random population of computer programs. (2) Iteratively perform the following sub-steps until the termination criterion has been satisfied: (a) Execute each program in the population and assign it (explicitly or implicitly) a fitness value according to how well it solves the problem. 15 (b) Select program(s)- from the population to participate in the genetic operations in (c) below. (c) Create new program(s) for the population by applying the following genetic operations. (i)Reproduction: Copy an existing program to the new population. (ii)Crossover: Create new offspring program(s) for the new population by recombining randomly chosen parts of two existing programs. (iii) Mutation. Create one new offspring program for the new population by randomly mutating a randomly chosen part of one existing program . (iv) Branch duplication: Create one new offspring program for the new population by duplicating one function-defining branch of one existing program and making additional appropriate changes to reflect this change. (v) Argument duplication: Create one new offspring program for the new population by duplicating one argument of one function- defining branch of one existing program and making additional appropriate changes to reflect this change. (vi) Branch deletion: Create one new offspring program for the new population by deleting one function-defining branch of one existing program and making additional appropriate changes to reflect this change. (vii) Argument deletion: Create one new offspring program for the new population by deleting one argument of one function-defining branch of one existing program and making additional appropriate changes to reflect this change. (viii) Branch Creation: Create one new offspring program for the new population by adding one new function-defining branch containing a portion of an existing branch and creating a reference to that new b r a n c h . (ix) Argument creation: Create one new offspring program for the population by adding one new argument to the argument list of an existing function-defining branch and appropriately modifying references to that branch. (3) After satisfaction of the termination criterion (which usually includes a maximum number of generations to be run as well as a problem-specific success predicate), the single best computer program in the population produced during the run (the best-so-far individual) is designated as the result of the run. This result may (or may not) be solution) to the problem. a solution (or approximate 4.3. Methods of Determining the Architecture of a Multi-Part Program Before applying genetic programming to a problem, it is first necessary to perform at least five major preparatory steps. These steps involve determining (1) the set of terminals for each branch, (2) the set of functions for each branch, (3) the fitness measure, (4) the parameters for controlling the run, and (5) the result designation method and termination criterion. 1 6 In addition, when automatically defined functions are used, it is first necessary to perform a sixth major preparatory step. The sixth major step concerns the architecture of the yet-to-be-evolved overall programs in the population and involves determining (a) the number of function-defining branches, (b) the number of arguments possessed by each function-defining branch, and (c) if there is more than one function-defining branch, the nature of the hierarchical references (if any) allowed between the function-defining b r an c h e s . Sometimes these architectural choices flow so directly from the nature of the problem that they are obvious and virtually mandated. However, in general, there is no way of knowing a priori the optimal (or minimum) number of automatically defined functions that will prove to be useful for a given problem, or the optimal (or minimum) number of arguments for each automatically defined function, or the optimal (or sufficient) arrangement of hierarchical references among the automatically defined functions. The five existing methods (Koza 1994a) for making these architectural choices include methods based on (1) prospective analysis of the nature of the problem, (2) seemingly sufficient capacity (overspecification), (3) affordable capacity, (4) retrospective analysis of the results of actual runs, and (5) evolutionary selection of the architecture. 4 . 3 .1. Method of Prospective Analysis Some problems can be analyzed and decomposed into subproblems of known dimensionality. This insight or information can then be used to establish a common architecture for all programs in the population. For example, some problems involve finding a computer program (i.e., a mathematical expression, a composition of primitive functions and terminals) that produces the observed value of a dependent variable as its output when given the values of a certain number of independent variables as input. Problems of this type are called problems of symbolic regression, system identification, or simply "black box" problems. In many instances, it may be known that a certain number of the independent variables represent a certain relevant subspace or subsystem. In that event, the problem may be decomposed into subproblems based on the known (usually lower) dimensionality of the subspace or subsystem. When applying genetic programming with automatically defined functions to a problem where one discerns a subproblem of a certain dimensionality, one can use this insight to make the number of arguments of at least one of the function-defining branches equal to this dimensionality. Also, if it is known that there are a certain number of subspaces or subsystems, one could use this insight to choose that number as the number of function-defining branches. In practice, exact knowledge of these numbers is unnecessary; upper bounds on these numbers can be used to make the choice of the number of function-defining branches and the number of arguments possessed by each function-definin~ branch. 4 . 3 . 2 . Method of Seemingly Sufficient Capacity For many problems, the choice of the common architecture for the programs in the population can be made on the basis of providing seemingly sufficient capacity. That 17 is, one over-specifies the number of function-defining branches and the number of arguments possessed by each function-defining branch. Over-specification often works because genetic programming with automatically defined functions exhibits considerable ability to ignore extraneous dummy arguments of an automatically defined function, to ignore extraneous automatically defined functions, and to ignore unproductive hierarchical references among already-defined automatically defined fu nct i o n s . 4.3.3. Method of Using Affordable Capacity Resources are required by each additional function-defining branch and each additional argument, especially if the function-defining branches are permitted to invoke one another hierarchically. Thus, the practical reality is the amount of resources that one can afford to devote to a particular problem will strongly influence or dictate the choice of the common architecture for the programs in the population. Often the architectural choice is necessarily made on the basis of hoping that the resources that one can afford to devote to the problem will prove to be sufficient to solve the problem. 4 . 3 . 4 . Method of Retrospective Analysis A retrospective analysis of the results of sets of actual runs made with various architectural choices can determine the optimal architectural choice for a given problem. The idea is to make a number of runs of the problem with different combinations of the number of function-defining branches and the number of arguments that they each possess, to retrospectively compute the effort required to solve the problem with each such architecture, and to identify the optimal architecture. If one is dealing with a series of related problems, a thorough retrospective analysis of one problem may provide guidance for making the required architectural choice for a similar problem. 4.3.5. Evolutionary Selection of the Architecture The fifth technique for establishing the architecture of the overall program for solving a problem is to evolutionarily select the architecture dynamically during the run of genetic programming (described in Koza 1994a, chapters 21 - 25). The technique of evolutionary selection starts with an architecturally diverse initial random population. As the evolutionary process proceeds, certain individuals with certain architectures in the population will prove to be more fit than others in solving the problem. The more fit architectures prosper, while the less fit architectures tend to wither away. Eventually a program with a particular architecture may emerge that solves the problem. In this technique, various different architectures are created at the initial random generation (generation 0); however, no new architectures are ever created during the run and no architectures are altered during the run. There is a competition among the existing architectures during the course of the run. The architecturally diverse populations used with the technique of evolutionary selection require a modification of both the method of creating the initial random population (described in the next section below) and the crossover operation (described in the succeeding section below). Modifications are also required when the six new architecture-altering operations are used. 1 8 4.4. Creation of the Initial Random Population in an Architecturally Diverse Population The initial random population of programs may be created in any one of several possible ways when the six new architecture-altering operations described below are being used. One possibility (called the "minimalist approach") is that each multi-part program in the initial random population at generation 0 has a uniform architecture with exactly one automatically defined function possessing the minimal number of arguments appropriate to the problem. For example, for a problem involving floating-point variables, the argument map of every individual in generation 0 would usually be { 1 } under the "minimalist approach". A second possibility is that each program in the initial population has a uniform architecture with no automatically defined functions (i.e., only a result-producing branch). That is, the argument map of every individual in generation 0 is { }. In this event, the operation of branch creation must used to create function-defining branches. This operation may then be used with an unusually high frequency on generation 0 (where it is called the "big bang") to rapidly introduce multi-part programs into the population. A third possibility is that the population at generation 0 is architecturally diverse. This is the approach used when the technique of evolutionary selection of the architecture is being used. In this approach, the creation of an individual program in the initial random population begins with a random choice of the number of automatically defined functions, if any, that will belong to the program. Then a series of independent random choices is made for the number of arguments possessed by each automatically defined function, if any, in the program. All of these random choices are made within a wide range that includes every number that might reasonably be thought to be useful for the problem at hand. Zero is included in the range of choices for the number of automatically defined functions, so the initial random population also includes some programs without any automatically defined functions. Once the number of automatically defined functions is chosen for a particular overall program, the automatically defined functions, if any, are systematically named in the usual sequential manner from left to right. The range of possibly useful numbers of arguments for the automatically defined functions cannot, in general, be predicted with certainty for an arbitrary problem. There are some problems involving only a few actual variables where it is useful to have an automatically defined function that takes a large number of arguments. However, most problem-solving efforts focus primarily on solving problems by decomposing them into problems of lower dimensionality. Accordingly, it may be reasonable to cap the range of the number of probably-useful arguments for each automatically defined function by the number of actual variables of the problem. There is no guarantee that this cap (motivated by the desire to decompose problems) or any other cap is necessarily optimal, desirable, or sufficient to solve a given problem. In any event, practical considerations concerning resources often play a controlling role in setting the upper bound on the number of arguments to be pe rm itted . The range of potentially useful numbers of automatically defined functions cannot, in general, be predicted with certainty for an arbitrary problem. The number of potentially useful automatically defined functions does not necessarily bear any relation to the dimensionality of the problem. However, once again, considerations of resources play a controlling role in setting the upper bound on the number of automatically defined functions to be permitted. In practice, we often cap the number of automatically defined functions at the number of actual variables of the problem. 19 In practice, a zero-argument automatically defined function may or may not be a meaningful option. In the floating-point, integer, and certain other domains, it may be useful to include random constants because a zero-argument automatically defined function can be used to create an evolvable constant that can then be repeatedly called from elsewhere in the overall program. However, in the special case of the Boolean domain, the two possible Boolean constants (T or NIL) have limited usefulness because all compositions of these two constants merely evaluate to one of these two values. If an automatically defined function has no access to the actual variables of the problem, has no dummy variables, does not contain any side-effecting primitive functions, and does not contain any random constants, nothing is available to serve as terminals (leaves) of the program tree in the body of such a zero-argument automatically defined function. Random choices occurring during the creation of the initial random population determine whether the body of any particular function-defining branch of any particular program in the population actually hierarchically calls all, none, or some of the automatically defined functions that it is theoretically permitted to call. Subsequent crossovers may, of course, change the body of a particular function- defining branch during the run and thereby change the automatically defined functions that a branch actually calls hierarchically. Thus, the function-defining branches have the ability to organize themselves into arbitrary disjoint hierarchies of dependencies among the available automatically defined functions. For example, within an overall program with five automatically defined functions at generation 0, ADF4 might actually refer only to ADF2 and ADF3, with ADF2 and ADF3 not referring at all to either A D F 0 or A D F 1 . Meanwhile, A D F 1 might refer only to A D F 0 . In this situation, there would be two disjoint hierarchies of dependencies. A subsequent crossover might change this organization. For example, after such a crossover, A D F 3 might refer to A D F 0, but still not to A D F 1, thereby establishing a different hierarchy of dependencies. Any allowable (i.e., noncircular) hierarchy of dependencies may thus be created in generation 0 or created by crossover during the evolutionary proces s . 4. 5 . Structure-Preserving Crossover in an Architecturally Diverse Population In the crossover operation in genetic programming, a crossover point is randomly and independently chosen in each of two parents and genetic material from one parent is then inserted into a part of the other parent to create an offspring. A population may be architecturally diverse either because it was initially created with architectural diversity (as described above) or because the six new architecture-altering genetic operations (described below) create a diversity of new architectures during the run. If the population is architecturally diverse, the parents selected to participate in the crossover operation will often possess different numbers of automatically defined functions. Moreover, an automatically defined function with a certain name (e.g., A D F 2 ) belonging to one parent will often possess a different number of arguments than the same-named automatically defined function belonging to the other parent (if indeed A D F 2 iS present at all). After a crossover is performed, each call to an automatically defined function actually appearing in the crossover fragment from the contributing parent will no longer refer to the automatically defined function of the contributing parent, but instead will refer to the same-named automatically defined function of the receiving parent. Thus, we must redefine the crossover operation when it is employed in an architecturally diverse population. 2 o In the simplest practical implementation of genetic programming (as exemplified throughout most of Koza 1992), no syntactic constraints are involved in constructing individual programs in the~ population (except for a syntactic constraint as to the overall size of the program). Program size is typically measured in terms of depth (or, alternately, in terms of the total number of points in the program). A11 points in all programs are of a single common type. Subject only to the size limit established for generation 0, any function from the function set of the problem and any terminal from the terminal set may appear at any point in a program during the initial random creation of programs in generation 0. Similarly, subject only to the size limit established for offspring, the crossover operation produces valid offspring regardless of what points are chosen as crossover points from the two parents. When automatically defined functions are involved, each program in the population conforms to a more complex constrained syntactic structure (such as shown above in figure 1). The initial random population is created in accordance with this constrained syntactic structure. Crossover must be performed in a structure-preserving way so as to preserve the syntactic validity of all offspring. In structure-preserving crossover, the architecture-defining (invariant) points of an overall program are never eligible to be chosen as crossover points and are never altered by crossover. Instead, structure-preserving crossover is restricted to the work-performing (noninvariant) points. In structure-preserving crossover, the work-performing points in the overall program are partitioned into a certain number of types. The basic idea of structure-preserving crossover is that any work-performing point anywhere in the overall program is randomly chosen, without restriction, as the crossover point of the first parent. That point has a type assigned to it. Then, once the crossover point of the first parent has been chosen, the crossover point of the second parent is randomly chosen from among points of the same type. The typing of the work-performing points of an overall program constrains the set of subtrees that can potentially replace the chosen crossover point and the subtree below it. This typing is done so that the structure-preserving crossover operation will always produce valid offspring. There are several ways of assigning types to the work-performing points of an overall program. (1) Branch typing assigns the same type to all the work-performing points of each separate branch of an overall program (but a different type to each different branch). There are as many types of work-performing points as there are branches in the overall program. (2) Like-Branch Typing assigns the same type to all the work-performing points of each separate branch of an overall program and assigns a different type to each different branch, except that if the function sets and terminal sets of two branches are identical, all the points of both such branches are assigned the same type. (3)Point typing assigns a type to each individual work-performing point in the overall program reflective of both the branch where the point is located and the contents of the subtree starting at the point. The characteristics of the branch where the point is located is relevant in determining whether a subtree from another program may be inserted at the point. The contents of the subtree starting at the point are relevant in determining if the subtree may be inserted at a particular point of another program. If a program is subject to any additional problem-specific constrained syntactic structure, that additional structure, if any, must also be considered in typing. When all the programs in the population have a common architecture, any of the three methods of typing may be used. In practice, branch typing is most commonly 21 used. The crossover operation starts with two parents and produces two offspring when either branch typing or like-branch typing is being used. Point typing is used for architecturally diverse populations. If, for sake of argument, branch typing or like-branch typing were to be used on an architecturally diverse population, the crossover operation would be virtually hamstrung; hardly any crossovers could occur. The types produced by branch typing or like-branch typing are insufficiently descriptive and overly constraining in an architecturally diverse population. When point typing is used, the crossover operation acquires a directionality that did not exist with branch typing or like-branch typing. A distinction must be made between the contributing (first) parent and the receiving (second) parent. Consequently, the crossover operation starts with two parents, but produces only one offspring . The crossover point parent must be chosen the contributing (first) inserted at the point. (called the point of from the set of points parent "has meaning insertion) of the receiving (second) for which the crossover fragment from if the crossover fragment were to be When genetic material is inserted into the receiving parent during structure- preserving crossover with point typing, the offspring inherits its architecture from the receiving parent (the maternal line) and is guaranteed to be syntactically and semantically valid. Point typing is governed by three general principles. First, every terminal and function actually appearing in the crossover fragment from the contributing parent must be in the terminal set or function set of the branch of the receiving parent containing the point of insertion. This first general principle applies to actual variables of the problem, dummy variables, random constants, primitive functions, and automatically defined functions. Second, the number of arguments of every function actually appearing in the crossover fragment from the contributing parent must equal the number of arguments specified for the same-named function in the argument map of the branch of the receiving parent containing the insertion point. This second general principle governing point typing applies to all functions. However, the emphasis is on the automatically defined functions because the same function name is used to represent entirely different functions with differing number of arguments for different individuals in the population. Third, all additional problem-specific syntactic rules of construction, if any, must be satisfied. Structure-preserving crossover with point typing is described in detail in Koza 1994a. Structure-preserving crossover with point typing permits robust recombination while simultaneously guaranteeing that any pair of architecturally different parents will produce syntactically and semantically valid offspring. In addition, structure-preserving crossover with point typing enables the architecture appropriate for solving the problem to be evolutionarily selected during a run while the problem is being solved. In addition, when the six new architecture-altering operations (described below) are being used, structure-preserving crossover with point typing enables the architecture appropriate for solving the problem to be e v o l v e d during a run while the problem is being solved in the sense of actually changing the architecture of programs dynamically during the run. 2 2 5. The Six New Architecture-Altering Genetic Operations The six new architecture-altering genetic operations are as follows: (1) Branch duplication creates one new offspring program for the new population by duplicating a function-defining branch of an existing program (and making additional appropriate changes in the program to reflect the duplication). (2) Argument duplication creates one new offspring program for the new population by duplicating one argument of one function-defining branch of one existing program (and making additional appropriate changes in the program to reflect the duplication). (3) Branch deletion creates one new offspring program for the new population by deleting one function-defining branch of one existing program (and making additional appropriate changes in the program to reflect the deletion). (4) Argument deletion creates one new offspring program for the new population by deleting one argument of one function-defining branch of one existing program (and making additional appropriate changes in the program to reflect the deletion) . (5) Branch creation creates one new offspring program for the new population by adding one new function-defining branch containing a portion of an existing branch and creating a reference to that new branch. (6) A rgument creation creates one new offspring program for the population by adding one new argument to the argument list of an existing function-defining branch and appropriately modifying references to that branch. The six new architecture-altering operations differ from the operations of reproduction crossover, and mutation in that the argument maps of the participating individuals change as a consequence of performing the operations. During each generation of the evolutionary process, a certain percentage of the individuals in the population participate in the architecture-altering operations. Meanwhile, Darwinian selection causes differential selection in favor of more fit individuals. And, during each generation, individuals in the population are modified by the operations of crossover and mutation. The result is that the evolutionary process will select against individuals in the population with architectures that are less suitable for solving the problem. Individuals with unsuitable architectures will tend to become extinct over a period of generations. Similarly, individuals with architectures that facilitate solving the problem will tend to prosper. The six new operations are described in detail below. 5.1. Branch Duplication The operation of branch duplication duplicates one of the branches of a program in the following way: (1) Select a program from the population to participate in this operation. (2) Pick one of the function-defining branches of the selected program as the branch-to-be-duplicated. If the selected program has only one function-defining branch, that branch is automatically picked. If the selected program has no function-defining branches (or already has the maximum number of branches established for the problem at hand), this operation is aborted and no action is performed on this occasion. (3) Add a uniquely-named new function-defining branch to the selected program, thus increasing, by one, the number of function-defining branches in the selected 23 program. The new function-defining branch has the same argument list and the same body as the branch-to-be-duplicated. (4) For each occurrence of an invocation of the branch-to-be-duplicated anywhere in the selected program (e.g., the result-producing branch or any other branch that invokes the branch-to-be-duplicated), randomly choose either to leave that invocation unchanged or to replace that invocation with an invocation of the new branch. If the choice is to make the replacement, the arguments in the invocation of the new branch remain identical to the arguments of the existing inv oc ati on . The step of selecting a program is performed on the basis of fitness for branch duplication (and the other operations subsequently described herein), so that a program that is more fit has a greater probability of being selected to participate in the operation than a less fit program. A copy is first made of the selected program and the operation is then performed on the copy, so the unchanged original program remains in the population and is therefore available to be selected again on the basis of its fitness. The problem of symbolic regression of the Boolean even-parity function will be used for purposes of illustration throughout this report. The Boolean even-k-parity function takes k Boolean arguments, D0, D1, D2, and so forth (up to a total of k arguments). Each argument can take on the value T (true or 1) or NIL (false or 0). The even-k-parity function returns T if an even number of its Boolean arguments are T, but otherwise returns NIL. Parity functions are used to check the accuracy of stored or transmitted binary data in computers because a change in the value of any one of its arguments always changes (toggles) the value of the function. Because of this toggling, parity functions are often used as benchmarks in the study of machine learning and neural networks. The problem is to discover a program that mimics the behavior of the Boolean even-k-parity problem for every one of the 2k combinations of its k Boolean inputs. Suppose that the program in figure 1 has been selected, in step (1), as the program to participate in the operation of branch duplication. Since this program happens to have only one function-defining branch, the sole function-defining branch (defining ADF O ) iS picked, in step (2), as the branch-to-be-duplicated. In step (3), a new function-defining branch is added to the selected program, thus increasing, by one, the number of function-defining branches in the selected program. The new branch is given the new name of A D F 1 . This new name is unique within this program. Figure 2 shows the program resulting after applying the operation of branch duplication to the program in figure 1. The original program in figure 1 has an argument map of { 2 } . The program in figure 2 has an argument map of { 2, 2 } because the operation of branch duplication duplicated the two-argument function-defining b r a n c h . 2 4 ~de~ 510 ADF0~ ~LIS 511 ARG~j~ CARG1 'progn~SOO ~ 540 ~ 570 541~ 1542 v~ue~ ~ ~ST ~ 520 543 544 .~49 550 ARG1~ CAND, ARG 581 DF~ ~NA] ),~)587 582 583 ~ 1s89 S~X NOR ~90 591 Figure 2: Program with argument map of {2, 2} created using the operation of ~ranch duplication. Specifically, the function-defining branch starting at the DEFUN labeled 410 of figure 1 defining ADF0 (also shown as 510 of figure 2) is duplicated. The duplicated branch is added at DEFUN 540 of figure 2, thereby giving the new overall program three branches: a first function-defining branch starting at DEFUN 510, a new second function-defining branch starting at DEFUN 540, and a result-producing branch starting at v ALUES 570. The new function-defining branch is given the unique new name, ADFl, at 541. The argument list of the new function-defining branch is the same as the argument list of the branch-to-be-duplicated and consists of ARG0 543 and ARGl 544. The body of the new function-defining branch starting with VALUES 549 is the same as the body of the first function-defining branch starting at VALUES 419 of figure 1 (also shown as 519 in figure 2). There are two occurrences of invocations of the branch-to-be-duplicated, ADF0, in the result-producing branch of the selected program, namely ADF0 481 and ADF0 487 of figure 1. For each of these two occurrences, a random choice is made to either leave the occurrence of ADF0 unchanged or to replace it with the newly created ADFl. For the first invocation of ADF0 at 481 of figure 1, the choice is made to replace ADF0 with ADFl. Thus, ADFl appears at 581 in figure 2. The arguments for the invocation of ADFl 581 are Dl 582 and D2 583 in figure 2 (i.e., they are identical to the arguments Dl 482 and D2 483 for the invocation of ADF0 at 481 in figure 1). For the second invocation of ADF0 at 487 of figure 1, the choice is randomly made to leave ADF0 unchanged. Thus, ADF0 appears at 587 of figure 2. Because the duplicated new function-defining branch is identical to the previously existing function-defining branch (except for its name) and because the new function-defining branch ADFl iS invoked with the same arguments as ADF0 had been invoked, the value returned by the overall program is unchanged by the operation of branch duplication. The programs are semantically equivalent but, of course, structurally (syntactically) different. The operation of branch duplication can be interpreted as a "case splitting." After the branch duplication, the result-producing branch invokes ADF0 at 587 and ADFl at 581. ADF0 and ADFl can be viewed as separate newly-created procedures for handling 2s the two separate subproblems (cases). Immediately after the branch duplication operation, the two subproblems (cases) are handled in precisely the same way. Subsequent genetic operations may alter one or both of these two presently- identical function-defining branches. These subsequent changes can lead to a divergence in structure and behavior. This divergence may be interpreted as a specialization or refinement. That is, once ADF0 and ADF1 diverge, ADF0 can be viewed as a specialization for handling for subproblem (case) associated with its invocation by the result-producing branch. Similarly, ADF1 can be viewed as a specialization for handling its subproblem (case). The operation of branch duplication (and the other operations subsequently described herein) always produce a syntactically valid program. The operation of branch duplication (and the operations of argument duplication, branch creation, and argument creation described below) are recombinative in the sense that the offspring produced by each operation consists entirely of genetic material that comes from an existing member of the population. 5 . 2. Argument Duplication The operation of argument duplication duplicates one of the arguments in one of the automatically defined functions of a program in the following way: (1) Select a program from the population to participate in this operation. (2) Pick one of the function-defining branches of the selected program. If the selected program has only one function-defining branch, that branch is automatically chosen. If the selected program has no function-defining branches, this operation is aborted and no action is performed on this occasion. (3) Choose one of the arguments of the picked function-defining branch of the selected program as the argument-to-be-duplicated. If the picked function-defining branch has no arguments (or already has the maximum number established for the problem at hand), this operation is aborted and no action is performed on this occasion. (4) Add a uniquely-named new argument to the argument list of the picked function-defining branch of the selected program, thus increasing, by one, the number of arguments in its argument list. (5) For each occurrence of the argument-to-be-duplicated anywhere in the body of the picked function-defining branch of the selected program, randomly choose either to leave that occurrence unchanged or to replace that occurrence with the new argument. (6) For each occurrence of an invocation of the picked function-defining branch anywhere in the selected program (e.g., the result-producing branch or other branch that invokes the picked function-defining branch), identify the argument subtree in that invocation corresponding to the argument-to-be-duplicated and duplicate that argument subtree in that invocation, thereby increasing, by one, the number of arguments in the invocation. Because the function-defining branch containing the duplicated argument is invoked with an identical copy of the previously existing argument, the value returned by the overall program is unchanged by the operation of argument dupl ic ation . Figures 3 and 1 together illustrate the operation of argument duplication. The original program in figure 1 has an argument map of { 2 } . The new program in figure 3 has an argument map of { 3 } because the operation of argument duplication added an argument to the one function-defining branch. 2 6 611 610 ~ ~L~ 612 ~RG~ 613 614 615 621 ARC~2 619 p~gn 620 681 ~A~F0 ~622 682 683 684 ~UC~ 670 'A~DF0~ 687 '~B~ ~NOR 623 624 688 NOR 689 ~ ;J 695 690 691 696 697 Figure 3: Program with argument map of {3) created using the operation of argument duplication. ADF 0 in figure 1 takes two dummy arguments, namely ARG0 (labeled 413) and ARG 1 (labeled 414). Now suppose that the second argument, ARG 1 414 in figure 1 is chosen as the argument-to-be-duplicated. In figure 3, the argument list of A D F 0 has been changed by adding a uniquely- named new argument, ARG 2 at 615, thereby increasing the size of this argument list from two to three. There are two occurrences of the argument-to-be-duplicated in the body of the picked function-defining branch of the selected program, namely at 421 and 423 in figure 1. For each of these two occurrences, a random choice is made to either leave the occurrence of A R G 1 unchanged or to replace it with the newly created argument, ARG2. Figure 3 shows that the choice was in favor of a replacement for the first occurrence of ARGl at 421. Consequently, the new name, ARG2, appears at 621 of figure 3. The choice was against replacement for the second occurrence of ARGl at 423 of figure 1, so ARGl appears at 623 of figure 3. There are two occurrences of an invocation of ADF 0 in the result-producing branch at 481 and 487 of figure 1. The second argument, A R G 1, iS the argument-to -be- duplicated in this example. In the first invocation of ADF 0 at 481, the variable D 2 (labeled 483) corresponds to the argument-to-be-duplicated because it is the second argument of A D F 0 (labeled 481) . In the second invocation of A D F 0 at 487, the entire argument subtree consisting of (NOR D4 D0) at 489, 490, and 491 corresponds to the argument-to-be-duplicated. Because of the argument duplication, ADF 0 681 and AD F 0 687 in figure 3 now each take three arguments, instead of only two. For the first invocation of ADF0 at 681, D 2 683 has been duplicated so that D 2 now appears at both 683 and 684 of figure 3. For the second invocation of ADF0 at 687, the entire argument subtree (NOR D4 D0) has been duplicated so that it appears at both 689, 690, and 691 as well as 695, 696, and 697 of figure 3. Just as the operation of branch duplication was interpreted as a "case splitting," the operation of argument duplication can be interpreted as a "case splitting." Immediately after the argument duplication operation, the two subproblems (cases) are handled in precisely the same way. The particular instantiations of the second and third arguments in each invocation of A D F 0 provide the opportunity for the subsequent evolution of different ways of handling the two separate subproblems ( c ases ) . 27 Subsequent genetic operations may alter one or both of these two presently- identical arguments and these subsequent changes can lead to a divergence in structure and behavior. Once the second and third arguments diverge, this divergence may be interpreted as a specialization or refinement. 5 . 3 . Branch Deletion The operation of branch deletion deletes one of the automatically defined functions of a program in the following way: (1) Select a program from the population to participate in this operation. (2) Pick one of the function-defining branches of the selected program as the branch-to-be-deleted. If the selected program has no function-defining branches, this operation is aborted and no action is performed on this occasion. (3) Delete the branch-to-be-deleted from the selected program, thus decreasing, by one, the number of branches in the selected program. (4) For each occurrence of an invocation of the branch-to-be-deleted anywhere in the selected program (e.g., the result-producing branch or other branch that invokes the branch-to-be-deleted), replace the invocation of the branch-to-be- deleted with an invocation of a surviving branch (described below). When a function-defining branch is deleted, the question arises as to how to modify invocations of the branch-to-be-deleted by the other branches of the overall program. We consider three alternatives. One alternative (called branch deletion by consolidation) involves identifying a suitable second function-defining branch of the overall program as the surviving branch and replacing (consolidating) the branch-to-be-deleted with the surviving branch in each invocation of the branch-to-be-deleted. Branch deletion by consolidation almost never preserves the semantics of the overall program. Branch deletion by consolidation can be interpreted as a way to achieve generalization in a problem-solving procedure. A second alternative (called branch deletion with random regeneration) is to randomly generate new subtrees composed of the available functions and terminals in lieu of an invocation of the branch-to-be-deleted. Branch deletion with random regeneration almost never preserves the semantics of the overall program. A third alternative (called branch deletion by macro expansion) involves inserting the entire body of the branch-to-be-deleted for each instance of an invocation of that branch. Branch deletion by macro expansion preserves the semantics of the overall program at the expense of a massive increase in the size of the overall program. 5 . 3 . 1. Branch Deletion by Consolidation The first alternative (branch deletion by consolidation) begins by finding a suitable choice for the surviving branch within the overall program. Since branch deletion by consolidation involves two branches (i.e., the branch-to-be-deleted and the surviving branch), it is necessary that the selected program have at least two function-defining branches. Thus, when branch deletion by consolidation is being used and the selected program has less than two function-defining branches, this operation is aborted and no action is performed on this occasion. When branch deletion by consolidation is performed, the number of arguments possessed by the proposed surviving branch may equal to, less than, or greater than the number of arguments possessed by the branch-to-be-deleted. 2 8 Figure 2 illustrates the first of these three possibilities for branch deletion by consolidation (where the number of arguments possessed by the proposed surviving branch is equal to the number of arguments possessed by the branch-to-be-deleted). Suppose that the first function-defining branch (defining A D F 0 ) of the program in figure 2 is picked as the branch-to-be-deleted and that the second function-defining branch (defining ADF 1 ) iS to be the surviving branch. In that event, the first function-defining branch is deleted; the invocation of AD F 0 at 587 in figure 2 is to be replaced by an invocation of A D F l; and the two argument subtrees below the invocation of ADF 0 at 587 are retained as the argument subtrees for the invocation at 587 of AD F 1 . That is, the branch-to-be-deleted is merged into AD F 1 . The original program in figure 2 has an argument map of { 2, 2 } and the resulting program has an argument map of { 2 } . For this first possibility, the branch deletion may be viewed as a generalization of a procedure. Before the branch deletion, the two function-defining branches constitute different procedures for handling different subproblems (cases). The two branches do different things and they are invoked in different situations by the result-producing branch. After the branch deletion by consolidation, both subproblems (cases) are handled in the same way. That is, the procedure is generalized so that the surviving branch (AD F 1 here) handles both subproblems (cases) . In the second possibility, the number of arguments required by the proposed surviving branch is less then the number of arguments possessed by the branch-to- be-deleted. Any superfluous argument subtrees below the invocation of the branch- to-be-deleted are simply deleted. The branch deletion may also be viewed as a generalization of a procedure with an accompanying generalization of its arguments . In the third possibility, the number of arguments required by the a proposed surviving branch is greater than the number of arguments possessed by the branch-to-be-deleted. The required additional argument subtrees may be randomly generated (or duplicated). In the later case, the random creation is done using the same method of generation originally used to create the invoking branch (i.e., the branch containing the invocation of the branch-to-be-deleted) at the time of creation of the initial random population in generation 0 (with the branch-to-be- deleted being unavailable during this random regeneration). This approach may not be desirable because it introduces a significant mutational aspect to the operation. In that event, the operation may simply be aborted for this third possibility. 5.3.2. Branch Deletion with Random Regeneration When the second alternative (branch deletion with random regeneration) is being used, all of the argument subtrees required by the invocation of the branch-to-be- deleted are randomly generated. Except when random regeneration is used, the operation of branch deletion (and the operation of argument deletion described below) are recombinative in the sense that the offspring produced by each operation consists entirely of genetic material that comes from an existing member of the population. When random regeneration is used, branch deletion (and argument deletion) are partially recombinative and partially mutational. 5.3.3. Branch Deletion by Macro Expansion The third alternative (branch deletion by macro expansion) has the characteristic of preserving the semantics of the overall program; however, it has the disadvantage of 29 usually creating very large programs. Moreover, branch deletion by macro expansion would not work if recursion were being used, Each of the argument subtrees in each invocation~ of the branch-to-be-deleted is substituted into a copy of the body of the branch-to-be-deleted and the now-expanded body then replaces the invocation of the branch-to-be-deleted. Of course, if the objective of a deletion is change the semantics of the overall program (i.e., to achieve generalization), this alternative is undesirable. 5 . 4. Argument Deletion The operation of argument deletion deletes one of the arguments to one of the automatically defined functions of a program in the following way: (1) Select a program from the population to participate in this operation. (2) Pick one of the function-defining branches of the selected program. If the selected program has only one function-defining branch, that branch is automatically picked. If the selected program has no function-defining branches, this operation is aborted and no action is performed on this occasion. (3) Choose one of the arguments of the picked function-defining branch of the selected program as the argument-to-be-deleted. If the picked function-defining branch has no arguments (or already has only the minimum number of arguments established for the problem at hand), this operation is aborted and no action is performed on this occasion. (4) Delete the argument-to-be-deleted from the argument list of the picked function-defining branch of the selected program, thus decreasing, by one, the number of arguments in the argument list. (5) For each occurrence of an invocation of the picked function-defining branch anywhere in the selected program (e.g., the result-producing branch or other branch that invokes the picked function-defining branch), delete the argument subtree in that invocation corresponding to the argument-to-be-deleted, thereby decreasing, by one, the number of arguments in the invocation. (6) For each occurrence of the argument-to-be-deleted anywhere in the body of the picked function-defining branch of the selected program, replace the argument- to-be-deleted with a surviving argument (described below). When an argument is deleted, the question arises as to how to modify references to the argument-to-be-deleted within the picked branch. Again, we consider three alternatives . One alternative (called argument deletion by consolidation) involves identifying another argument of the picked branch as the surviving argument and replacing (consolidating) the argument-to-be-deleted with the surviving argument in the picked branch. When this alternative is employed, the operation of argument deletion may be viewed as a generalization in the sense that some information that was formerly considered in executing a procedure is now no longer considered. A second alternative (called argument deletion with random regeneration) is to generate a new subtree in lieu of an invocation of the argument-to-be-deleted using the same method of generation originally used to create the picked branch at the time of creation of the initial random population (with the argument-to-be-deleted being unavailable during this random regeneration). A third alternative (called argument deletion by macro expansion) may also be used. This alternative has the advantage of preserving the semantics of the overall program; however, it has the disadvantage of usually creating large programs. 3 o 5 . 4 . 1. Argument Deletion by Consolidation Suppose, in employing the first alternative (argument deletion by consolidation), that A RG 2 labeled 615 in figure 3 is chosen as the argument-to-be-deleted and that ARG 1 iS chosen (from among the remaining two arguments) as the surviving argument. The one occurrence of ARG2 at 621 in figure 3 is replaced by ARGl. The two invocations of AD F 0 by the result-producing branch (at 681 and 687) are modified by deleting the third argument subtree in each invocation. Specifically, the argument subtree D 0 684 is deleted from the invocation of ADF 0 at 681 and the argument subtree (NOR D4 D0) at 695, 696, and 697 is deleted from the invocation of ADF0 at 687. The result is the program shown in figure 1. The original program in figure 3 has an argument map of { 3 } and the resulting program in figure 1 has an argument map of { 2 } . Since argument deletion by consolidation involves two arguments (i.e., the argument-to-be-deleted and the surviving argument), it is necessary that the picked branch have at least two arguments. Thus, when argument deletion by consolidation is being used and the picked function-defining branch has less than two arguments, this operation is aborted and no action is performed on this occasion. It is often advisable to set a minimum permissible number of arguments for any function. For example, in a problem involving Boolean functions and no side-effects, there are only four possible Boolean functions of one argument, so it may be more efficient to exclude such one-argument functions for a Boolean problem. If this approach is adopted, whenever the picked function-defining branch has less than three arguments, the operation of argument deletion is aborted and no action is performed on this occasion. 5.4.2. Argument Deletion with Random Regeneration When the second alternative (argument deletion with random regeneration) is being used, a new subtree is randomly generated in lieu of an invocation of the argument- to-be-deleted using the same method of generation originally used to create the picked branch at the time of creation of the initial random population (with the argument-to-be-deleted being unavailable during this random regeneration). The subtree may consist of either a single available argument or an entire generated argument subtree composed of the available functions and terminals. 5.4.3. Argument Deletion by Macro Expansion When argument deletion by macro expansion (i.e., the third alternative) is used, the first step is to delete the argument-to-be-deleted from the argument list of the picked branch. The second step is then to create as many copies of the now-modified picked branch as there are invocations of the picked branch in the overall program (e.g., in the result-producing branch or other branches) and to give each such copy of the picked branch a unique name. The third step is to replace each invocation of the picked branch in the overall program with an invocation to a particular one of the uniquely-named copies of the now-modified picked branch. The fourth step is, for each uniquely-named copy of the now-modified picked branch, to insert the argument subtree corresponding to the argument-to-be-deleted for every occurrence of the argument-to-be-deleted in that particular copy. The alternative of argument deletion by macro expansion has the characteristic of preserving the semantics of the overall program; however, it has the disadvantage of usually creating a vast number of additional branches and large overall programs. Of course, if the objective of a deletion is change the semantics of the overall program (i.e., to achieve generalization), this alternative is undesirable. 31 Both the argument duplication and the branch duplication operations create larger programs. Larger programs consume more resources and a population of such growing programs may become unmanageable as a practical matter. The argument deletion operation and the branch deletion operation can create smaller programs and provide a mechanism for balancing the continual growth that would otherwise occur (provided the alternative of argument deletion by macro expansion is not used). 5 . 5. Branch Creation The operation of branch creation creates a new automatically defined function within an overall program in a more general (but less biologically motivated) way than the previously described operation of branch duplication. The steps in the operation of branch creation are as follows: (1) Select a program from the population to participate in this operation. (2) Pick a point in the body of one of the function-defining branches or result- producing branches of the selected program. This picked point will become the top- most point of the body of the branch-to-be-created. (3) Starting at the picked point, begin traversing the subtree below the picked point (e.g., in a depth-first manner). (4) As each point below the picked point in the picked branch is encountered during the traversal, make a determination as to whether to designate that point as being the top-most point of an argument subtree for the branch-to-be-created. If such a designation is made, no traversal is made of the subtree below that designated point. The traversal continues and this step (4) is repeatedly applied to each point encountered during the traversal so that when the traversal of the subtree below the picked point is completed, zero points, one point, or more than one point are so designated during the traversal. (5) Add a uniquely-named new function-defining branch to the selected program. The argument list of the new branch consists of as many consecutively-numbered dummy variables (formal parameters) as the number of points that were designated during the traversal. The body of the new branch consists of a modified copy of the subtree starting at the picked point. The modifications to the copy are made in the following way: For each point in the copy corresponding to a point designated during the traversal of the original subtree, replace the designated point in the copy (and the subtree in the copy below that designated point in the copy) by a unique dummy variable. The result is a body for the new function-defining branch that contains as many uniquely named dummy variables as there are dummy variables in the argument list of the new function-defining branch. (6) Replace the picked point in the picked branch by the name of the new function-defining branch. If no points below the picked point were designated during the traversal, the operation of branch creation is now completed. (7) If one or more points below the picked point were designated during the traversal, the subtree below the just-inserted name of the new function-defining branch will be given as many argument subtrees as there are dummy arguments in the new function-defining branch in the following way: For each point in the subtree below the picked point designated during the traversal, attach the designated point and the subtree below it as an argument to the function defined by the new function-defining branch. The operation of branch creation is more general than the operation of branch duplication in the sense that the picked point may be in the result-producing branch and also in the sense that the picked point need not be the top-most point of the body of an existing branch. 3 2 The operation of branch creation described herein is similar to, but different from, the compression (module acquisition) operation described by Angeline and Pollack in at least three ways (Angeline and Pollack 1993, 1994; Angeline 1994a 1994b) . First, Angeline and Pollack place each new function (called a module) created by the compression operation into a Genetic Library. The new function is not exclusively associated with the selected program that gave rise to it. Instead, new functions placed in the Genetic Library may be invoked by any program in the population. In practice, this occurs as soon as crossover or some other operation introduces references to the new function. In contrast, in the branch creation operation described herein, each new function is made a part of the selected program as a new branch of that particular program. The new function may be invoked only by the selected program in which it was originally created and of which it is a part. A second difference between the compression operation described by Angeline and Pollack and operation of branch creation described herein is that the body of the new branch created by the branch creation operation continues to be subject to the effects of other operations (notably crossover) in successive generations of the process. Thus, instead of being insulated from future change by residing in the Genetic Library, it is susceptible to continued change. Among the possible changes that may occur by means of crossover are the insertion of additional hierarchical references to other automatically-defined functions. A third difference between the compression operation described by Angeline and Pollack and operation of branch creation described herein is that the branch creation operation may be applied to any branch (and, in particular, to function- defining branches). Angeline and Pollack ' s operation was limited to one a single b r a n c h . Several different methods may be used to determine how to designate a point below the picked point during the depth-first traversal described above. In d e p t h compression described by Angeline and Pollack, the points at a certain distance (depth) below the picked point are designated. In leaf compression described by Angeline and Pollack, all of the external points (leaves) below the picked point are designated. Other methods of designation may be used. Here internal points below the picked point are designated independently at random with a certain probability. Regardless of the method of designation, zero points, one point, or several points are designated . Figures 4 and 1 illustrate the operation of branch creation. ADFO 2511 DEFUN ~10 IST ~ ~IALUE ARGl ARGl ~ ~ARGn ADFl A~ PROGN 2501 DEFll~J 2540 ~sr ARGl ~44 ARG2 ~2545 ARG0 2s~i ALUE 2549 ~AND ARGl ADFo 'APG2 VAWES 2s7n AND 2580 2586 ~DFl AND ~89 DO 33 Figure 4: Program with argument map of {2, 3}. Suppose the point N A N D 485 iS picked from the result-producing branch of the program starting at P R O G N 400 in figure 1. Starting at the picked point, N A N D 485, a depth-first traversal of the subtree below this point will visit 486, 487, 488, 489, 490 and 491 of figure 1 in that order. Suppose, while making this depth-first traversal, points 486, 488, and 489 of figure 1 are designated. The fact that three points were designated means that the branch-to-be-created will be a three-argument function- defining branch. The three argument subtrees starting at these three designated points will become arguments for the branch-to-be-created. Because point 489 was designated, no traversal of D4 490 and DO 491 is made. The branch creation operation causes a new branch to be added to the overall selected program . The new branch starts at D E F U N (labeled 2540) in figure 4. The new branch is given a new name, AD F 1 (labeled 2541), that is unique within the new overall program. The argument list of this new branch in figure 4 contains three consecutively-numbered dummy variables (formal parameters), namely A R G O (labeled 2543), ARGl (labeled 2544), and ARG2 (labeled 2545) appearing below LI ST (labeled 2542). The body of this new branch starts at VALUE S (labeled 2549). The body of the new branch in figure 4 consists of a modified copy of the seven-point subtree starting at the picked point, N A N D 485 of figure 1. In modifying the copy, each of the three designated points from figure 1 (486, 488, and 489) is replaced by a different consecutively-numbered dummy variable, ARGO, ARGl, and ARG2 in figure 4. Specifically, the first designated point, D 0 486, from figure 1 is replaced by ARG O and appears as ARG O 2556 in figure 4. The second designated point, D 3 488, from figure 1 is replaced by ARGl and appears as ARGl 2558 in figure 4. The third designated point, 3 4 AND 489 and the entire subtree below this designated point (i.e., D4 490 and D0 491) from figure 1 are replaced by ARG2 and appears as ARG2 2559 in figure 4. The picked point, NAND 485, from figure 1 is now replaced by the name, ADFl, of the new function-defining branch. Thus, ADFl appears at 2585 in figure 4. Since three points below NAND 485 from figure 1 were designated during the traversal, ADFl 2585 will be given three argument subtrees. The first designated point, D0 486, from figure 1 appears below ADFl 2585 as D0 2586 in figure 4 as the first argument to ADFl 2585. Because D0 iS a terminal, D0 2586 appears alone in figure 4. The second designated point, D3 488, from figure 1 appears below ADFl 2585 as D3 2588 in figure 4 as the second argument to ADFl 2585. The third designated point, AND 489, from figure 1 appears below ADFl 2585 as AND 2589 in figure 4 as the third argument to ADFl 2585. Unlike the previous two designated points, AND 489 has a subtree below it in figure 1. The entire subtree is (AND D4 D0) at 489, 490, and 491 in figure 1. Thus, the entire subtree (ANDD4D0) appears as (ANDD4D0) at 2589, 2590, and 2591 of figure 4. The argument map of the original overall program in figure 1 is { 2 } and the argument map of the new overall program in figure 4 is { 2, 3 } because the new branch takes three arguments. Figures 5 and 1 illustrate an interesting special case of the operation of branch c re ati on . ADFO 2611 ARGO DEFUN 2610 llST ARGl VALUE~ ARGO ~ I AND ARGl ~ ~ ARGO ADFl 2641 PROGN 2601 DEFUN 2640 LlSr 2642 DO ~56 VALUEI NAND 2~5 D~ 2658 ADFO 2650 AND C~ ~51 ADFO VALUES 2670 35 ADFl ~85 Figure 5: Program with argument map of {2, 0}. Suppose the point NAND 485 is again picked from the result-producing branch of the program starting at P ROGN 400 in figure 1. However, now suppose that, while making the depth-first traversal, no points were designated. As before, the branch creation operation causes a new branch to be added to the overall selected program. The new branch starts at DEFUN 2640 of figure 5. The new branch is given the unique new name, ADF1 (at 2641) in figure 5. However, since no points were designated during the traversal, the argument list of this new branch contains no dummy variables and no dummy arguments appear below LIST 2642 in figure 5. The body of this new branch starts at VALUES 2649 in figure 5. However, since no points were designated during the traversal, the body of the new branch consists of an exact copy of the seven-point subtree starting at the picked point, NAND 485, of figure 1. In particular, several of the actual variables of the problem (D0 2656, D3 2658, D4 2650, and D0 2651) are imported into the body of the function definition for ADF1 in figure 5. As before, the picked point, NAND 485, of figure 1 is replaced in figure 5 by the name, ADF1 2685, of the new function-defining branch. Since no points below NAND 485 were designated during the traversal of figure 1, ADF1 2685 in figure 5 has no 3 6 argument subtrees. Thus, in this special case, the entire subtree containing the picked point, NAND 485, and the subtree below it is encapsulated in the zero-argument automatically defined function AD F 1 in figure 5. In this example, the argument map of the original overall program in figure 1 is { 2 } and the argument map of the new overall program in figure 5 is (2, 0 } . The operation of branch creation does not have any immediate effect on the value(s) returned and the action(s) performed by the selected program. However, subsequent operations may alter the branch that is created by this operation and may therefore lead to some later divergence in structure and behavior. 5 . 6 . Argument Creation The operation of argument creation creates a new argument within a function- defining branch of an overall program in a more general way than the previously described operation of argument duplication. This operation is, in some sense, a generalization of the operation of argument duplication. The steps in the operation of argument creation are as follows: (1) Select a program from the population to participate in this operation. (2) Pick a point in the body of one of the function-defining branches of the selected program. (3) Add a uniquely-named new argument to the argument list of the picked function-defining branch for the purpose of defining the argument-to-be-created. (4) Replace the picked point (and the entire subtree below it) in the picked function-defining branch by the name of the new argument. (5) For each occurrence of an invocation of the picked function-defining branch anywhere in the selected program (e.g., the result-producing branch or other branch that invokes the picked function-defining branch), add an additional argument subtree to that invocation. In each instance, the added argument subtree consists of a modified copy of the picked point (and the entire subtree below it) in the picked function-defining branch. The modification is made in the following way: For each dummy argument in a particular added argument subtree, replace the dummy argument with the entire argument subtree of that invocation corresponding to that dummy argument. Figures 6 and 1 illustrate the operation of argument creation. 37 ~ PROGN ~ ~( ~ >~ Figure 6: Program with argument map of {3} created using operation of argument creation. the Suppose the point A N D 422 from figure 1 is the picked point, from step (2), in the body of the function-defining branch starting at D E F U N 410. This picked point, A N D 422, is the top-most point of the three-point subtree (AND ARGl ARG0) at 422, 423, and 424. In figure 6, A R G 2 iS the name given to the newly created argument and A R G 2 (labeled 3015) is added to the argument list starting at L I S T (labeled 3012) of the picked function-defining branch. In figure 6, A R G 2 3022 has replaced the picked point, A N D 422, and the entire subtree below the picked point from figure 1. There are two occurrences of invocations of the picked function-defining branch (ADF0), namely at 481 and 487 in figure 1. The first invocation of ADF0 in the selected program is the two-argument invocation (ADF0 Dl D2) at 481, 482, and 483 in figure 1. A new third argument is added for the invocation ADF 0 481. The new argument subtree is manufactured by starting with a copy of the picked point and the entire subtree below that picked point, namely (AND ARGl ARG0) at 422, 423, and 424 in figure 1. This copy is first modified by replacing the dummy argument A R G 1 423 by the argument subtree of the invocation of A D F 0 at 481 corresponding to the dummy argument A R G 1 423 (i.e., the single point D 2 483). Note that it is D 2 483 that corresponds because D 2 483 is the second argument subtree of A D F 0 481 and because 3 8 ARGl iS the second dummy argument of DEFUN 410. The copy is further modified by replacing the dummy argument ARG 0 424 by the argument subtree of the invocation of ADF0 at 481 corresponding to that dummy argument (i.e., the single point Dl 482). The result is that ADF 0 iS now invoked at 3081 in figure 6 with three (instead of two) arguments, namely (ADF0 Dl D2 (AND D2 Dl) ) . The second invocation of A D F 0 in the selected program is the two-argument invocation (ADF0 D3 (AND D4 D0) ) at 487, 488, 489, 490 and 491 in figure 1. A new third argument is added for the invocation AD F 0 487. The new argument subtree is manufactured by starting with a copy of the picked point and the entire subtree below that picked point from figure 1, namely (AND ARGl ARG0) at 422, 423, and 424. This copy is first modified by replacing the dummy argument A R G 1 423 by the argument subtree of the invocation of A D F 0 at 487 corresponding to the dummy argument ARGl 423, namely the entire argument subtree (AND D4 D0) at 489, 490, and 491. Note that a three-point subtree, ( AND D 4 D 0 ), corresponds because it is the second argument subtree of A D F 0 487 in figure 1 and because A R G 1 iS the second dummy argument of DEFUN 410. The copy is further modified by replacing the dummy argument A R G 0 424 by the argument subtree of the invocation of A D F 0 at 487 corresponding to that dummy argument (i.e., the single point D 3 488). The result is that AD F 0 iS now invoked at 3087 in figure 6 with three (instead of two) arguments, n a m e l y (ADF0 D3 (AND D4 D0) (AND (AND D4 D0) D3) ) . The operation of argument creation changes the argument map of the selected program from { 2 } to { 3 } because A D F 0 now takes three arguments, instead of two. The operation of argument creation does not have any immediate effect on the value(s) returned and the action(s) performed by the selected program. However, subsequent operations may alter the argument that is created by this operation and may therefore lead to some later divergence in structure and behavior. 39 6. Rotating the Tires on an Automobile A gedanken experiment can be used to illustrate the role of the six new architecture- altering operations in automated problem-solving. In the experiment, we will visualize four ways of using genetic programming to evolve a procedure to perform the hypothetical task of rotating the tires on an automobile. The to-be-evolved problem-solving procedure is assumed to have access to all the necessary information from the problem environment (such as the size of the bolts for fastening a tire to an axle, the number of bolts on a tire, the presence or absence of a hubcap, and so forth). In addition, the repertoire of primitive functions includes all the operations necessary to complete the task (such as removing a hubcap, unfastening a bolt, sliding the tire off of the axle, and so forth). The goal is to evolve a procedure will specify a sequence of operations that will successfully remove each of the four tires and remount them in their specified new locations for any given model of car. In addition, the to-be-evolved problem-solving procedure will have access to certain extraneous information (such as the color of the car) and certain extraneous operations (such as lifting the hood of the car). The task is designed to present considerable opportunities for reuse of sequences of procedural steps because all four tires can be removed and remounted in essentially the same way for any particular model of car. The task also presents opportunities for specialization and generalization because the only difference in the procedure for different models of cars lies in what a human planner would consider details (e.g., whether there is a hubcap to remove and remount). In addition, the task is designed to present opportunities for parameterization of procedures (e.g., based on the size and number of the bolts on each tire). The gedanken experiment will visualize the use of genetic programming on this task in the following three ways: (1) without automatically defined functions, (2) with automatically defined functions, (3) with automatically defined functions and with the technique of evolutionary selection of the architecture, and (4) with automatically defined functions and the six new architecture-altering operations . 6.1. Approach Without Automatically Defined Functions Five preparatory steps must be performed by the user prior to applying genetic programming without automatically defined functions to a particular problem. The user must first decide upon the ingredients from which the to-be-evolved programs will be composed. The terminal set for this hypothetical problem consists of the information-carrying variables (e.g., the number and size of the bolts, the presence or absence of hubcaps, the color of the car, etc.). The function set consists of the various primitive operations (e.g., unfastening a bolt, sliding the tire off of the axle, etc .) . The fitness measure of a particular program is computed over a suite of fitness cases consisting of various models of cars of different colors, with and without hubcaps, and with the tires being fastened by different numbers and sizes of bolts. Fitness might be the total number of tires that end up on their respective desired new axles after the complete execution of the program or after some reasonable amount of time . 4 o When automatically defined functions are not being used, the evolved programs, of course, consist only of a result-producing branch. In principle, genetic programming should be able to evolve a solution to this problem without employing automatically defined functions, provided the population size is sufficiently large and provided that the run is allowed to continue for sufficiently large number of generations. Although genetic programming usually solves problem in a unexpected ways, we will, for purposes of this gedanken experiment, make some statements about the likely characteristics of the evolved solution. Since automatically defined functions are not being used, each identical or marginally different situation must necessarily be handled by a uniquely-crafted and separately-learned sequence of steps. There is no possiblity of multiple invocation of any part of the evolved code of a program during the execution of the program. Thus, there cannot be one common procedure that is reused four times on each of the tires. Instead, the overall evolved solution will contain four uniquely- crafted (and almost certainly different) sequence of steps for handling each of the four tires. Similarly, there will not be one common procedure that is reused for each of the bolts belonging to each tire. Instead, the solution will contain a uniquely- crafted sequence of steps for unfastening each bolt on each tire. Moreover, since automatically defined functions are not being used, there would be no possibility for parameterizing the sequence of steps for unfastening the bolts. Thus, models of cars whose tires are fastened by one size of bolts cannot be handled with a common sequence of reusable code in the same way as models with another size of bolts. Instead, there will be one sequence of code that causes the 1/2" wrench to be used when the bolt size is determined to be 1/2" and an additional separate sequence of code that causes the 5/8" wrench to be used on 5/8" bolts. Moreover, there will be one sequence of code for handling models of cars whose tires have six bolts, and a separate sequence of code for handling tires that have five or four bolts. Because the run without automatically defined functions is incapable of exploiting the considerable underlying symmetry and regularity of this problem environment and incapable of reusing parameterized sequences of code, the overall program necessary to perform this task will be very large. In addition, a large amount of computational effort will have to be expended to evolve this large block of code. 6 . 2 . Approach With Automatically Defined Functions When automatically defined functions are being used, a solution to a problem employing automatically defined functions can be interpreted such that each automatically defined function represents a subproblem; the body (work) of each automatically defined function represents the solution to a subproblem; and, the result-producing branch represents the assembly of the solution to the subproblems into a solution of the overall problem. Each automatically defined function also can be interpreted as a change of representation. If, for example, one compares a solution to the overall problem by a result-producing branch that does not invoke any automatically defined functions to one that does, the difference can be viewed as a change of representation. The version of the solution with automatically defined functions solves the problem in terms of the new representation created by the application of the automatically defined functions. If automatically defined functions are added to genetic programming, we can expect to get a solution to the overall problem that would reuse and parameterize certain sequences of steps, rather than handling each identical or marginally different situation with a uniquely-crafted and separately-learned sequence of steps. 41 For example, we can reasonably anticipate a solution in which a tire-dismounting function evolves within the overall multi-part program so that the result-producing branch of the overall program is able to invoke this function with the bolt size as a parameter. When the bolt-unfastening function is parameterized by the bolt size, separate code will not be necessary for handling each different size of bolt. The ability to generalize is an important aspect of artificial intelligence and automated programming. A parameterized automatically defined function can be interpreted as a generalization in the sense that a parameterized function-defining branch is a generalized procedure for performing some subtask. A particular instantiation of the parameter has the effect of specializing the general procedure to a particular situation. We can reasonably expect that the size of the overall program will be considerably smaller than the size of the program evolved without automatically defined functions and that less computational effort will have been expended in the evolution of the solution of this problem with automatically defined functions than without them. That is, parsimony should be an emergent consequence of the availability of automatically defined functions. When automatically defined functions are being used, the evolved program would, of course, consist of a result-producing branch and one or more function-defining branches. The question arises as to how many automatically defined functions will there be? And, how many arguments will each automatically defined function possess? Before the user can apply genetic programming with automatically defined functions to a problem, the user must employ some technique (probably one of the five existing previously described techniques) for determining the architecture of the overall solution. Once chosen, the architecture of the overall program constrains the decomposition of the original problem into subproblems. Each particular architecture permits or prohibits certain ways of decomposing the overall problem into subproblems. A particular architecture may facilitate certain decompositions while making others less likely. The number of available automatically defined functions affects, in general, the character of the solution. If, for example, two or more automatically defined functions are available, we might find separate automatically defined functions devoted to the tire-dismounting and bolt-unfastening subtasks. However, if only one automatically defined function is available, both the tire-dismounting or the bolt- unfastening subtasks would have to be performed inside one automatically defined function or one of these subtasks would have to be performed in the result- producing branch (where there would be no opportunity for parameterized reuse of code) . The number of arguments possessed by each automatically defined functions also generally affects the character of the solution. If, for example, at least one of the available automatically defined functions possesses two or more arguments, then we might see one automatically defined function that is parameterized by several variables. However, if each automatically defined function possesses only one argument, then it would not be possible to take an action inside any of the automatically defined functions based on two or more variables. 6.3. Approach With Evolutionary Selection of the Architecture As previously mentioned, evolutionary selection of the architecture during the run of genetic programming is one of the five existing techniques by which the user can determine the architecture of the overall program (Koza 1994a, chapters 21-25). In 4 2 this technique, the population is architecturally diverse starting with the initial random population of generation 0. Then there is a competition among the preexisting architectures in population, However, no architecture is actually created or altered during the run in the technique of evolutionary selection of the ~ r~. h i t e c t 1~ re 6 . 4. Approach With Architecture-Altering Operations The new architecture-altering operations described herein provide a sixth way to determine the architecture. The architecture-altering operations enable the architecture to be evolved dynamically and automatically during the run of genetic programming in the sense of actually creating new architectures and altering existing architectures during the run. As previously mentioned, the six new architecture-altering operations can be viewed from five perspectives. First, the new architecture-altering operations provide a new way to solve the problem of determining the architecture of the overall program in the context of genetic programming with automatically defined functions. Second, the new architecture-altering operations provide an automatic implementation of the ability to specialize and generalize in the context of automated problem-solving . Third, the new architecture-altering operations automatically and dynamically change the representation of the problem while simultaneously and automatically solving the problem. Fourth, the new architecture-altering operations automatically and dynamically decompose problems into subproblems and then automatically solve the overall problem by assembling the solutions of the subproblems into a solution of the overall p rob lem . Fifth, the new architecture-altering operations automatically and dynamically discover useful subspaces (usually of lower dimensionality than that of the overall problem) and then automatically assemble a solution of the overall problem from solutions applicable to the individual subspaces. When automatically defined functions are being used (but without the technique of evolutionary selection of the architecture and without the architecture-altering operations), the third, fourth, and fifth perspectives apply. When automatically defined functions are being used with the technique of evolutionary selection of the architecture (but without the architecture-altering operations), the first perspective applies. Thus, we focus our attention on the second perspective involving specialization and generalization. As previously mentioned, the initial random population may be uniform and ADF- less; it may be uniform and consist of programs with the minimal ADF structure (i.e., the "minimalist approach"); or, it may be architecturally diverse. We use the "minimalist approach" for purposes of the discussion here. 6.4.1. Generalization by Means of Branch Deletion and Argument Deletion During a run of genetic programming with the new architecture-altering operations, arguments and branches of programs may be deleted. These deletions may be interpreted as generalizations in the sense that a generalization ignores some previously available information or deletes some previously performed step in carrying out a task. The deletion of an argument or branch may be interpreted as a generalization of the procedure represented by the branch. 43 Of course, some generalizations are useful and some are not. The process of generalizing a procedure is often useful because it permits the development of a procedure that is applicable to a wider variety of situations. Generalization may also be desirable because it shortens the description of the procedure (i.e., improves parsimony). First consider the deletion of an argument. The deletion of the color of the car from the argument list of a branch would be a helpful simplification of the procedure performed by the branch since there is no possible benefit from this extraneous variable in solving the problem of rotating the tires. If the color of the car is actually used by the branch, deleting this argument will make the branch applicable to a wider variety of situations (with no degradation in performance as measured by the fitness measure of the problem). If the color is ignored, deleting this argument will make the branch more parsimonious. On the other hand, deletion may make vital information unavailable. Assuming that argument deletion by consolidation or random regeneration are being used, deleting an argument communicating vital information would (almost always) be an unhelpful generalization. The size of the tire's bolts is a necessary argument because the task cannot be performed with a wrench of unsuitable size. The tire-dismounting function can not properly be performed without knowing how many bolts to remove. Now consider deletion of a branch. The operation of branch deletion may be viewed as a generalization at the procedural level. After branch deletion by consolidation or random regeneration is performed, the overall program will (usually) be less specialized than before. For example, the deletion of a branch that lifts the car's hood would be a helpful simplification of the procedure performed by the overall program since lifting the hood is not helpful in solving the problem of rotating the tires. On the other hand, it would be unhelpful to delete a branch that checks whether a hubcap is present because the wrench is useless for removing bolts if the bolts are made inaccessible by an unremoved hubcap. In either event, the operation of branch deletion usually produces a procedure with improved parsimony. There is no way of knowing in advance whether a particular generalization will be helpful. However, natural selection will ultimately judge whether the offspring produced by a particular deletion prove to be more fit or less fit in grappling with the problem environment. 6.4.2. Specialization by Means of Branch Duplication and Argument Duplication In many instances of automated problem-solving, it is desirable to have the ability to split the problem environment into different cases and treat the cases slightly differently. The operations of branch duplication and argument duplication set the stage for enabling a procedure to be specialized and refined at a later time. Such a specialization or refinement permits such slightly differing treatments of two similar, but different situations. This specialization can occur at the argument level or procedure level. For example, even though the procedure for changing the tire may be similar for all cars, the procedure must be specialized to each particular type of car by including a certain argument, such as the size of the bolts. Argument duplication provides a way to set the stage so that subsequent evolution can incorporate this additional i nfo rm ati on . Branch duplication can provide a way to set the stage so that subsequent evolution can create a slightly different procedure for the two models of cars. The procedure for changing the tire on a Cadillac may differ from the procedure for a Honda 4 4 because the Cadillac may have a hubcap that must be removed to expose the bolts so that the wrench can then unfasten the bolts. During the run, branches~ or arguments will be duplicated. For example, consider a program with a branch that hastily tries to unfasten the bolts without checking for the presence of a hubcap. Such a branch works satisfactorily for a car without hubcaps. A program with this branch will accrue a certain amount of fitness for correctly handling the subset of fitness cases involving cars with no hubcaps. However, this program will not receive the maximum possible value of fitness because it will not correctly handle the fitness cases involving cars with hubcaps. After a branch duplication, there are two branches and both start by hastily trying to unfasten the bolts without checking for the presence of a hubcap. Each of these identical branches is invoked under different (randomly chosen) circumstances by their calling branches. Although branch duplication has no immediate semantic effect, subsequent genetic operations may alter one or the other new branch. Some of these alterations will make the overall program more fit at grappling with the problem environment by specializing one branch to some particular situation. For example, one of these branches may be modified by a subsequent genetic operation (e.g., a crossover or mutation) so that it checks for the presence of a hubcap, removes the hubcap, and then continues as before. This specialization is a potential benefit of a branch duplication. Arguments may also be duplicated during the run. Although argument duplication has no immediate effect, the presence of an additional argument may, after subsequent operations, make the altered program more fit at grappling with certain fitness cases from the problem environment by enabling it to specialize its behavior in response to the additional information provided by the additional argument. Of course, a specialization may be productive, counter-productive, useless or harmless in the context of a particular problem. For example, an argument duplication may make the color of the car available to a branch. This useless information will, at best, be harmless. This argument duplication may be harmful if it overspecializes the branch or reduces the efficiency of operation of the branch. 6.4.3. Specialization by Means of Branch Creation and Argument Creation When the initial random population is uniformly ADF-less, the operation of branch creation is necessary to create function-defining branches so that part of the initial result-producing branch can become parameterized and generalized. The existence of some parameterized branches are necessary if there are to be multiple invocations of portions of the overall program (and the associated benefit in terms of the overall computational effort necessary to solve the problem). In any event, regardless of how the initial random population is created, the operation of branch creation provides the potential benefit of achieving specialization in the future. Similarly, the operation of argument creation provides the potential benefit of achieving specialization in the future. When operating together, the new architecture-altering genetic operations are useful in enabling the simultaneous evolution of the architecture of the overall program for solving a problem while solving the problem. That is, the architecture of the eventual solution to the problem need not be preordained by the user during the preparatory steps. Instead, the architecture can emerge from a competitive fitness-driven process that occurs during the run at the same time as the problem is being solved. 45 7. Examples of Actual Runs The architecture-altering operations described herein will now be illustrated by showing two actual runs of the problem of symbolic regression. The problem requires evolution of a computer program to mimic the behavior of the even-3- parity function. The frequency of use of the operations of branch duplication, argument duplication, branch deletion, argument deletion, branch creation, and argument creation are controlled by parameters. We often use frequencies in the neighborhood of 1/2% of each operation on each generation; however, for purposes of illustration, we used larger percentages for the two runs described below in order to concentrate more instances of the new architecture-altering operations into a relatively small number of generations. This concentration also enables us to show an entire genealogical audit trail for the first run and the entire maternal lineage for the second run. 7.1. Example 1 The run starts with the random creation of a population of 1,000 individual programs. The "single minimal ADF" approach is used in this example. That is, each program in the initial random population at generation 0 consists of one result- producing branch and a single one-argument function-defining branch and has an argument map of { 1). Thus, the terminal set for the result-producing branch, 'rrpb, for a program in the population for the Boolean even-3-parity problem is ~Irpb = (D0, Dl, D2 } - The function set for the result-producing branch, ~rpb~ is rpb = {AND, OR, NAND, NOR, ADF O ), with an argument map of {2,2,2,2,1). The terminal set for the automatically defined function, ADF 0, iS adfr) = {ARG0 } . The function set, ~adfO. for ADF0 iS ~adf~ = {AND, OR, NAND, NOR), with an argument map for this function set of {2,2,2,2). After creating the 1,000 programs for the initial random population, each program in the population is evaluated as to how well it solves the problem at hand. The fitness 4 6 of a program in the population of 1,000 programs is measured according to how well that program mimics the target function for all eight combinations of three Boolean arguments. The raw fitness~ of a program is the number of matches. In one particular run, the best program from among the 1,000 randomly created programs in generation 0 has the function-defining branch (defining ADF0) shown below: (OR (AND (NAND ARG0 ARG0) (OR ARG0 ARG0)) (NOR (NOR ARG0 ARG0) (AND ARG0 ARG0))). The behavior of this function-defining branch is the Boolean constant function zero (called "Always False"). The result-producing branch of this best-of-generation program from generation 0 ignores ADF0 and is shown below: (NOR (AND D0(NOR D2 D1)) (AND (AND D2 D1))). Of course, it should be no surprise that the function-defining branch of even the best program of the initial random generation is not particularly useful or that this branch is ignored by the result-producing branch. The "single minimal ADF approach" is not intended to provide a highly useful function-defining branch, but rather merely to provide a starting point for the evolutionary process. Table 5 shows the behavior of this program from generation 0. The first three columns show the values of the three Boolean variables, D0,D1, and D2. The fourth column shows the value produced by the overall program. The fifth column shows the value of the target function, the even-3-parity function. The last column shows how well the program performed at matching the behavior of the target function. As is shown, the program was correct for six of the eight possible combinations (fitness cases). Thus, the program scored a raw fitness of 6 (out of a possible 8). Table 5 O eration of the best-of-~eneration pro~ram from ~eneration 0. Di D2 BEST-OF- EVEN-3- SCORE GENERATION PARITY PROGRAM FUNCTION FOR GENERATION O 0 0 0 1 1 correct 0 0 1 1 0 wronq 0 1 0 1 0 wronq 0 1 1 1 1 correct 1 0 0 0 0 correct 1 0 1 1 1 correct 1 1 0 1 1 correct 1 1 1 0 0 correct A new population of 1,000 programs is then created from the existing population of 1,000 programs. Each successive generation of the population is created from the existing population by applying various genetic operations. Reproduction and crossover are the most frequently performed genetic operations. In addition, the architecture-altering operations described herein are used on this run. Mutation and other previously described genetic operations may also be used in the process (although they are not used on this particular run). The raw fitness of the best-of-generation program for generation 5 improves to 7. That is, this program correctly mimics the behavior of the target even-3-parity 47 function for seven of the eight fitness cases. The program achieving this new and higher level of fitness has a total of four branches (i.e., one result-producing branch and three function-defining branches). The change in the number of branches from 1 at generation 0 to 4 at generation 5 is the consequence of the architecture- altering operations. In addition to its one result-producing branch, this best-of- generation program for generation 5 has branches defining A D F 0 (taking two arguments), A D F 1 (taking two arguments), and A D F 2 (taking three arguments), so that its argument map is {2, 2, 3 }. The result producing branch of this program is shown below . (NOR (ADF2 D0 D2 Dl) (AND (ADFl D2 Dl)D0) ) The first function-defining branch (defining A D F 0 ) of the best-of-generation program for generation 5 takes two dummy arguments, ARG0 and ARGl, and is shown below. The existence of two dummy arguments in this function-defining branch is a consequence of an argument duplication operation. As it happens, the behavior of this A D F 0 iS not important since A D F 0 iS not referenced by the result-producing b r an c h . (OR (AND (NAND ARG0 ARG0) (OR ARGl ARG0) ) (NOR (NOR ARGl ARG0) (AND ARG0 ARGl) ) ) The second function-defining branch (defining A D F 1 ) of the best-of-generation program for generation 5 also takes two dummy arguments, ARG0 and ARGl, and is shown below. The existence of this second function-defining branch is a consequence of a branch duplication operation. (OR (AND ARG0 ARGl) (NOR ARG0 ARGl) ) Table 6 shows the behavior of A D F 1 of the best-of-generation program for generation 5 which, as can be seen, is equivalent to the even-2-parity function. Table 6 AD F 1 of the best-of-generation program of generation 5. ARG0 ARGl ADF0 O 1 O 1 1 1 The function-defining branch for A D F 2 of this best-of-generation program for generation 5 takes three dummy arguments, ARG0, ARGl, and ARG2, and is shown below. This third function-defining branch exists as a consequence of yet another branch duplication operation. (AND ARGl (NOR ARG0 ARG2) ) Table 7 shows that the behavior of ADF2 consists of retuming 1 only when ARG0 and ARG2 are 0 and ARGl iS 1. 4 8 Table 7 A D F 2 of the best-of-generation program of generation 5. ARG O 1 = O O 1 O 1 1 1 ~ 1 O O O 1 1 1 1 O The raw fitness of the best individual program in the popul ation remains at a value of 7 for generations 6, 7, 8, and 9; however, the average fitness of the population as a whole improves during these generations. On generation 10, the best program in the population of 1,000 perfectly mimics the behavior of the even-3-parity function. This 100%-correct solution to the problem has a total of six branches (i.e., five function-defining branches and one result- producing branch). The argument map of this program is { 2, 2, 3, 2, 2 } . This multiplicity of branches is a consequence of the repeated application of the branch duplication operation and the branch creation operation. The function-defining branches of this program each have more than one dummy argument. All of these additional arguments exist as a consequence of the repeated application of the argument duplication operation. The result-producing branch of this best-of-generation program for generation 1 n i~ ~hown below: (NOR (ADF4 D0 (ADFl D2 Dl) ) (AND (ADFl D2 Dl) D0) ) The function-defining branch for A D F 0 of this best-of-generation program for generation 10 takes two dummy arguments, ARG0 and ARGl, and is shown below. The behavior of AD F 0 iS equivalent to the odd-2-parity function. (OR (AND (NAND ARG0 ARG0) (OR ARGl ARG0) ) (NOR (NOR ARGl ARG0) (AND ARG0 ARGl) ) ) The function-defining branch for A D F 1 of the best-of-generation program for generation 10 takes two dummy arguments, ARG0 and ARGl, and is shown below. ADFl is equivalent to the even-2-parity function. (OR (AND ARG0 ARGl) (NOR ARG0 ARGl) ) The function-defining branch for ADF2 takes three dummy arguments, ARG0, ARGl, and ARG2, and is shown below. ADF2 returns 1 only when ARG0 and ARG2 are 0 and ARGl is 1. However, A D F 2 iS ignored by the result-producing branch. (AND ARGl (NOR ARG0 ARG2) ) The function-defining branch for A D F 3 iS the one-argument identity function. This relatively useless branch is ignored by the result-producing branch. The function-defining branch for A D F 4 of the best-of-generation program for generation 10 takes two dummy arguments, ARG0 and ARGl, and is shown below. ADF4 is equivalent to the even-2-parity function. 49 (OR (AND ARG0 ARGl) (NOR ARG0 ARGl)) Since both ADFl and ADF4 are both even-2-parity functions, the result-producing branch can be simplified to the expression below. This expression can be verified as being equivalent to the even-3-parity function. (NOR (EVEN-2-PARITY D0(EVEN-2-PARITY D2 Dl)) (AND (EVEN--2--PARITYD2 Dl) D0)) An examination of the genealogical audit trail shows the interplay between the Darwinian reproduction operation, the one-offspring crossover operation using point typing, and the new architecture-altering operations. Figure 7 shows all of the ancestors of the just-described 100%-correct solution from generation 10 of the run in example 1 of the problem of symbolic regression of the even-3 -parity problem. The generation numbers (from 0 to 10) are shown on the left edge of figure 7. Figure 7 also shows the sequence of reproduction operations, crossover operations, and architecture-altering operations that gave rise to every program that was an ancestor to the 100%-correct program in generation 10. The 100%-correct solution from generation 10 is represented by the box labeled M10 at the bottom of the figure. The argument map of this solution, namely { 2, 2, 3, 2, 2 }, is shown in this box. The two lines flowing into the box M10 indicate that the solution in generation 10 was produced by a crossover operation acting on two programs from the previous generation (generation 9). Figure 7 uses the convention of placing the mother M9 (the receiving parent) on the right and father P9 (the contributing parent) on the left. Recall that, in a one-offspring crossover operation using point typing, the bulk of the structure of a multi-part program comes from the mother since the father contributes only one subtree into only one of the many branches of the mother. Thus, the 11 boxes on the right side of this figure (consecutively numbered from MO to M10) represent the maternal genetic lineage (from generations 0 through generation 10) of the 100%-correct solution M10 that emerged in generation 10. The 100%-correct solution M10 in generation 10 has the same argument map, {2, 2, 3, 2, 2}, as the mother M9 because the crossover operation is not an architecture-altering operation and does not change the architecture (or argument map) of the offspring (relative to the mother). The maternal lineage will now be reviewed in detail so as to illustrate the overall process of evolving the architecture of a solution to a problem while simultaneously evolving the solution to the problem. The mother M9 from generation 9 (shown on the right side of figure 7) has an argument map of { 2, 2, 3, 2, 2 }, has a raw fitness of 7, was itself the result of a crossover of two parents from generation 8. The grandfather of the 100%-correct solution M10 in generation 10 (and the father of M9) was P8. The grandmother of the 100%-correct solution M10 in generation 10 (and the mother of M9) was M8. The grandmother M8 from generation 8 of the 100%-correct solution M10 in generation 10 (and the mother of M9) has an argument map of { 2, 2, 3, 2, 2 }, has a raw fitness of 7, and was the result of a branch duplication from a single ancestor M7 from generation 7. Because of the branch duplication operation, the program M7 from generation 7 of the maternal lineage at the far right of figure 7 has one fewer branch than its offspring M8. Program M7 has an argument map of { 2, 2, 3, 2 } . Program M7 was the result of an argument duplication from a single ancestor from generation 6. ~ n Figure 7: Genealogical audit trail for example 1. Because of the argument duplication operation, the fourth function-defining branch of the program M6 from generation 6 of the maternal lineage at the far right of figure 7 has one less argument than its offspring M7. Program M6 from generation 6 has an argument map of {2, 2, 3, 1 } whereas program M7 from generation 7 has an argument map of { 2, 2, 3, 2 } . Program M6 was the result of an branch creation from a single ancestor M5 from generation 5. Because of the branch creation operation, the program M5 from generation 5 shown on the right side of figure 7 has one fewer function-defining branch then program M6. Program M5 has an argument map of { 2, 2, 3 } . In turn, program MS was the result of a branch creation from a single ancestor M4 from generation 4. Program M4 from generation 4 shown on the right side of figure 7 has one less function-defining branch than its offspring program M5. Program M4 has an argument map of { 2, 2 } . Program M4 was the result of a reproduction operation from a single ancestor M3 from generation 3. Program M4 from generation 3 shown on the right side of figure 7 has an argument map of { 2, 2) and was the result of a crossover involving father P2 and mother M2 from generation 2. 51 Program M2 from generation 2 (shown on the right side of figure 7) has an argument map of { 2, 2 } and was the result of a branch creation from a single ancestor M1 from generation 1. Program M1 from generation 1 has an argument map of {2 } and was the result of an argument duplication of a single ancestor M0 from generation 0. Program M0 from generation 0 at the upper right corner of figure 7 has an argument map of { 1 } and has a raw fitness of 6. It has an argument map of { 1 } because all programs at generation 0 consist of one result-producing branch and a single one-argument function-defining branch when the "single minimal ADF" approach is being used. The sequence of genetic operations and architecture-altering operations of this run shows the simultaneous evolution of the architecture while solving the problem. 7 . 2. Example 2 A second run of the process is now described in order to illustrate the evolution of a hierarchical reference by one function-defining branch of another and to illustrate the operation of branch deletion. In this run, a 100%-correct solution emerges in generation 15 to the problem of symbolic regression of the even-3-parity problem. The best-of-generation program from generation 0 of this run has a raw fitness of only 5. There are many programs in the population with this level of fitness. This program M0 is an early ancestor of the 100%-correct solution that eventually emerges in generation 15. The result-producing branch of this program from generation 0 is shown below: (OR (NOR D2 (ADF0 (AND Dl D0))) (AND D2 (ADF0 (NAND D0 D0)))). ADF 0 of this program is shown below: (AND (NAND (OR ARG0 (OR ARG0 ARG0)) (AND (AND ARG0 ARG0) (OR ARG0 ARG0))) (AND (NAND (OR ARG0 ARG0) (NOR ARG0 ARG0)) (OR (AND ARG0 ARG0) (NOR ARG0 ARG0)))). A D F 0 of this best-of-generation program from generation 0 is the one-argument negation (N O T) function. The one-argument N O T function was not one of the four primitive functions of the original problem (i.e., the two-argument AND, OR, NAND, and N O R functions). Figure 8 shows all of the maternal ancestors of the 100%-correct solution from generation 15 (labeled M15 at the bottom of the figure) of the run of example 2. The figure also shows a few selected other ancestors. The figure also shows the argument map of each program in the box for that program. In addition, the raw fitness of each program is shown to the immediate left of its argument map in the box for that program. Because 15 generations are involved, this figure, unlike figure 7, does not show all of the ancestors of the 100%-correct solution, M15, of generation 15. The program labeled M0 at the upper right of this figure is the program from generation 0 shown above, with an argument map of { 1 } and a raw fitness of 5. 5 2 Genealogical audit trail for example to produce program M 1 in Figure 8: A branch duplication operates on program M0 generation 1 with an argument map of { 1, 1 }. Two crossovers occurring in generations 2 and 3 maternal ancestor M3 in generation 3 from 5 to 6. Program M3 has two one-argument function-defining branches. "Always False" and is shown below: raise the raw fitness of the Its A D F 0 iS (AND (NAND (OR ARG0 (OR ARG0 ARG0)) (AND (AND ARG0 ARG0) (OR ARG0 ARG0))) (AND (NAND (OR ARG0 ARG0) (NOR ARG0 ARG0)) (OR (AND ARG0 ARG0) (OR ARG0 ARG0))))- 53 A D F 1 of this program M3 is the exactly the same as A D F 0 of ancestor MO at generation 0 (i.e., the N O T function). The branch duplication that created program Ml in generation 1 duplicated A D F 0 of ancestor MO of generation 0. Then the intervening crossovers modified A D F 0 SO as to convert it into the useless "Always False" function. Then, a branch deletion removes the now-always-false A D F 0 of program M3 to produce program M4 in generation 4. The surviving function-defining branch of program M4is exactly the same as AD F 0 of ancestor MO at generation 0 (i.e., it is the N O T function). Program M4 retains a fitness level of 6. The result-producing branch of program M4 from generation 4 is shown below: (OR (NOR D2 (ADF0 (AND Dl D0) ) ) (AND D2 (OR Dl D0) ) ) . Next, a branch creation operation takes creates a new branch, A D F 1, of program MS of generation 5 from the underlined portion of the result-producing branch of program M4 above. The new branch, AD F 1, iS shown below: (ADF0 tAND ARGl ARG0) ) . The result-producing branch of program M4 of generation 4 is also modified and the modified version is part of program MS of generation 5 as shown below: (OR (NOR D2 (ADFl Dl D0) ) (AND D2 (OR Dl D0) ) ) . Two crossovers, one reproduction, one branch creation, and one branch duplication then occur on the maternal lineage. The A D F 0 of program M10 is the N O T function. A D F 1 of program M10 from generation 10 uses the hierarchical reference created above to emulate the behavior of the NAND function. as shown below: (ADF0 (AND ARGl ARG0) ) Mother M10 mates with father P10 to produce offspring M11 in generation 11. AD F 0 of father P10 performs the even-2-parity function and is shown below: (AND (NAND (OR ARGl ARG0) (AND (NAND ARGl ARG0) (OR ARGl ARGl) ) ) (AND (NAND ARG0 (NOR ARGl ARG0) ) (OR (AND ARGl ARGl) (NOR ARG0 ARG0) ) ) ) . In the crossover, this entire branch from father P10 was inserted into A D F 1 of mother M10 replacing (AND ARGl ARG0) and producing a new ADFl that performs the odd-2-parity function (since ADF 0 performs the NOT function). Three crossovers and one branch creation then occur on the maternal lineage; however, the A D F 1 that was created in generation 11 and that performs the odd-2- parity functions remains intact. The 100%-correct solution that emerged in generation 15 had an argument map of {1, 2, 1, 1, 2}. Only ADF0 and ADFl of this particular program are referenced by the result-producing branch. ADF 0 performs the NOT function and is shown below: (AND (NAND (OR ARG0 ARG0) (AND (AND ARG0 ARG0) (OR ARG0 ARG0) ) ) (AND (NAND (OR ARG0 ARG0 ) (NOR ARG0 ARG0 ) ) (OR (AND ARG0 ARG0 ) (NOR ARG0 ARG0) ) ) ) . 5 4 AD F 1 defines the odd-2-parity function by hierarchically referring to AD F 0. AD F 1 iS shown below: (ADF0 (AND (NAND (OR ARGl ARG0) (AND (NAND ARGl ARG0) (OR ARGl ARGl))) (AND (NAND ARG0 (NOR ARGl ARG0)) (OR (AND ARGl ARGl) (NOR ARG0 ARG0))))). As can be seen, a hierarchy of function definitions has emerged to solve the problem of symbolic regression of the even-3-parity problem using the new architecture-altering operations. 8. Conclusions 55 We have shown that it is possible to evolve the architecture using the architecture- altering operations while simultaneously solving a problem. 9. Future Work More computational effort is required to perform both tasks simultaneously than to merely solve the problem when the architecture is given and fixed. Future work will attempt to quantify this amount of additional computation effort. 10. Acknowledgements David Andre and Walter Alden IBM PC type of computer operations described above. this paper. Tackett wrote the computer program in ANSI C for an to implement five of the six architecture-altering Simon Handley made numerous helpful comments on 5 6 1 1. Bibliography Axelrod, Robert. The Evolution of Cooperation. New York: Basic Books 1984. Axelrod, Robert. The evolution of strategies in the iterated prisoner's dilemma. In Davis, Lawrence (editor). Genetic Algorithms and Simulated Annealing London: Pittman 1987. Angeline, P. J. 1994a. Evolutionary Algorithms and Emergent Intelligence. Ph.D. dissertation. Computer Science Department. The Ohio State University. Angeline, P. J. 1994b. Genetic programming and the emergence of intelligence. In Kinnear, K. E. Jr. (editor). Advances in Genetic Programming. The MIT Press. Angeline, Peter J. and Pollack, Jordan B. 1993. Competitive environments evolve better solutions for complex tasks. In Forrest, Stephanie (editor). Proceedings of the Fifth International Conference on Genetic Algorithms. San Mateo, CA: Morgan Kaufmann Publishers Inc. Pages 264-270. Angeline, Peter J. and Pollack, Jordan B. 1994. Coevolving high-level representations. In Langton, Christopher G. (editor). Artificial Life 111, SFI Studies in the Sciences of Complexity. Volume XVII Redwood City, CA: Addison- Wesley. Pages 55-71. Brooks Low, K. 1988. Genetic recombination: A brief overview. In Brooks Low, K. (editor) The Recombination of Genetic Material. San Diego: Academic Press. Pages 1-21. Cavicchio, Daniel J. 1970. Adaptive Search using Simulated Evolution. Ph.D. dissertation. Department of Computer and Communications Science, University of Michigan. Darwin, Charles. 1859. On the Origin of Species by Means of Natural Selection. J o hn Mu rray . Davis, L. (editor). 1987. Genetic Algorithms and Simulated Annealing. Pittman. Davis, L. 1991. Handbook of Genetic Algorithms. Van Nostrand Reinhold. Dyson, Paul and Sherratt, David. 1985. Molecular mechanisms of duplication, deletion, and transposition of DNA. In Cavalier-Smith, T. (editor). The Evolution of Genome Size. Chichester: John Wiley & Sons. Forrest, S. (editor). 1993. Proceedings of the Fifth International Conference on Genetic Algorithms. Morgan Kaufmann. Galli, Joakim and Wislander, Lars. 1993. Two secretary protein genes in Chironomus tentans have arisen by gene duplication and exhibit different developmental expression patterns. Journal of Molecular Biology. Volume 231. Pages 324-334. Go, Mittko. 1991. Module organization in proteins and exon shuffling. In Osawa, S. and Honjo, T. (editors). Evolution of Life. Tokyo: Springer-Verlag. Goldberg, David E. 1989. Genetic Algorithms in Search, Optimization, and Machine Learning. Reading, MA: Addison-Wesley. Goldberg, David E., Korb, Bradley, and Deb, Kalyanmoy. 1989. Messy genetic algorithms: Motivation, analysis, and first results. Complex Systems. 3(5): 493- 530. Holland, John H. 1975. Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence. Ann Arbor, MI: University of Michigan Press. The second edition is currently available from The MIT Press 1992. Hood, Leory and Hunkapiller, Tim. 1991. Modular evolution and the immunoglobin gene superfamily. In Osawa, S. and Honjo, T. (editors). Evolution of Life. Tokyo: Springer-Verlag. Kinnear, Kenneth E. Jr. (editor). 1994. Advances in Genetic Programming. Cambridge, MA: The MIT Press. 57 Koza, John R. 1992. Genetic Programming: On the Programming of Computers by Means of Natural Selection. Cambridge, MA: The MIT Press. Koza, John R. 1994a. Genetic Programming 11: Automatic Discovery of Reusable P rograms. Cambridge, MA: The MIT Press. Koza, John R. 1994b. Genetic Programming 1I Videotape: The Next Generation. Cambridge, MA: The MIT Press. Koza, John R., and Rice, James P. 1992 .Genetic Programming: The Movie. Cambridge, MA: The MIT Press. Lindgren, Kristian. 1991. Evolutionary phenomena in simple dynamics. In Langton, Christopher, Taylor, Charles, Farmer, J. Doyne, and Rasmussen, Steen (editors). Artificial Life 11, SFI Studies in the Sciences of Complexity. Volume X. Redwood City, CA: Addison-Wesley. Pages 295-312. Maeda, Nobuyo and Smithies, Oliver. 1986. The evolution of multigene families: Human haptoglobin genes. Annual Re~iew of Genetics. Volume 20. Pages 81- 108. Michalewicz, Z. 1992. Genetic Algorithms + Data Structures = Evolution Programs. Springer-Verlag. Nei, Masatoshi. 1987. Molecular Evolutionary Genetics. New York: Columbia University Press. Ohno, Susumu. 1970. Evolution by Gene Patthy, Laszlo. 1991. Modular exchange Structural Biology. Volume 1. Pages Smith, T. F. and Waterman, M. S. 1981. subsequences. Journal of Molecular Duplication. New York: Springer-Verlag. principles in proteins. Current Opinion in 351-361. Identification of common molecular Biology. Volume 147. Pages 195-197. Stryer, Lubert. 1988. Biochemistry. W. H. Freeman Third Edition.