Talk:Gene expression programming

Robotics Start‑class Low‑importance

	This article is within the scope of WikiProject Robotics, a collaborative effort to improve the coverage of Robotics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.RoboticsWikipedia:WikiProject RoboticsTemplate:WikiProject RoboticsRobotics
Start	This article has been rated as Start-class on Wikipedia's content assessment scale.
Low	This article has been rated as Low-importance on the project's importance scale.

GEP vs Linear Genetic Programming (LGP)

Can anyone give a good comparison between Gene Expression Programming (GEP) and Linear Genetic Programming (LGP)? According to this article, it seems that the only difference between GEP and Genetic Programming (of which LGP is a subset) is how programs, in the population to evolve, are represented. Quoting the relevant fragment:

"In Genetic Programming the individuals comprising a population are typically symbolic expression trees; however, the individuals comprising a population of GEP are encoded as linear chromosomes, which are then translated into expression trees. That is, in GEP, the genotype (the linear chromosomes) and the phenotype (the expression trees) have a different encoding structure."

Emphasis mine. Now, "typically" does not exclude other representations, indeed LGP is a counter-example. In LGP each program from the population is represented as a sequence of instructions, sometimes directly as machine instructions as in the approach of Peter Nordin, which doesn't need a translation phase "from genotype to phenotype", to use GEP questionable terminology (IMO the author doesn't understand the difference between a process - a running program - and its specification - a program properly encoded; using GEP analogy, a program is always genotype). I concur with section "Dubious claims throughout" below, where the merits of such translation in GEP is questioned. --MaD70 (talk) 15:27, 24 January 2010 (UTC)[reply]

The difference is the way that the programs are represented during the reproduction stage. In GEP this allows the use of genetic algorithm methodologies while in the genotype form and defines a upper limit to program size. Genotype representation is used to simplify the crossover and mutation process by using short strings represent the allowed operators and operands. Both GEP and LGP are sub-sets of Genetic Programming that use different approaches to achieve the GP goal. I do think the article needs to be re-written with the GEP algorithm clearly explained. Bob0the0mighty (talk) 14:51, 19 February 2010 (UTC)[reply]

LGP is much more well-thought-out, as recombination operating on its linear encoding *actually makes sense*, as opposed to GEP, which might as well be replacing portions of the resulting subtree with random subtrees. LGP's phenotype is comparable to an L-System, describing a *depth-first* constructive process, whereas GEP's is breadth-first, which is patently absurd. I was very disturbed when I found this article had been published in Complex Systems, as it's clearly a crock. 74.64.96.71 (talk) 07:52, 14 October 2010 (UTC)[reply]

POV

NPOV? "... this new algorithm surpasses the old GP system in 100-10,000 times." Please specify "old GP system". I think, that Machine code [Linear genetic programming] is really fast. --Michal Jurosz 11:40, 2 November 2005 (UTC)[reply]

Answer: The old GP is the algorithm popularized by Koza (it's on the paper referenced in the article)

Please sign your posts. Just append four tilde's (~) and the wiki will automatically add your IP address or username (if you have one) and the date. Thanks. 65.183.135.231 (talk) 17:42, 31 October 2008 (UTC)[reply]

I simply removed the numeric portion of this claim, replacing it with "significantly". —Preceding unsigned comment added by 74.64.96.71 (talk) 07:53, 14 October 2010 (UTC)[reply]

What does this mean???

"This new algorithm surpasses the old GP system in 100-10,000 times." What does this mean? Is it 100-10,000 times faster? As it stands, this seems a completely nonsensical statement... —Preceding unsigned comment added by 65.183.135.166 (talk) 08:03, 28 November 2007 (UTC)[reply]

Dubious claims throughout

Here is some "original research" (i.e. glaringly obvious points):

In GEP, the mapping from genotype to phenotype is deterministic and static; as such, GEP is nothing more than GP with a convoluted encoding--In fact, I would argue it is significantly less. As a consequence of the high non-linearity of GEP's genotype-to-phenotype mapping, offspring generated via recombination bear very little resemblance to their parents. GEP is thus, effectively, GP with a broken recombination operator, such that runs degenerate into near-random search and simple hill-climbing.

I am nothing less than shocked that these ideas have been accepted for print. Consider:

In section 6.1 of the paper, a GEP "evolves" solutions to the regression problem: a^4 + a^3 + a^2 + a. Note how this is a very easy problem--there aren't even any coefficients to determine! In any case, I replicated the experiment with the same settings, in Matlab. Know what I found out (confirmed)? With a population size of 30 individuals, and 50 generations, simply randomly generating individuals yields comparable performance to this "revolutionary" algorithm! That is, (30x50=)1500 random guesses is usually enough to guess a solution

Where does the claim to 10,000x improvement come from? It may be enlightening to examine the text accompanying Figure 11: "Plot 1: ... a perfect solution was found in generation 1. Plot 2: ... a perfect solution was found in generation 3. Plot 3: ... a perfect solution was found in generation 2. Plot 4: ... a perfect solution was found in generation 3. Plot 5: a perfect solution was found in generation 1." I call bullshit. If your stochastic approximating algorithm is finding perfect solutions after only a handful of generations, you should be suspicious! Clearly, either the problem is too easy, the driver routine is misreporting results, or the researcher is fabricating them. I would prefer not to believe this last, but could the severe problems with this algorithm be any more obvious? How could someone continue on this topic for so many years, even publishing books(!) on it, without realizing its readily apparent flaws?65.183.135.231 (talk) 03:04, 29 October 2008 (UTC)[reply]

I am finding it hard to find journals and other researchers citing GEP except in a few passing mentions (2 brief mentions in the GP and evolable machines journal) which is a little concerning. As an absolute beginner to the subject of GP I do find the solutions being found after one generation a little suspicious, however theres nothing I can find condeming it either so for the purposes of wikipedia your 'original research' points are moot. If you can find peer reviews on GEP i'd like to see them (I need them for a literature review!) 138.253.136.37 (talk) 16:25, 30 October 2008 (UTC)[reply]

I of course agree with you that my OR is not suitable for the article; that's why I didn't include it. However, it is important to note the very sketchy nature of the current article's claims--published or not--so that other editors can see where the article needs further work, and what sorts of additional sources should be sought out. 65.183.135.231 (talk) 16:45, 30 October 2008 (UTC)[reply]

Yes sorry missed your point first time around, most of the claims in the article have no supporting evidence beyond Ferreira's own work and should be removed. I've just reread the preface of the GEP book to quote directly from it:

"The invention of a new paradigm can often create strong resistance, especially if it seems to endanger long established technologies and enterprises. The publication of my work on scientific journals and conferences, which should be forums for discussing and sharing new ideas, became a nightmare and both my work and myself were outright dismissed and treated with scorn."

I think that pretty much sums up GEP as a fringe theory in his Ferreira's own words, regardless of whether hes right or not the fact that his ideas have been rejected by the GP community is significant. 138.253.136.37 (talk) 18:16, 30 October 2008 (UTC)[reply]

Wow. Nice find! My own cursory search was unable to find any critical reviews, and here is Ferreira (female, not male, by the way) providing a perfect quote in her own book! Should we make a "Controversy" section? 65.183.135.231 (talk) 22:46, 30 October 2008 (UTC)[reply]

Going back to figure 11 I think that is a study of the population over time after solutions have been found, the paper probably would have benefitted from some harder examples and stronger comparisons though. I havn't found any comparisons to existing algorithms within the tested examples yet in my reading and the conclusion says it has been shown GEP outperforms existing algorithms. I will read it all tonight. 138.253.136.37 (talk) 13:41, 31 October 2008 (UTC)[reply]

Hi, given the novelty of the method, that the developer is biologist and woman and seeing how GEP citation is extending all over the www (a second edition of the book by Springer should give us some clue) I am able to predict that this kind of evolutionary algorithm will become standard practice in the field. Congrats to Ferreira! (whom by the way I have no relationship and never met with): I am just a evolutionary biologist and computer engineer interested in evolutionary computation. I am pasting here some links for (I think) independent references that seems to sustain Ferreiro's claims on GEP.

the first compares GEP with GP and linear regression:

[1]

the second uses GPE on high energy physics event selection

[2]

evaluates GPE on high energy physics event selection: [3]

This paper also sustain the better performance of GEP:

[4]

AC-R (15th Nov 2009)

Most of those links were broken (but I believe you when you say they once existed) or behind subscription barriers. But for the high-energy physics paper (this link might be more stable: [5]), do actually take a look inside. The author says absolutely *nothing* about the population size, number of generations, nor the number of trials! They simply conclude "it works", seeing as it successfully evolved an exceedingly trivial expression (the logical conjunction of three floating-point comparisons). Absolutely incredible. What do you suppose the probability of randomly having this or a very similar expression appear in the initial generation, purely by chance? Moreover, if you'll grant that Ferreira's genotype-changing operators are essentially gross mutation, in the phenotype space, then how many generations before, again, we reach this result purely by random chance (as opposed to, say, building blocks being discovered, over time, and recombined in useful ways)? With very short expressions, a few hundred individuals in the population, and even a handful of generations, that's several hundred nearly-random expressions--it is highly likely you'll find the result. But we don't call this an evolutionary algorithm--we call it *random search*. I've seen a few papers like this one, and it's always the same pattern: A practitioner from an unrelated field tries the tool without understanding what's really going on (in particular, with a large enough population and a simple problem, pure chance has an excellent chance of returning the result even in the first generation), and says, "Yep, it works" and publishes it in a conference or other thinly-refereed context. (Again, I was absolutely shocked to see the seminal paper in Complex Systems). —Preceding unsigned comment added by 74.64.96.71 (talk) 08:15, 14 October 2010 (UTC)[reply]

And in case you were stupefied by my rant (sorry--it's late, I'm not at all being concise), note again that the author said *nothing* about either population size nor number of generations. Read that again. How can you *possible* evaluate an evolutionary algorithm without saying how large of a population and how many generations? I can run an amazingly foolish algorithm for a few hundred thousand iterations and find a solution. Or, often, I can run a *legitimate* algorithm from the evolutionary computation toolbox and find a solution to the same problem in a few *hundred* generations. (Likewise, I can run a foolish algorithm for a handful of generations and get a solution, if I just increase the size of the population by a few orders of magnitude: random search) 74.64.96.71 (talk) 08:36, 14 October 2010 (UTC)[reply]

And notice the excellent observation above, where someone looked up where the "10,000 times improvement" claim comes from: Figure 11: "Plot 1: ... a perfect solution was found in generation 1. Plot 2: ... a perfect solution was found in generation 3. Plot 3: ... a perfect solution was found in generation 2. Plot 4: ... a perfect solution was found in generation 3. Plot 5: a perfect solution was found in generation 1." I checked, this is an actual quote from the seminal paper. *Please* check, yourself--it's amazing. What does "a perfect solution was found in generation 1" tell you? Generation 1 means the algorithm wasn't even *run* yet. Generation 1 means that's just the *randomly-generated* initial population. That is the *definition* of random search. If the problem is so mind-numbingly trivial that a random search "solves" it any more often than *almost never*, that tells you something. And, indeed, the other "perfect solution[s]" were found in generation 3, 2, 3, and, *again*, 1! So, is GEP magically evolving the solution in a *single step*, in *two steps*, or, *twice* in *zero steps*?!! No. Quite clearly, the problem is being solved by the random search that is the initial population, or else the random search that is the nonlinear (to the point of being meaningless) genotype-altering operations. Any researcher who sees their problem being solved undeniably purely by chance in at least *2/5* of their experiments can't possibly attribute any of that success to their algorithm. It was either highly unethical, or demonstrative of truly prodigious incompetence, for Ferreira to submit this for publication. 74.64.96.71 (talk) 08:31, 14 October 2010 (UTC)[reply]

Oooh... here's a good one: [6] Ferreira herself describes her Complex Systems paper as "infamous". I wonder why that was, no?

Wikiproject Robotics

While roboticists often use evolutionary algorithms in their work, I'm not sure it's really appropriate for this article to be part of Wikiproject Robotics (for example, electrical engineers use EA's as well, but this isn't part of some EE Wikiproject)--I'm sure there is a more appropriate category. How do we change it? Or do folks disagree? 65.183.135.231 (talk) 17:47, 31 October 2008 (UTC)[reply]

Critical responses from researchers

Apparently, in addition to a great number of other researchers, John Koza himself, the inventor of genetic programming, *supposedly* (a less indirect source is wanted for such a claim) took the time to speak out against this nonsense: [7] 74.64.96.71 (talk) 09:19, 14 October 2010 (UTC)[reply]

Here we have folks saying the results in the paper suggest "that the mutation and crossover algorithms are performing near-randomization" (i.e. random search): [8] They also point out that, despite the claims, the paper doesn't actually compare GEP to any other algorithms. 74.64.96.71 (talk) 09:27, 14 October 2010 (UTC)[reply]

Technically not a critical response, but check out Ferreira's response to the above: [9]. She resorts to Random Capitalization and forceful Blustering without Addressing Any Points. 74.64.96.71 (talk) 09:32, 14 October 2010 (UTC)[reply]

Brilliant layout of the problems here: [10]. I believe this is Sean Luke at GMU (someone who actually retained a professorship past the publication date of their thesis--and at a not-too-shabby institution, for that matter). 74.64.96.71 (talk) 09:36, 14 October 2010 (UTC)[reply]

Another such thread. Argues grossly unsubstantiated inference and unwarranted self-aggrandizement: [11] 74.64.96.71 (talk) 09:44, 14 October 2010 (UTC)[reply]