User:Zephyris/draft

Directed evolution is a method used in protein engineering to harness the power of Darwinian selection to evolve proteins or RNA with desirable properties not found in nature. The process is carried out in two major stages: first a library of mutant genes is produced, and then these mutants are screened for improved ability to carry out the desired reaction. This process is then repeated. Because it relies on random mutation and screening, directed evolution is an alternative to rational design approaches, such as site-directed mutagenesis based on X-ray crystallography data.

Most directed evolution projects seek to evolve properties that are useful to humans in an agricultural, medical or industrial context. The method can therefore be used to optimize properties that were not selected for in the original organism, such as catalytic specificity or thermostability.

Directed evolution in practice

A typical directed evolution experiment involves two steps: the creation of a library of mutant DNA, RNA or protein molecules, and the screening of these mutants for desirable features. The process is then repeated with increasingly stringent screening, giving a stepwise improvement in the function of the enzyme.

Library creation

Library creation is in many ways the most important step: without a good library of mutants to screen, an enzyme cannot be evolved efficiently. Libraries of mutant sequences are almost always DNA or RNA; DNA can be transcribed and translated to produce proteins (enzymes), and RNA can itself have catalytic ability. The techniques most commonly used to generate the library are error-prone PCR and DNA shuffling.

The seed sequence from which a library is created may be an existing enzyme or ribozyme, a family of related enzymes, or even a completely random sequence (the latter is generally only used for ribozymes). Single sequences are typically mutated by error-prone PCR, which makes random mistakes while replicating DNA, whilst families of gene sequences are mutated by a combination of error-prone PCR and DNA shuffling, which rearranges whole sections of several genes into new combinations. Careful choice of the seed sequence, for example choosing an enzyme with related activity or a flexible structure, can greatly improve the quality of the library.
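
The two library-generation approaches described above can be illustrated with a toy simulation. This is only a sketch on plain strings, not a model of real PCR chemistry: the function names, the seed sequence, the 5% error rate and the three artificial "parent genes" are all assumptions made for illustration.

```python
import random

BASES = "ACGT"

def error_prone_copy(seq, error_rate, rng):
    """Copy a DNA string, swapping each base for a different one with probability error_rate."""
    out = []
    for base in seq:
        if rng.random() < error_rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)

def shuffle_family(parents, n_crossovers, rng):
    """Build a chimera from aligned, equal-length parents by switching parent at random points."""
    length = len(parents[0])
    points = sorted(rng.sample(range(1, length), n_crossovers))
    bounds = [0] + points + [length]
    pieces = []
    for start, end in zip(bounds, bounds[1:]):
        parent = rng.choice(parents)       # each segment comes from a randomly chosen parent
        pieces.append(parent[start:end])
    return "".join(pieces)

rng = random.Random(0)                      # fixed seed so the run is repeatable

# Hypothetical seed sequence (illustrative only).
seed = "ATGGCTAGCAAAGGAGAAGAACTTTTCACT"
library = [error_prone_copy(seed, 0.05, rng) for _ in range(5)]

# Artificial parents made of a single base each, so segment origins are easy to see.
family = ["A" * 30, "C" * 30, "G" * 30]
chimera = shuffle_family(family, 3, rng)

for variant in library:
    print(variant)
print(chimera)
```

In the printed chimera, each run of identical letters marks a segment inherited from one parent, which is the sense in which shuffling recombines whole sections rather than single bases.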

Libraries of mutants can only ever represent a very small fraction of the number of possible sequences. Take, for example, a protein made up of 100 amino acids. With 20 standard amino acids possible at each position, the number of possible sequences is 20¹⁰⁰ ≈ 10¹³⁰, compared to a practical maximum of around 10¹² DNA molecules in a 1 ml solution. This vast difference means a library must be finely tuned to be likely to contain the desired activity; techniques which direct mutation to important sections of a sequence will be far more successful.
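
The scale of this mismatch can be checked with a few lines of arithmetic. The sketch below assumes the 20 standard amino acids and takes the 10¹² library size quoted in the text at face value; it also counts the single and double mutants of one seed sequence, to show how quickly the neighbourhood of a sequence grows.

```python
import math

N_AMINO_ACIDS = 20        # assumption: the 20 standard amino acids
PROTEIN_LENGTH = 100      # residues, as in the example above
LIBRARY_SIZE = 10**12     # practical maximum quoted in the text

# Work in log10, since the raw number of sequences has about 130 digits.
log10_possible = PROTEIN_LENGTH * math.log10(N_AMINO_ACIDS)
log10_fraction = math.log10(LIBRARY_SIZE) - log10_possible

# Variants that differ from one seed sequence at exactly one or two positions.
single = PROTEIN_LENGTH * (N_AMINO_ACIDS - 1)
double = math.comb(PROTEIN_LENGTH, 2) * (N_AMINO_ACIDS - 1) ** 2

print(f"possible sequences ~ 10^{log10_possible:.0f}")   # 10^130
print(f"fraction sampled   ~ 10^{log10_fraction:.0f}")   # 10^-118
print(f"single mutants: {single}")                       # 1900
print(f"double mutants: {double}")                       # 1786950
```

A library of 10¹² molecules can cover every single and double mutant of a seed many times over, yet it samples a negligible fraction of the whole space, which is why the choice of seed and the targeting of mutations matter so much.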

Library screening

Library screening requires well-designed assays which can reliably measure the activity of many billions of mutants, allow those mutants to be identified and separated, link each enzyme to its sequence, and remain practical in a laboratory. This is a significant challenge and, although many techniques exist, all have particular limitations and advantages.

The linkage of sequence to enzyme (commonly referred to as the genotype-phenotype linkage) is fundamental to the success of the assay, and fortunately relatively simple to achieve. For example, a bacterium or virus may be genetically engineered to express the protein from a mutant DNA sequence, so that the expressed protein is contained within, or displayed on, the bacterium or virus that carries its gene. Alternatively, in vitro methods may be used, such as physically linking the protein to the ribosome which produced it, which in turn remains attached to the sequence that encodes the protein. Limitations in assays normally lie in determining the phenotype accurately (i.e. in linking the enzyme to its activity); this is especially hard for enzymes which have to carry out a reaction multiple times. The common solution is to design the product so that it becomes covalently attached to the enzyme which produced it; the modified enzyme can then be recognised. Unfortunately this selects for a reaction that is only similar to the one desired, which is a fundamental limitation.

The difficulties may be summarised by the following example. Consider a sample of billions of DNA sequences. To test the activity of the enzymes these sequences encode, you first have to express the DNA to produce proteins. You add RNA polymerase and NTPs to transcribe the DNA, and then ribosomes and tRNAs to translate the resulting RNA, producing around 100 copies of each sequence's enzyme. Now your sample contains billions of DNA sequences and their proteins, and you need to determine which enzymes are active. It is simple to determine whether any one of these billions of enzymes has activity: add the substrate to the sample and measure the rate of production of the product. However, this does not tell you which enzyme has the activity, let alone which sequence codes for that enzyme. The experiment would obviously be easy if the sample contained only one DNA sequence. This is the aim of all directed evolution assays: to create an assay where each DNA sequence, its proteins and its products are separated from the billions of others in the sample in their own 'virtual test tubes'.
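
The contrast between a pooled measurement and 'virtual test tubes' can be made concrete with a small sketch. Everything here is hypothetical: the sequence identifiers, the activity values and the threshold are invented, and real assays measure noisy signals rather than exact numbers.

```python
def bulk_assay(activities):
    """One pooled reading: total activity, with no link back to any sequence."""
    return sum(activities.values())

def compartment_assay(activities, threshold):
    """One reading per compartment: each sequence is measured with its own protein."""
    return [seq for seq, act in activities.items() if act >= threshold]

# Hypothetical library: 1000 sequence IDs, all inactive except one variant.
activities = {f"seq{i}": 0.0 for i in range(1000)}
activities["seq417"] = 5.0

print(bulk_assay(activities))              # 5.0: something in the pool is active
print(compartment_assay(activities, 1.0))  # ['seq417']: which sequence is responsible
```

The pooled reading confirms that activity exists but says nothing about its source; only the per-compartment reading recovers the responsible sequence.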

Repeat

Although running this process only once should give some improvement in the activity of the enzyme, its effectiveness is limited by the size of the initial library. This limit can, however, be circumvented by repeating the process. Each repeat screens another library of mutants, and each library is based on the previous generation's most successful sequences. This results in a stepwise improvement in activity from one generation to the next.

Repeating can be done by sequencing the successful variants and re-synthesising them for further mutation and evolution, although it is more common simply to recover the DNA or RNA from successful phenotypes and mutate it directly.
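
The mutate, screen and repeat cycle can be sketched as a short loop. This is a toy model under loud assumptions: "activity" is simply the number of positions matching a made-up target sequence, standing in for a real assay, and the parents are carried over into each new library so that the best score never decreases. All names and parameter values are illustrative.

```python
import random

BASES = "ACGT"

def mutate(seq, rate, rng):
    """Redraw each base at random with probability rate (a redraw may leave it unchanged)."""
    return "".join(rng.choice(BASES) if rng.random() < rate else b for b in seq)

def toy_activity(seq, target):
    """Stand-in for an assay: count positions that match a hypothetical target."""
    return sum(a == b for a, b in zip(seq, target))

def evolve(seed, target, rounds, library_size, keep, rate, rng):
    parents = [seed]
    history = []
    for _ in range(rounds):
        # Library creation: mutate copies of the previous round's best sequences.
        library = [mutate(rng.choice(parents), rate, rng) for _ in range(library_size)]
        library.extend(parents)            # simplification: parents compete too
        # Screening: rank by activity and keep only the top few.
        library.sort(key=lambda s: toy_activity(s, target), reverse=True)
        parents = library[:keep]
        history.append(toy_activity(parents[0], target))
    return parents[0], history

rng = random.Random(42)
target = "ACGT" * 5                        # hypothetical optimum, 20 bases
seed = "A" * 20                            # starting sequence with low toy activity
best, history = evolve(seed, target, rounds=10, library_size=50, keep=5, rate=0.05, rng=rng)

print(history)                             # best toy activity after each round
print(best)
```

Each entry in the printed history is the best score found in that round, so the list shows the stepwise improvement across generations described above.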

Sequence space

Sequence space is a way of representing a library of sequences on a two-dimensional plane to visualise how different sequences relate to each other. Whilst sequence space itself has no 'real world' meaning, the concept is useful in understanding the advantages and disadvantages of different library sizes, library diversities and numbers of generations. The distance between any two points (which represent sequences) on a sequence space plane represents the difference between the two sequences. A large distance corresponds to a large difference and a small distance corresponds to great similarity. For example, two very similar sequences, such as a duplicated protein in a genome, will lie very close to each other, whilst a family of proteins with similar structures will form a low-density scatter of points in one area. It is thought that functional enzymes form 'islands' in sequence space, whilst the majority of the space corresponds to non-functional sequences with no stable structure.
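
One simple way to put a number on the distance between two sequences is the Hamming distance, the count of positions at which they differ. The sketch below uses it as an assumed metric; it only applies to sequences of equal length, and the three short peptides are invented for illustration.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

# Invented peptides: a parent, a near-duplicate and a more distant relative.
parent = "MKTAYIAKQR"
close = "MKTAYIAKQK"
distant = "MRSGWLPEQK"

print(hamming(parent, close))     # 1
print(hamming(parent, distant))   # 8
print(hamming(close, distant))    # 7
```

On a sequence space plane, the parent and its near-duplicate would sit almost on top of each other, while the distant relative would lie well apart from both.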

Library generation in directed evolution must target these islands where catalytic activity is possible, and then evolve enzymes to fine-tune the sequence towards the desired catalytic ability. Generally an island corresponds to a family of related genes and any closely related sequences, hence the use of error-prone PCR and DNA shuffling of gene families in library production. A successful enzyme produced by directed evolution will not be the only possible sequence; in fact, an enzyme with that activity could probably be evolved from any island in sequence space.

External links

The Frances H. Arnold Research Group, a directed evolution laboratory