Genome survey sequence

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Zyx185 (talk | contribs) at 04:22, 22 October 2013 (Usage and limitation). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

In the fields of bioinformatics and computational biology, genome survey sequences (GSS) are nucleotide sequences similar to ESTs, except that most of them are genomic in origin rather than derived from mRNA. [1] Genome survey sequencing thus provides a way to map genome sequences that does not depend on mRNA.

Genome survey sequences are typically generated and submitted to NCBI by labs performing genome sequencing and are used, among other things, as a framework for the mapping and sequencing of genome-sized pieces included in the standard GenBank divisions. [2]

Usage and limitation

Because it does not depend on mRNA, genome survey sequencing offers a route to mapping a genome that differs from EST-based approaches. Current genome sequencing approaches are mostly high-throughput shotgun methods, and GSS is often used as the first step of sequencing. Unlike ESTs, GSSs provide an initial global view of a genome that includes both coding and non-coding DNA, including repetitive regions. [3] They can be used to estimate global parameters of a genome, such as the neutral mutation rate and repeat content, as was done for the dog genome. [4]

GSS is also an effective way to characterize, rapidly and on a large scale, the genomes of related species for which few gene sequences or maps are available. [5] Even at low coverage, GSS can yield abundant information on gene content and putative regulatory elements for comparative analysis. [6] Comparing the genes of related species can reveal gene families that have expanded or contracted. Combined with physical clone coverage, GSS lets researchers navigate the genome easily and characterize specific genomic regions by more extensive sequencing. [7]
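To give a feel for what "low coverage" implies, the sketch below applies the Lander-Waterman expectation, under which reads are assumed to fall uniformly at random and the expected fraction of bases sampled at least once is 1 - e^(-c) for coverage c. This model is not taken from the sources cited above; the function names and the read counts in the example are hypothetical and chosen only for illustration.

```python
import math


def coverage_from_reads(num_reads: int, mean_read_length: float, genome_size: float) -> float:
    """Sequencing coverage c = (number of reads * mean read length) / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return num_reads * mean_read_length / genome_size


def expected_fraction_covered(coverage: float) -> float:
    """Expected fraction of bases hit by at least one read (idealized uniform model)."""
    if coverage < 0:
        raise ValueError("coverage must be non-negative")
    return 1.0 - math.exp(-coverage)


# Illustrative values only (assumptions, not data from the article).
for c in (0.5, 1.0, 1.5, 2.0):
    print(f"{c:.1f}x coverage -> {expected_fraction_covered(c):.1%} of bases sampled")
```

Real libraries deviate from this idealized model (cloning bias, repeats), so the figures should be read as rough upper-level expectations rather than predictions.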

Types of data

The GSS division contains (but is not limited to) the following types of data:

Random "single pass read" genome survey sequences

Random "single pass read" genome survey sequences are GSSs generated by sequencing randomly selected genomic fragments in a single pass. Single-pass sequencing allows genomic data to accumulate rapidly, but at lower accuracy. [8] This category includes sequences obtained with RAPD, RFLP, AFLP and similar methods. [9]

Cosmid/BAC/YAC end sequences

Cosmid/BAC/YAC end sequences are obtained by sequencing the ends of genomic inserts cloned in cosmids, bacterial artificial chromosomes (BACs) or yeast artificial chromosomes (YACs). These vectors behave like very low-copy plasmids, sometimes with only one copy per cell. Obtaining enough DNA therefore requires a large volume of E. coli culture; 2.5 to 5 litres may be a reasonable amount. [10]

Eukaryotic proteins can be expressed from YACs with post-translational modification. BACs cannot do this, but they are more stable than YACs. [11]

Exon trapped genomic sequences

Exon trapping is used to clone a target gene by recognizing and trapping a carrier (vector) that contains an exon sequence. During splicing, exons are retained in the mRNA, so the information they carry ends up in the protein. When a DNA fragment containing an exon is inserted into an intron, the resulting transcript is longer than usual, and this longer transcript can be trapped by analysis.

Alu PCR sequences

Alu PCR is a rapid and easy-to-perform "DNA fingerprinting" technique based on the simultaneous analysis of many genomic loci flanked by Alu repetitive elements, which are non-autonomous retrotransposons present in a high number of copies in primate genomes.[12] In other words, Alu PCR is PCR-based genome fingerprinting that uses Alu elements.

The Alu repetitive element is a member of the short interspersed elements (SINEs) of mammalian genomes. There are about 500 thousand copies of the Alu element, which corresponds to one Alu element every 4 to 6 kb on average. The element takes its name from the recognition sequence AGCT of the restriction endonuclease Alu I, which it contains. By using a specific Alu sequence as the target locus, specific human DNA can be obtained from TAC, BAC or PAC clones or from human-mouse cell hybrids.
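The two figures in the paragraph above can be checked with a short sketch: one helper locates occurrences of the Alu I recognition sequence AGCT in a DNA string, and another computes the average spacing implied by a copy number. The helper names are hypothetical, and the genome size of about 3 billion bp is an assumption that the article does not state. Finding an AGCT site does not by itself identify an Alu element; the scan only illustrates the restriction site.

```python
ALU_I_SITE = "AGCT"  # recognition sequence of the restriction endonuclease Alu I


def find_sites(sequence: str, site: str = ALU_I_SITE) -> list[int]:
    """Return 0-based start positions of every occurrence of `site` in `sequence`."""
    seq = sequence.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


def mean_spacing_bp(genome_size_bp: float, copy_number: int) -> float:
    """Average distance between copies if they were spread evenly over the genome."""
    return genome_size_bp / copy_number


# Scan a short made-up sequence (case-insensitive).
print(find_sites("ccAGCTttagctAA"))

# Assumed genome size of ~3e9 bp with the ~500,000 copies quoted in the text.
print(mean_spacing_bp(3_000_000_000, 500_000))
```

With these assumed inputs the spacing falls at the upper end of the 4 to 6 kb range quoted in the text.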

Transposon-tagged sequences

Transposable elements were originally discovered in maize by Barbara McClintock (McClintock, 1948). She identified the first transposable genetic element, which she called the Dissociation (Ds) locus.[13] Transposable elements range in size from 750 to 40,000 bp. They fall into two main classes: simple elements called insertion sequences (IS), which carry the transposase gene, and more complex elements called transposons, which carry one or several characterized genes that can be easily identified.

A transposon can be used as a tag for DNA with a known sequence. Transposons can appear at other loci through transcription or reverse transcription, under the action of nucleases. This mobility shows that the genome is not static but is constantly changing its own structure.

Example of GSS file

The following is an example of a GSS file that can be submitted to GenBank: [14]

TYPE: GSS
STATUS:  New
CONT_NAME: Sikela JM
GSS#: Ayh00001
CLONE: HHC189
SOURCE: ATCC
SOURCE_INHOST: 65128
OTHER_GSS:  GSS00093, GSS000101
CITATION: 
Genomic sequences from Human 
brain tissue
SEQ_PRIMER: M13 Forward
P_END: 5'
HIQUAL_START: 1
HIQUAL_STOP: 285
DNA_TYPE: Genomic
CLASS: shotgun
LIBRARY: Hippocampus, Stratagene (cat. #936205)
PUBLIC: 
PUT_ID: Actin, gamma, skeletal
COMMENT:
SEQUENCE:
AATCAGCCTGCAAGCAAAAGATAGGAATATTCACCTACAGTGGGCACCTCCTTAAGAAGCTG
ATAGCTTGTTACACAGTAATTAGATTGAAGATAATGGACACGAAACATATTCCGGGATTAAA
CATTCTTGTCAAGAAAGGGGGAGAGAAGTCTGTTGTGCAAGTTTCAAAGAAAAAGGGTACCA
GCAAAAGTGATAATGATTTGAGGATTTCTGTCTCTAATTGGAGGATGATTCTCATGTAAGGT
GCAAAAGTGATAATGATTTGAGGATTTCTGTCTCTAATTGGAGGATGATTCTCATGTAAGGT
TGTTAGGAAATGGCAAAGTATTGATGATTGTGTGCTATGTGATTGGTGCTAGATACTTTAAC
TGAGTATACGAGTGAAATACTTGAGACTCGTGTCACTT
||
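A record in this layout can be read with a small parser. The sketch below assumes only what the example shows: each field starts with an upper-case tag followed by a colon, untagged lines continue the previous field, and a line containing `||` ends the record. The function name `parse_gss_record` is hypothetical and is not part of any NCBI tool; a continuation line that itself begins with an upper-case tag and a colon would be misread as a new field.

```python
import re

# A field line looks like "TAG: value"; tags in the example use A-Z, "_" and "#".
FIELD_RE = re.compile(r"^([A-Z_#]+):\s*(.*)$")


def parse_gss_record(text: str) -> dict[str, str]:
    """Parse one GSS submission record into a {tag: value} dictionary."""
    record: dict[str, str] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "||":  # record terminator
            break
        match = FIELD_RE.match(line)
        if match:
            current = match.group(1)
            record[current] = match.group(2).strip()
        elif current is not None and line:
            # Sequence lines are joined without spaces, other fields with one space.
            sep = "" if current == "SEQUENCE" else " "
            record[current] = (record[current] + sep + line).strip()
    return record


# Shortened, made-up record in the same layout as the example above.
sample = """TYPE: GSS
STATUS:  New
CITATION:
Genomic sequences from Human
brain tissue
HIQUAL_START: 1
SEQUENCE:
AATCAGCC
TGCAAGCA
||
"""

rec = parse_gss_record(sample)
print(rec["CITATION"])
print(len(rec["SEQUENCE"]))
```

Keeping every value as a string leaves interpretation (for example, converting HIQUAL_START to an integer) to the caller.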

References

  1. ^ GenBank Flat File 96.0 Release Notes
  2. ^ GenBank Flat File 96.0 Release Notes
  3. ^ Otto, Thomas D., et al. "ReRep: Computational detection of repetitive sequences in genome survey sequences (GSS)." Bmc Bioinformatics 9.1 (2008): 366.
  4. ^ Kirkness, Ewen F., et al. "The dog genome: survey sequencing and comparative analysis." Science 301.5641 (2003): 1898-1903.
  5. ^ Venkatesh, Byrappa, et al. "Survey sequencing and comparative analysis of the elephant shark (Callorhinchus milii) genome." PLoS biology 5.4 (2007): e101.
  6. ^ Hitte, Christophe, et al. "Facilitating genome navigation: survey sequencing and dense radiation-hybrid gene mapping." Nature Reviews Genetics 6.8 (2005): 643-648.
  7. ^ Kirkness, Ewen F., et al. "The dog genome: survey sequencing and comparative analysis." Science 301.5641 (2003): 1898-1903.
  8. ^ DNA sequencing How to determine the sequence of bases in a DNA molecule.
  9. ^ DDBJ-GSS
  10. ^ MEGA- and GIGA preps of cosmid-, BAC-, PAC, YAC-, and P1-DNA with JETSTAR 2.0
  11. ^ Yeast artificial chromosome
  12. ^ Cardelli M (2011). "Alu PCR". Methods Mol Bio. 687: 221. doi:10.1007/978-1-60761-944-4_15. PMID 20967611.
  13. ^ Tsugeki R, Olson M L, Fedoroff N V (2007). "Transposon tagging and the study of root development in Arabidopsis". Gravitational and Space Biology. 11: 79–87. PMID 11540642.
  14. ^ dbGSS_submit