
User:Raspberry Neuron/sandbox

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Raspberry Neuron at 13:38, 30 September 2022. The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Repeated sequences (also known as repetitive elements, repeating units or repeats) are patterns of nucleic acids (DNA or RNA) that occur in multiple copies throughout the genome. Repetitive DNA was first detected because of its rapid reassociation kinetics: after denaturation, repeated sequences find complementary partners and reanneal faster than single-copy DNA. In many organisms, a significant fraction of the genomic DNA is highly repetitive; in humans, repetitive elements may make up over two-thirds of the genome.[1]

Repetitive elements found in genomes fall into different classes, depending on their structure and mode of multiplication. They are arranged either as arrays of tandemly repeated sequences or as repeats dispersed throughout the genome (see below).

History of Discovery

Debates regarding the potential functions of these elements are long-standing. Controversial references to ‘junk’ or ‘selfish’ DNA were put forward early on, implying that repetitive DNA segments are remnants of past evolution or autonomous self-replicating sequences that hijack the cell's machinery to proliferate.[2][3] Dispersed repeats, originally discovered by Barbara McClintock,[4] have been increasingly recognized as a potential source of genetic variation and regulation. Beyond these regulatory roles, repeated DNA has also been proposed to play a structural role in shaping the 3D folding of genomes.[5] This hypothesis is supported by only limited experimental evidence. For instance, in human, mouse and fly genomes, several classes of repetitive elements show a strong tendency to co-localize within the nuclear space, suggesting that the cell may use the positions of DNA repeats as a map for genome folding.[6]

Types and Functions

Tandem Repeats

Tandem repeats are repeated sequences that are directly adjacent to each other in the genome.[7] Tandem repeats vary both in the number of nucleotides that make up the repeating unit and in the number of times the unit is repeated. When the repeating unit is only 2-10 nucleotides long, the repeat is referred to as a short tandem repeat (STR) or microsatellite;[8] when it is 10-60 nucleotides long, the repeat is referred to as a minisatellite.[9] Exact cutoffs differ between sources. For both microsatellites and minisatellites, the unit may be repeated anywhere from twice to hundreds of times at a single locus.
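
To make the unit-length cutoffs above concrete, here is a minimal Python sketch that scans a DNA string for perfect (mismatch-free) tandem repeats and labels each one using the ranges quoted in this section. The function names, the minimum of two copies, and the choice to count a 10-nucleotide unit as a microsatellite are assumptions made for illustration; dedicated repeat-finding tools also tolerate mismatches, insertions and deletions, which this toy ignores.

```python
def classify_unit(unit_len: int) -> str:
    """Label a repeat by its unit length, using the ranges quoted in the text.

    Assumption: the quoted ranges (2-10 and 10-60) overlap at 10; here a
    10-nt unit is treated as a microsatellite. Other sources draw the lines
    differently.
    """
    if 2 <= unit_len <= 10:
        return "microsatellite"
    if 10 < unit_len <= 60:
        return "minisatellite"
    return "other"


def find_tandem_repeats(seq: str, min_unit: int = 2, max_unit: int = 60,
                        min_copies: int = 2):
    """Return (start, unit, copies, label) for perfect tandem repeats in seq.

    Greedy scan (hypothetical helper): at each position, try unit lengths
    from shortest to longest and keep the first unit that occurs at least
    min_copies times back to back.
    """
    seq = seq.upper()
    results = []
    i = 0
    while i < len(seq):
        found = False
        for k in range(min_unit, max_unit + 1):
            unit = seq[i:i + k]
            if len(unit) < k:
                break  # not enough sequence left for this unit length
            copies = 1
            # Count how many further copies of the unit follow immediately.
            while seq[i + copies * k:i + (copies + 1) * k] == unit:
                copies += 1
            if copies >= min_copies:
                results.append((i, unit, copies, classify_unit(k)))
                i += copies * k  # skip past the whole repeat array
                found = True
                break
        if not found:
            i += 1
    return results
```

For example, `find_tandem_repeats("CACACACACA")` returns `[(0, 'CA', 5, 'microsatellite')]`, while three back-to-back copies of the 12-nucleotide unit `ACGTTGCAATCG` are reported as a minisatellite.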

Tandem repeats have a wide variety of biological functions in the genome. For example, minisatellites are often hotspots of meiotic homologous recombination in eukaryotic organisms.[10] In recombination, two homologous chromosomes align, break, and rejoin, exchanging segments. Recombination is important as a source of genetic diversity, as a mechanism for repairing damaged DNA, and as a necessary step in the proper segregation of chromosomes during meiosis.[10] Repeated sequences make it easier for homologous regions to align and can thereby influence when and where recombination occurs.

In addition to their role in recombination, tandem repeats also play important structural roles in the genome. For example, human (and other vertebrate) telomeres consist mainly of tandem TTAGGG repeats.[11] These repeats can fold into highly organized G-quadruplex structures, which help protect the ends of chromosomal DNA from degradation.[12] Repetitive elements are also enriched at centromeres, the highly compact chromosomal regions that hold sister chromatids together and allow the mitotic spindle to attach to and separate them during cell division.[13] Human centromeres are composed mainly of a roughly 171-base-pair tandem repeat called the α-satellite repeat.[12] Pericentromeric heterochromatin, the DNA that surrounds the centromere and is important for its structural maintenance, consists of a mixture of satellite subfamilies, including the α-, β- and γ-satellites as well as HSATII, HSATIII and sn5 repeats.[14][15]
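
As a small illustration of the telomeric repeat described above, the Python sketch below counts how many consecutive TTAGGG units begin a sequencing read, checking both strands (the complementary strand reads CCCTAA). The helper names, and the decision to look only at the start of the read, are assumptions made for illustration; practical telomere-length estimation must also handle sequencing errors and variant repeat units.

```python
TELOMERE_UNIT = "TTAGGG"  # vertebrate (including human) telomeric repeat unit


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T string."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def count_leading_units(read: str, unit: str) -> int:
    """Count consecutive copies of unit at the very start of read."""
    k = len(unit)
    n = 0
    while read[n * k:(n + 1) * k] == unit:
        n += 1
    return n


def telomeric_units(read: str) -> int:
    """Leading telomeric repeat count on either strand (hypothetical helper)."""
    read = read.upper()
    forward = count_leading_units(read, TELOMERE_UNIT)
    reverse = count_leading_units(read, reverse_complement(TELOMERE_UNIT))
    return max(forward, reverse)
```

For example, a read consisting of four TTAGGG units followed by unrelated sequence yields 4, and a read made of CCCTAA units is counted the same way.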

Some repetitive sequences, such as those with the structural roles discussed above, are necessary for normal biological function. Other tandem repeats are deleterious and can drive disease, as discussed in the later section “Repeated Sequences in Human Disease.” Many other tandem repeats, however, have unknown or poorly understood functions.[16] Tandem repeats remain an active area of scientific research.

Interspersed Repeats

Interspersed repeats are identical or similar DNA sequences that are found in different locations throughout the genome.[17] Interspersed repeats are distinguished from tandem repeats in that the repeated sequences are not directly adjacent to each other but instead may be scattered among different chromosomes or far apart on the same chromosome. Most interspersed repeats are transposable elements (TEs), mobile sequences that can be “cut and pasted” or “copied and pasted” into different places in the genome.[18] TEs were originally called “jumping genes” for their ability to move, yet this term is somewhat misleading, as not all TEs are discrete genes.[19]
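
The difference between tandem and interspersed repeats comes down to where the copies sit relative to one another. The Python sketch below indexes every substring of length k (a k-mer) in a sequence and reports the k-mers whose copies are separated by other sequence rather than lying back to back. The function name, the choice of k and the `min_gap` threshold are assumptions made for illustration; this exact-match approach is far simpler than the similarity searches used to annotate real transposable elements.

```python
from collections import defaultdict


def dispersed_kmers(seq: str, k: int, min_gap: int = 1) -> dict:
    """Map each k-mer to its start positions, keeping only k-mers with at
    least two copies separated by >= min_gap bases of other sequence."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)

    result = {}
    for kmer, starts in positions.items():
        # Gap between consecutive copies = next start - end of previous copy.
        # A gap of 0 means the copies are adjacent (a tandem arrangement).
        gaps = [b - (a + k) for a, b in zip(starts, starts[1:])]
        if any(g >= min_gap for g in gaps):
            result[kmer] = starts
    return result
```

With k = 7, the hypothetical sequence `"GATTACA" + "TTTTT" + "GATTACA"` yields `{'GATTACA': [0, 12]}`, whereas `"GATTACA" * 2` yields an empty result because its two copies are directly adjacent, i.e. tandem.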

Transposable elements that are transcribed into RNA, reverse-transcribed into DNA, and then reintegrated into the genome are called retrotransposons.[18] Just as tandem repeats are subcategorized by the length of the repeating unit, retrotransposons come in several types. Long interspersed nuclear elements (LINEs) are typically 3-7 kilobases long,[20] whereas short interspersed nuclear elements (SINEs) are typically 100-300 base pairs and no longer than 600 base pairs.[20] Long terminal repeat (LTR) retrotransposons, a third major class, are characterized by long, directly repeated sequences at both ends of the element.[18] Transposable elements that do not use RNA as an intermediate are called DNA transposons.[18] In an alternative classification system, retrotransposons are referred to as “Class I” and DNA transposons as “Class II” transposable elements.[19]
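
The categories above can be read as a small decision tree keyed on how an element moves. The Python sketch below encodes only the categories named in this section; the function, its inputs, and the use of the quoted typical lengths to tell LINEs from SINEs are assumptions made for illustration. Real classification schemes recognize more groups and rely on sequence features and encoded proteins rather than length alone, and truncated LINE copies can be much shorter than 3 kilobases.

```python
def classify_transposon(rna_intermediate: bool,
                        has_long_terminal_repeats: bool,
                        length_bp: int) -> str:
    """Toy classifier following the categories in the text (hypothetical)."""
    if not rna_intermediate:
        # No RNA intermediate: the element moves as DNA.
        return "Class II: DNA transposon"
    # Class I elements (retrotransposons) move via an RNA intermediate.
    if has_long_terminal_repeats:
        return "Class I: LTR retrotransposon"
    if length_bp <= 600:
        return "Class I: SINE"  # typically 100-300 bp, at most ~600 bp
    if 3000 <= length_bp <= 7000:
        return "Class I: LINE"  # typically 3-7 kb
    # Length alone cannot settle the remaining non-LTR cases.
    return "Class I: unresolved non-LTR retrotransposon"
```

Because length is only a rough proxy, the last branch deliberately declines to guess rather than forcing every element into a named group.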

Transposable elements are estimated to make up about 45% of the human genome.[21] Because uncontrolled propagation of TEs could severely disrupt the genome, many regulatory mechanisms have evolved to silence them, including DNA methylation, histone modifications, non-coding RNAs (ncRNAs) such as small interfering RNA (siRNA), chromatin remodelers, histone variants, and other epigenetic factors.[19] However, TEs are not simply a problem for the cell to suppress; they serve a wide variety of important biological functions. When TEs are introduced into a new host, for example by a virus, they increase genetic diversity.[19] In some cases, host organisms co-opt proteins encoded by TEs for new functions, an evolutionary process called TE exaptation.[19] Recent research also suggests that TEs help maintain higher-order chromatin structure and 3D genome organization.[22] Furthermore, TEs contribute to regulating the expression of other genes by serving as distal enhancers and transcription factor binding sites.[23]

The prevalence of interspersed elements in the genome has drawn research attention to their origins and functions. Although many specific interspersed elements have been characterized, such as the Alu repeat and LINE-1, many others likely remain to be discovered. Interspersed elements remain an active area of research relevant to evolutionary origins, biology and medicine.

Repeated Sequences in Human Disease

Tandem repeat sequences, particularly trinucleotide repeats, underlie several human diseases. Trinucleotide repeats may expand in the germline over successive generations, leading to increasingly severe manifestations of the disease. Conditions in which such expansion occurs include Huntington’s disease, fragile X syndrome, several spinocerebellar ataxias, myotonic dystrophy and Friedreich ataxia.[24] Trinucleotide repeat expansions may arise through strand slippage during DNA replication or during DNA repair synthesis.[24]

Expansions of the hexanucleotide GGGGCC repeat in the C9orf72 gene are a common cause of amyotrophic lateral sclerosis and frontotemporal dementia.[25] Expanded CAG trinucleotide repeats underlie several spinocerebellar ataxias (SCA1, SCA2, SCA3, SCA6, SCA7, SCA12 and SCA17).[25] Huntington’s disease results from an unstable expansion of CAG repeats in exon 1 of the huntingtin gene (HTT). HTT encodes a scaffold protein that directly participates in the repair of oxidative DNA damage.[26] It has been noted that genes containing pathogenic CAG repeats often encode proteins that themselves have a role in the DNA damage response, and that repeat expansions may impair specific DNA repair pathways.[27] Faulty repair of DNA damage within repeat sequences may in turn cause further expansion of these sequences, setting up a vicious cycle of pathology.[27]
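
Because these disorders are defined by how many times a short unit is repeated, a basic measurement is the length of the longest uninterrupted run of that unit. The Python sketch below computes it for any unit, such as the CAG triplet in HTT or the GGGGCC hexanucleotide in C9orf72. The function name and example sequences are hypothetical, no clinical thresholds are implied, and real genotyping must also cope with sequencing errors and interruptions within the repeat.

```python
def longest_run(seq: str, unit: str) -> int:
    """Longest uninterrupted run of unit in seq, measured in copies.

    Simple (quadratic in the worst case) scan: from every start position,
    count how many copies of unit follow back to back.
    """
    seq, unit = seq.upper(), unit.upper()
    k = len(unit)
    best = 0
    for start in range(len(seq) - k + 1):
        copies = 0
        while seq[start + copies * k:start + (copies + 1) * k] == unit:
            copies += 1
        best = max(best, copies)
    return best
```

For a hypothetical sequence made of 21 CAG units, one interrupting CAA, and two more CAG units, `longest_run(seq, "CAG")` returns 21, since the interruption ends the first run.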

Biotechnology

Repetitive DNA is hard to sequence using next-generation sequencing techniques: when a repetitive region is longer than the sequencing reads, sequence assembly from those short reads cannot determine how long that region is. This issue is particularly serious for microsatellites, which are made of tiny 1-6 bp repeat units.[28]
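
A toy calculation shows why read length matters. The Python sketch below cuts two hypothetical genomes, identical except that one carries a longer CA microsatellite, into every possible error-free read of a fixed length. When the reads are shorter than the repeat, both genomes produce exactly the same set of distinct reads, so that set alone cannot reveal the repeat's length; reads long enough to span the shorter repeat and its flanks do tell the genomes apart. The sequences and read lengths are invented for illustration, and real assemblers also draw on information this toy ignores, such as read depth and paired-end distances.

```python
def distinct_reads(genome: str, read_len: int) -> set:
    """Every distinct error-free read of length read_len, assuming a read
    starts at every position (perfect, uniform coverage)."""
    return {genome[i:i + read_len] for i in range(len(genome) - read_len + 1)}


# Hypothetical genomes: same unique flanks, different microsatellite length.
LEFT, RIGHT = "GATTACA", "TGGCCTA"
genome_a = LEFT + "CA" * 10 + RIGHT  # 20-bp CA repeat
genome_b = LEFT + "CA" * 14 + RIGHT  # 28-bp CA repeat

# 8-bp reads are shorter than both repeats: the distinct read sets match.
print(distinct_reads(genome_a, 8) == distinct_reads(genome_b, 8))    # True
# 30-bp reads can span genome_a's whole repeat plus flanking bases.
print(distinct_reads(genome_a, 30) == distinct_reads(genome_b, 30))  # False
```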

Historically, many researchers have excluded repetitive regions when analyzing and publishing whole-genome data.[29]

See also

References

  1. de Koning AP, Gu W, Castoe TA, Batzer MA, Pollock DD (December 2011). "Repetitive elements may comprise over two-thirds of the human genome". PLOS Genetics. 7 (12): e1002384. doi:10.1371/journal.pgen.1002384. PMC 3228813. PMID 22144907.
  2. Ohno S (1972). "So much "junk" DNA in our genome". Brookhaven Symposia in Biology. 23: 366–70. PMID 5065367.
  3. Orgel LE, Crick FH, Sapienza C (December 1980). "Selfish DNA". Nature. 288 (5792): 645–6. doi:10.1038/288645a0. PMID 7453798. S2CID 4370178.
  4. McClintock B (1 January 1956). "Controlling elements and the gene". Cold Spring Harbor Symposia on Quantitative Biology. 21: 197–216. doi:10.1101/SQB.1956.021.01.017. PMID 13433592.
  5. Shapiro JA, von Sternberg R (May 2005). "Why repetitive DNA is essential to genome function". Biological Reviews of the Cambridge Philosophical Society. 80 (2): 227–50. doi:10.1017/S1464793104006657. PMID 15921050. S2CID 18866824.
  6. Cournac A, Koszul R, Mozziconacci J (January 2016). "The 3D folding of metazoan genomes correlates with the association of similar repetitive elements". Nucleic Acids Research. 44 (1): 245–55. doi:10.1093/nar/gkv1292. PMC 4705657. PMID 26609133.
  7. "Tandem Repeat". Genome.gov. Retrieved 30 September 2022.
  8. Sznajder ŁJ, Swanson MS (9 July 2019). "Short Tandem Repeat Expansions and RNA-Mediated Pathogenesis in Myotonic Dystrophy". International Journal of Molecular Sciences. 20 (13): 3365. doi:10.3390/ijms20133365. ISSN 1422-0067. PMC 6651174. PMID 31323950.
  9. "MeSH Browser". meshb.nlm.nih.gov. Retrieved 30 September 2022.
  10. Wahls WP (1998). "Meiotic Recombination Hotspots: Shaping the Genome and Insights into Hypervariable Minisatellite DNA Change". Current Topics in Developmental Biology. 37: 37–75. ISSN 0070-2153. PMC 3151733. PMID 9352183.
  11. Janssen A, Colmenares SU, Karpen GH (6 October 2018). "Heterochromatin: Guardian of the Genome". Annual Review of Cell and Developmental Biology. 34 (1): 265–288. doi:10.1146/annurev-cellbio-100617-062653. ISSN 1081-0706.
  12. Qi J (2 June 2005). "Covalent ligation studies on the human telomere quadruplex". Nucleic Acids Research. 33 (10): 3185–3192. doi:10.1093/nar/gki632. ISSN 0305-1048. PMC 1142406. PMID 15933211.
  13. "Centromere". Genome.gov. Retrieved 30 September 2022.
  14. Miga KH (1 September 2015). "Completing the human genome: the progress and challenge of satellite DNA assembly". Chromosome Research. 23 (3): 421–426. doi:10.1007/s10577-015-9488-2. ISSN 1573-6849.
  15. Lee C, Wevrick R, Fisher RB, Ferguson-Smith MA, Lin CC (4 August 1997). "Human centromeric DNAs". Human Genetics. 100 (3–4): 291–304. doi:10.1007/s004390050508. ISSN 0340-6717.
  16. Padeken J, Zeller P, Gasser SM (1 April 2015). "Repeat DNA in genome organization and stability". Current Opinion in Genetics & Development. Genome architecture and expression. 31: 12–19. doi:10.1016/j.gde.2015.03.009. ISSN 0959-437X.
  17. "Interspersed repetitive sequences - Latest research and news". Nature (www.nature.com). Retrieved 30 September 2022.
  18. Wicker T, Sabot F, Hua-Van A, Bennetzen JL, Capy P, Chalhoub B, Flavell A, Leroy P, Morgante M, Panaud O, Paux E, SanMiguel P, Schulman AH (December 2007). "A unified classification system for eukaryotic transposable elements". Nature Reviews Genetics. 8 (12): 973–982. doi:10.1038/nrg2165. ISSN 1471-0064.
  19. Nicolau M, Picault N, Moissiard G (29 October 2021). "The Evolutionary Volte-Face of Transposable Elements: From Harmful Jumping Genes to Major Drivers of Genetic Innovation". Cells. 10 (11): 2952. doi:10.3390/cells10112952. ISSN 2073-4409. PMC 8616336. PMID 34831175.
  20. Kramerov DA, Vassetzky NS (November 2011). "SINEs". Wiley Interdisciplinary Reviews: RNA. 2 (6): 772–786. doi:10.1002/wrna.91.
  21. Lee HE, Ayarpadikannan S, Kim HS (2015). "Role of transposable elements in genomic rearrangement, evolution, gene regulation and epigenetics in primates". Genes & Genetic Systems. 90 (5): 245–257. doi:10.1266/ggs.15-00016.
  22. Mangiavacchi A, Liu P, Della Valle F, Orlando V (2021). "New insights into the functional role of retrotransposon dynamics in mammalian somatic cells". Cellular and Molecular Life Sciences. 78 (13): 5245–5256. doi:10.1007/s00018-021-03851-5. ISSN 1420-682X. PMC 8257530. PMID 33990851.
  23. Ichiyanagi K (2013). "Epigenetic regulation of transcription and possible functions of mammalian short interspersed elements, SINEs". Genes & Genetic Systems. 88 (1): 19–29. doi:10.1266/ggs.88.19. ISSN 1880-5779. PMID 23676707.
  24. Usdin K, House NC, Freudenreich CH (22 January 2015). "Repeat instability during DNA repair: Insights from model systems". Critical Reviews in Biochemistry and Molecular Biology. 50 (2): 142–67. doi:10.3109/10409238.2014.999192. PMC 4454471. PMID 25608779.
  25. Abugable AA, Morris JL, Palminha NM, Zaksauskaite R, Ray S, El-Khamisy SF (September 2019). "DNA repair and neurological disease: From molecular understanding to the development of diagnostics and model organisms". DNA Repair. 81: 102669. doi:10.1016/j.dnarep.2019.102669. PMID 31331820.
  26. Maiuri T, Mocle AJ, Hung CL, Xia J, van Roon-Mom WM, Truant R (January 2017). "Huntingtin is a scaffolding protein in the ATM oxidative DNA damage response complex". Human Molecular Genetics. 26 (2): 395–406. doi:10.1093/hmg/ddw395. PMID 28017939.
  27. Massey TH, Jones L (January 2018). "The central role of DNA damage and repair in CAG repeat diseases". Disease Models & Mechanisms. 11 (1): dmm031930. doi:10.1242/dmm.031930. PMC 5818082. PMID 29419417.
  28. De Bustos A, Cuadrado A, Jouve N (November 2016). "Sequencing of long stretches of repetitive DNA". Scientific Reports. 6 (1): 36665. doi:10.1038/srep36665. PMC 5098217. PMID 27819354.
  29. Slotkin RK (1 May 2018). "The case for not masking away repetitive DNA". Mobile DNA. 9 (1): 15. doi:10.1186/s13100-018-0120-9. PMC 5930866. PMID 29743957.