Repeated sequence (DNA)
Repeated sequences (also known as repetitive elements, repeating units or repeats) are short or long patterns of nucleic acids (DNA or RNA) that occur in multiple copies throughout the genome. In many organisms, a significant fraction of the genomic DNA is repetitive, with over two-thirds of the sequence consisting of repetitive elements in humans.[1] Some of these repeated sequences are necessary for maintaining important genome structures such as telomeres or centromeres.[2]
Repeated sequences are categorized into different classes depending on features such as structure, length, location, origin, and mode of multiplication. Repetitive elements are arranged in the genome either as directly adjacent arrays, called tandem repeats, or as copies dispersed throughout the genome, called interspersed repeats.[3] Tandem repeats and interspersed repeats are further categorized into subclasses based on the length of the repeated sequence and/or the mode of multiplication.
While some repeated DNA sequences are important for cellular functioning and genome maintenance, other repetitive sequences can be harmful. Many repetitive DNA sequences have been linked to human diseases such as Huntington's disease and Friedreich's ataxia. Some repetitive elements are neutral and occur when there is an absence of selection for specific sequences depending on how transposition or crossing over occurs.[2] However, an abundance of neutral repeats can still influence genome evolution as they accumulate over time. Overall, repeated sequences are an important area of focus because they can provide insight into human diseases and genome evolution.[2]
History of discovery
In the 1950s, Barbara McClintock first observed DNA transposition and illustrated the functions of the centromere and telomere at the Cold Spring Harbor Symposium.[4] McClintock's work set the stage for the discovery of repeated sequences because transposition, centromere structure, and telomere structure all depend on repetitive elements, although this was not fully understood at the time. The term "repeated sequence" was first used by Roy John Britten and D. E. Kohne in 1968; through experiments on the reassociation of DNA, they found that more than half of eukaryotic genomic DNA was repetitive.[5] Although the repetitive DNA sequences were conserved and ubiquitous, their biological role was still unknown. In the 1990s, more research was conducted to elucidate the evolutionary dynamics of minisatellite and microsatellite repeats because of their importance in DNA-based forensics and molecular ecology. Dispersed DNA repeats were increasingly recognized as a potential source of genetic variation and regulation. Discoveries of deleterious diseases related to repetitive DNA stimulated further interest in this area of study.[6] In the 2000s, data from full eukaryotic genome sequencing enabled the identification of different promoters, enhancers, and regulatory RNAs, all of which are encoded by repetitive regions. Today, the structural and regulatory roles of repetitive DNA sequences remain an active area of research.
Functions
Debates regarding the potential functions of these elements have been long-standing. Controversial references to ‘junk’ or ‘selfish’ DNA were put forward early on, implying that repetitive DNA segments are remnants of past evolution or autonomous self-replicating sequences that hijack the cell machinery to proliferate.[7][8] Originally discovered by Barbara McClintock,[9] dispersed repeats have been increasingly recognized as a potential source of genetic variation and regulation. Together with these regulatory roles, a structural role of repeated DNA in shaping the 3D folding of genomes has also been proposed.[10] This hypothesis is supported by only a limited set of experimental evidence. For instance, in humans, mice and flies, several classes of repetitive elements show a strong tendency to co-localize within the nuclear space, suggesting that the positions of DNA repeats can be used by the cell as a genome folding map.[11]
Tandem repeats in human disease
Tandem repeat sequences, particularly trinucleotide repeats, underlie several human disease conditions. Trinucleotide repeats may expand in the germline over successive generations, leading to increasingly severe manifestations of the disease. The disease conditions in which expansion occurs include Huntington’s disease, fragile X syndrome, several spinocerebellar ataxias, myotonic dystrophy and Friedreich's ataxia.[12] Trinucleotide repeat expansions may occur through strand slippage during DNA replication or during DNA repair synthesis.[12]
Hexanucleotide GGGGCC repeat sequences in the C9orf72 gene are a common cause of amyotrophic lateral sclerosis and frontotemporal dementia.[13] CAG trinucleotide repeat sequences underlie several spinocerebellar ataxias (SCA1, SCA2, SCA3, SCA6, SCA7, SCA12 and SCA17).[13] Huntington’s disease results from an unstable expansion of repeated CAG sequences in exon 1 of the huntingtin gene (HTT). HTT encodes a scaffold protein that directly participates in the repair of oxidative DNA damage.[14] It has been noted that genes containing pathogenic CAG repeats often encode proteins that themselves have a role in the DNA damage response, and that repeat expansions may impair specific DNA repair pathways.[15] Faulty repair of DNA damage in repeat sequences may cause further expansion of these sequences, thus setting up a vicious cycle of pathology.[15]
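As an illustration of how the size of a repeat tract such as a CAG or GGGGCC run might be measured in a sequence, the following minimal Python sketch counts the longest uninterrupted run of a repeat unit. The function name `longest_repeat_run` and the toy sequence are hypothetical and chosen only for this example; the sequence is not taken from HTT or any real gene, and no clinical thresholds are implied.

```python
import re


def longest_repeat_run(sequence: str, unit: str = "CAG") -> int:
    """Return the largest number of back-to-back copies of `unit` in `sequence`."""
    sequence = sequence.upper()
    unit = unit.upper()
    # One or more consecutive copies of the unit, matched as a single run.
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    runs = [len(m.group(0)) // len(unit) for m in pattern.finditer(sequence)]
    return max(runs, default=0)


# Toy sequence (an assumption for illustration): two CAG runs of 5 and 2 copies.
toy = "ATGGCG" + "CAG" * 5 + "CCGCCA" + "CAG" * 2 + "TAA"
print(longest_repeat_run(toy))
print(longest_repeat_run("GGGGCC" * 3, unit="GGGGCC"))
```

The sketch only handles perfect, uninterrupted runs; tracts interrupted by other codons would be reported as separate, shorter runs.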
Types
Main types
Major categories of repeated sequence or repeats:
- Tandem repeats: copies that lie adjacent to each other, either directly or inverted.
  - Satellite DNA: typically found in centromeres and heterochromatin.
  - Minisatellite: repeat units of about 10 to 60 base pairs, found in many places in the genome, including the centromeres.
  - Microsatellite: repeat units of less than 10 base pairs; this includes telomeres, which typically have 6 to 8 base pair repeat units.
- Interspersed repeats (also known as interspersed nuclear elements):
  - Transposable elements
    - Retrotransposons
      - SINEs (Short Interspersed Nuclear Elements)
      - LINEs (Long Interspersed Nuclear Elements)
In primates, the majority of LINEs are LINE-1 elements and the majority of SINEs are Alu elements. SVAs are hominoid-specific.
In prokaryotes, CRISPR are arrays of alternating repeats and spacers.
Some repeated sequences are evolutionarily derived from viral infection events.[16]
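The length-based distinction between microsatellites and minisatellites can be sketched in Python as follows. This is an illustrative example only: the function names `find_tandem_repeats` and `classify_unit` are hypothetical, the thresholds are the approximate ones given in the list above (under 10 base pairs, and about 10 to 60 base pairs), and only perfect repeats of at least three copies are detected.

```python
import re


def find_tandem_repeats(sequence: str, min_copies: int = 3, max_unit: int = 60):
    """Yield (start, unit, copies) for perfect tandem repeats in `sequence`.

    The lazy quantifier makes the regex prefer the shortest repeat unit
    that works at each position.
    """
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_copies - 1))
    for match in pattern.finditer(sequence.upper()):
        unit = match.group(1)
        yield match.start(), unit, len(match.group(0)) // len(unit)


def classify_unit(unit_length: int) -> str:
    """Classify a tandem repeat by unit length (approximate thresholds)."""
    if unit_length < 10:
        return "microsatellite"
    if unit_length <= 60:
        return "minisatellite"
    return "unclassified"


# Toy sequences (assumptions for illustration, not real genomic loci).
for seq in ("GG" + "AC" * 6 + "TT", "ACGTTGCAAGTC" * 3):
    for start, unit, copies in find_tandem_repeats(seq):
        print(start, unit, copies, classify_unit(len(unit)))
```

Real tandem repeats are often imperfect, so dedicated tools tolerate mismatches and indels; this sketch does not.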
Other types
Note: The following are covered in detail in "Computing for Comparative Microbial Genomics".[17]
- Direct repeats
  - Global direct repeat
  - Local direct simple repeats
  - Local direct repeats
  - Local direct repeats with spacer
- Inverted repeats
  - Global inverted repeat
  - Local inverted repeat
  - Inverted repeat with spacer
  - Palindromic repeat
- Mirror and everted repeats
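Several of the repeat types listed above can be told apart with simple string operations. The Python sketch below assumes the commonly used definitions, which are not spelled out in this article: a palindromic repeat equals its own reverse complement, a mirror repeat reads the same in both directions on one strand, and an inverted repeat is a sequence followed downstream by its reverse complement, optionally separated by a spacer. All function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindromic(seq: str) -> bool:
    """True if the sequence equals its own reverse complement."""
    return seq.upper() == reverse_complement(seq)


def is_mirror(seq: str) -> bool:
    """True if the sequence reads the same forwards and backwards on one strand."""
    seq = seq.upper()
    return seq == seq[::-1]


def find_inverted_repeats(seq: str, arm_length: int = 5, max_spacer: int = 6):
    """Return (start, arm, spacer_length) for each arm followed by its reverse complement."""
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - 2 * arm_length + 1):
        arm = seq[start:start + arm_length]
        target = reverse_complement(arm)
        for spacer in range(max_spacer + 1):
            second = start + arm_length + spacer
            if seq[second:second + arm_length] == target:
                hits.append((start, arm, spacer))
    return hits


print(is_palindromic("GAATTC"), is_mirror("GAATTC"))
print(is_palindromic("AGGTTGGA"), is_mirror("AGGTTGGA"))
print(find_inverted_repeats("GGATCAAAAAGATCC"))
```

With `max_spacer=0` the search is restricted to inverted repeats without a spacer, which under these assumed definitions are palindromic.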
Biotechnology
Repetitive DNA is hard to sequence using next-generation sequencing techniques: sequence assembly from short reads cannot determine the length of a repetitive region. This issue is particularly serious for microsatellites, which are made of tiny repeat units of 1–6 bp.[18]
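The difficulty can be illustrated with a toy Python sketch. Two hypothetical sequences that differ only in the copy number of a CA repeat yield exactly the same set of distinct short reads, whereas reads long enough to span the whole tract plus flanking sequence distinguish them. The sequences, read lengths, and the function name `read_set` are assumptions made for this example; the model ignores read depth, sequencing errors, and paired-end information.

```python
def read_set(genome: str, read_length: int) -> set:
    """Return every distinct substring of `read_length` (an idealized, error-free read set)."""
    return {
        genome[i:i + read_length]
        for i in range(len(genome) - read_length + 1)
    }


# Toy flanking sequences and repeat tracts (assumptions for illustration).
left, right = "GATTACAGGC", "TCCGATAGCA"
genome_a = left + "CA" * 10 + right  # 20 bp repeat tract
genome_b = left + "CA" * 15 + right  # 30 bp repeat tract

short_reads_match = read_set(genome_a, 8) == read_set(genome_b, 8)
long_reads_match = read_set(genome_a, 40) == read_set(genome_b, 40)

print("8 bp reads identical:", short_reads_match)
print("40 bp reads identical:", long_reads_match)
```

Because every 8 bp read falls either inside the tract or across one of its edges, no short read carries information about how many copies lie in between.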
Many researchers have historically left out repetitive parts when analyzing and publishing whole genome data.[19]
References
- ^ de Koning AP, Gu W, Castoe TA, Batzer MA, Pollock DD (December 2011). "Repetitive elements may comprise over two-thirds of the human genome". PLOS Genetics. 7 (12): e1002384. doi:10.1371/journal.pgen.1002384. PMC 3228813. PMID 22144907.
- ^ a b c Lower, Sarah E.; Dion-Côté, Anne-Marie; Clark, Andrew G.; Barbash, Daniel A. (2019-11-06). "Special Issue: Repetitive DNA Sequences". Genes. 10 (11): 896. doi:10.3390/genes10110896. ISSN 2073-4425. PMC 6895920. PMID 31698818.
- ^ "Repeated Sequence (DNA) - an overview | ScienceDirect Topics". www.sciencedirect.com. Retrieved 2022-10-04.
- ^ McClintock, B. (1951-01-01). "CHROMOSOME ORGANIZATION AND GENIC EXPRESSION". Cold Spring Harbor Symposia on Quantitative Biology. 16 (0): 13–47. doi:10.1101/sqb.1951.016.01.004. ISSN 0091-7451.
- ^ Britten, R. J.; Kohne, D. E. (1968-08-09). "Repeated Sequences in DNA: Hundreds of thousands of copies of DNA sequences have been incorporated into the genomes of higher organisms". Science. 161 (3841): 529–540. doi:10.1126/science.161.3841.529. ISSN 0036-8075.
- ^ Shapiro, James A.; von Sternberg, Richard (May 2005). "Why repetitive DNA is essential to genome function". Biological Reviews. 80 (2): 227–250. doi:10.1017/s1464793104006657. ISSN 1464-7931.
- ^ Ohno S (1972). "So much "junk" DNA in our genome". Brookhaven Symposia in Biology. 23: 366–70. PMID 5065367.
- ^ Orgel LE, Crick FH, Sapienza C (December 1980). "Selfish DNA". Nature. 288 (5792): 645–6. Bibcode:1980Natur.288..645O. doi:10.1038/288645a0. PMID 7453798. S2CID 4370178.
- ^ McClintock B (1 January 1956). "Controlling elements and the gene". Cold Spring Harbor Symposia on Quantitative Biology. 21: 197–216. doi:10.1101/SQB.1956.021.01.017. PMID 13433592.
- ^ Shapiro JA, von Sternberg R (May 2005). "Why repetitive DNA is essential to genome function". Biological Reviews of the Cambridge Philosophical Society. 80 (2): 227–50. doi:10.1017/S1464793104006657. PMID 15921050. S2CID 18866824.
- ^ Cournac A, Koszul R, Mozziconacci J (January 2016). "The 3D folding of metazoan genomes correlates with the association of similar repetitive elements". Nucleic Acids Research. 44 (1): 245–55. doi:10.1093/nar/gkv1292. PMC 4705657. PMID 26609133.
- ^ a b Usdin K, House NC, Freudenreich CH (22 January 2015). "Repeat instability during DNA repair: Insights from model systems". Critical Reviews in Biochemistry and Molecular Biology. 50 (2): 142–67. doi:10.3109/10409238.2014.999192. PMC 4454471. PMID 25608779.
- ^ a b Abugable AA, Morris JL, Palminha NM, Zaksauskaite R, Ray S, El-Khamisy SF (September 2019). "DNA repair and neurological disease: From molecular understanding to the development of diagnostics and model organisms". DNA Repair. 81: 102669. doi:10.1016/j.dnarep.2019.102669. PMID 31331820.
- ^ Maiuri T, Mocle AJ, Hung CL, Xia J, van Roon-Mom WM, Truant R (January 2017). "Huntingtin is a scaffolding protein in the ATM oxidative DNA damage response complex". Human Molecular Genetics. 26 (2): 395–406. doi:10.1093/hmg/ddw395. PMID 28017939.
- ^ a b Massey TH, Jones L (January 2018). "The central role of DNA damage and repair in CAG repeat diseases". Disease Models & Mechanisms. 11 (1): dmm031930. doi:10.1242/dmm.031930. PMC 5818082. PMID 29419417.
- ^ Villarreal LP (2005). Viruses and the Evolution of Life. ASM Press. ISBN 978-1-55581-309-3.[page needed]
- ^ Ussery DW, Wassenaar TM, Borini S (2009). "Word Frequencies and Repeats". Computing for Comparative Microbial Genomics. Computational Biology. Vol. 8. pp. 137–150. doi:10.1007/978-1-84800-255-5_8. ISBN 978-1-84800-254-8.
- ^ De Bustos A, Cuadrado A, Jouve N (November 2016). "Sequencing of long stretches of repetitive DNA". Scientific Reports. 6 (1): 36665. Bibcode:2016NatSR...636665D. doi:10.1038/srep36665. PMC 5098217. PMID 27819354.
- ^ Slotkin RK (1 May 2018). "The case for not masking away repetitive DNA". Mobile DNA. 9 (1): 15. doi:10.1186/s13100-018-0120-9. PMC 5930866. PMID 29743957.
External links
- Function of Repetitive DNA
- DNA Repetitious Region at the U.S. National Library of Medicine Medical Subject Headings (MeSH)