Tandem repeat

In genetics, tandem repeats occur in DNA when a pattern of one or more nucleotides is repeated and the repetitions are directly adjacent to each other, e.g. ATTCG ATTCG ATTCG, in which the sequence ATTCG is repeated three times. [1]
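The definition above can be illustrated with a minimal Python sketch. The function name `count_tandem_copies` and its parameters are hypothetical, chosen only for this example; it counts how many exact, directly adjacent copies of a given unit occur from a chosen position.

```python
def count_tandem_copies(sequence: str, unit: str, start: int = 0) -> int:
    """Count back-to-back exact copies of `unit` in `sequence`, beginning at `start`."""
    if not unit:
        raise ValueError("unit must be a non-empty string")
    copies = 0
    pos = start
    # Advance one unit at a time while the next window matches the unit exactly.
    while sequence.startswith(unit, pos):
        copies += 1
        pos += len(unit)
    return copies


# The example from the text, written without the spaces used for readability.
example = "ATTCG" * 3
print(count_tandem_copies(example, "ATTCG"))  # 3
```

Only perfect copies are counted here; repeats found in real genomes often contain mismatches, which this sketch does not attempt to handle.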

Several protein domains also form tandem repeats within their amino acid primary structure, such as armadillo repeats. However, perfect tandem repeats are rare in naturally occurring proteins, although they have been added to designed proteins. [2]

Tandem repeats constitute about 8% of the human genome. [3] They are implicated in more than 50 lethal human diseases, including amyotrophic lateral sclerosis, Huntington's disease, and several cancers. [4]

Terminology

All tandem repeat arrays are classifiable as satellite DNA. The name originates from the fact that tandem DNA repeats, because they repeat the same nucleotide sequence many times, have a distinctive ratio of the two possible base pair combinations (A-T and G-C). This gives them a specific mass density that allows them to be separated from the rest of the genome with density-based laboratory techniques, so that they appear as "satellite bands". However, a tandem repeat array would not show up as a satellite band if its nucleotide composition were close to the genome average. [5]
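The role of base composition can be sketched in Python. This is an illustration only: the function names, the 0.05 difference threshold, and the genome-average value of 0.41 used in the example are assumptions made here, not values taken from the cited source.

```python
def gc_fraction(sequence: str) -> float:
    """Return the fraction of bases in `sequence` that are G or C."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must be non-empty")
    return sum(base in "GC" for base in seq) / len(seq)


def differs_from_genome(repeat_unit: str, genome_gc: float,
                        min_difference: float = 0.05) -> bool:
    """Rough check of whether an array of `repeat_unit` has a base composition
    far enough from the genome average to stand out.

    An array of whole copies has the same GC fraction as its unit.
    `min_difference` is a hypothetical threshold used only for illustration.
    """
    return abs(gc_fraction(repeat_unit) - genome_gc) >= min_difference


GENOME_GC = 0.41  # illustrative genome-wide average (assumption)

for unit in ("AATAT", "ATTCG"):
    print(unit, gc_fraction(unit), differs_from_genome(unit, GENOME_GC))
```

With these assumed numbers, the unit AATAT (GC fraction 0.0) differs strongly from the average, while ATTCG (GC fraction 0.4) does not, which corresponds to the case in which a tandem array would not separate as a distinct band.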

When exactly two nucleotides are repeated, it is called a dinucleotide repeat (for example: ACACACAC...). The microsatellite instability in hereditary nonpolyposis colon cancer most commonly affects such regions. [6]

When three nucleotides are repeated, it is called a trinucleotide repeat (for example: CAGCAGCAGCAG...), and abnormalities in such regions can give rise to trinucleotide repeat disorders.

When between 10 and 60 nucleotides are repeated, it is called a minisatellite. Repeats with shorter units are known as microsatellites or short tandem repeats.

When much larger lengths of nucleotides are repeated, on the order of 1,000 nucleotides, it is called a macrosatellite.
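The length-based names above can be summarized as a small Python sketch. The function name is hypothetical, and the macrosatellite cutoff of 1,000 is an assumption: the text says only "on the order of 1,000 nucleotides". Unit lengths between 61 and that cutoff are returned as "unclassified" because the text does not name them.

```python
MACROSATELLITE_MIN_UNIT = 1000  # assumption: the text gives only an order of magnitude


def classify_repeat_unit(unit_length: int) -> str:
    """Name a tandem repeat by the length of its repeated unit, in nucleotides."""
    if unit_length < 1:
        raise ValueError("unit_length must be at least 1")
    if unit_length < 10:
        # Units shorter than 10 nt: microsatellites (short tandem repeats).
        special = {2: "dinucleotide repeat", 3: "trinucleotide repeat"}
        if unit_length in special:
            return f"microsatellite ({special[unit_length]})"
        return "microsatellite"
    if unit_length <= 60:
        return "minisatellite"
    if unit_length >= MACROSATELLITE_MIN_UNIT:
        return "macrosatellite"
    return "unclassified"  # range not named in the text


for length in (2, 3, 5, 10, 60, 61, 1000):
    print(length, classify_repeat_unit(length))
```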

When the repeat unit copy number is variable in the population being considered, it is called a variable number tandem repeat (VNTR). MeSH classifies variable number tandem repeats under minisatellites. [7]

Mechanism

Tandem repeats can arise through different mechanisms. For example, slipped strand mispairing (also known as replication slippage) is a mutation process that occurs during DNA replication. It involves denaturation and displacement of the DNA strands, resulting in mispairing of the complementary bases. Slipped strand mispairing is one explanation for the origin and evolution of repetitive DNA sequences.
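The net effect of a slippage event on a repeat array can be sketched as a toy model in Python. The sketch models only the outcome (a gain or loss of whole repeat units), not the molecular process itself; the function name `slip` and its parameters are hypothetical.

```python
def slip(sequence: str, unit: str, start: int, change: int) -> str:
    """Return `sequence` with `change` repeat units added (positive) or removed
    (negative) from the tandem array of `unit` that begins at `start`."""
    if not unit:
        raise ValueError("unit must be a non-empty string")
    # Measure the existing array: count exact back-to-back copies of the unit.
    copies = 0
    while sequence.startswith(unit, start + copies * len(unit)):
        copies += 1
    new_copies = copies + change
    if new_copies < 0:
        raise ValueError("cannot remove more units than the array contains")
    array_end = start + copies * len(unit)
    # Rebuild the sequence with the resized array between the original flanks.
    return sequence[:start] + unit * new_copies + sequence[array_end:]


original = "GG" + "CAG" * 4 + "TT"
print(slip(original, "CAG", 2, +1))  # array expands to 5 copies
print(slip(original, "CAG", 2, -1))  # array contracts to 3 copies
```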

Other mechanisms include unequal crossover and gene conversion.

Uses

Tandem repeat patterns can help determine an individual's inherited traits.

Tandem repeats can be very useful in determining parentage. Short tandem repeats are used for certain genealogical DNA tests. DNA is examined from microsatellites within the chromosomal DNA. Parentage can be determined through the similarity in these regions.
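The comparison behind parentage testing can be sketched in Python under simplifying assumptions. Each genotype is represented as a mapping from a locus name to the two repeat copy numbers observed there, and a child is expected to share at least one allele with a biological parent at every locus. The locus names and values are hypothetical, and the sketch ignores mutation and the statistical weighting that real tests rely on.

```python
def inconsistent_loci(child: dict, alleged_parent: dict) -> list:
    """Return the loci at which the child shares no allele with the alleged parent."""
    mismatches = []
    for locus, child_alleles in child.items():
        parent_alleles = alleged_parent.get(locus)
        if parent_alleles is None:
            continue  # locus not typed in the alleged parent; cannot be compared
        if not set(child_alleles) & set(parent_alleles):
            mismatches.append(locus)
    return mismatches


# Hypothetical genotypes: locus name -> (repeat count on each chromosome copy).
child = {"locus_A": (12, 14), "locus_B": (8, 9), "locus_C": (15, 15)}
candidate_1 = {"locus_A": (14, 16), "locus_B": (9, 11), "locus_C": (15, 17)}
candidate_2 = {"locus_A": (10, 11), "locus_B": (9, 9), "locus_C": (13, 17)}

print(inconsistent_loci(child, candidate_1))  # []
print(inconsistent_loci(child, candidate_2))  # ['locus_A', 'locus_C']
```

An empty result means the alleged parent cannot be excluded on these loci; it does not by itself establish parentage.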

Polymorphic tandem repeats (also known as VNTRs) are also present in microorganisms and can be used to trace the origin of an outbreak. The corresponding assay, in which a collection of VNTRs is typed to characterize a strain, is most often called MLVA (Multiple Loci VNTR Analysis). Using tandem repeat polymorphism, recombination has been reported in the natural transmission of the monkeypox (mpox) virus genome during the 2022 outbreak. [8]
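One simple way to compare MLVA results is sketched below in Python. Here a profile is assumed to be a tuple of repeat copy numbers at a fixed, ordered panel of VNTR loci, and two profiles are compared by counting the loci at which they differ. The strain names and copy numbers are hypothetical, and real MLVA schemes may use other distance measures.

```python
def mlva_distance(profile_a: tuple, profile_b: tuple) -> int:
    """Count the loci at which two MLVA profiles have different copy numbers."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same panel of loci")
    return sum(a != b for a, b in zip(profile_a, profile_b))


# Hypothetical reference strains typed at the same four VNTR loci.
reference_strains = {
    "strain_X": (5, 3, 7, 2),
    "strain_Y": (5, 4, 7, 2),
    "strain_Z": (6, 1, 9, 4),
}
isolate = (5, 4, 7, 2)

distances = {name: mlva_distance(isolate, profile)
             for name, profile in reference_strains.items()}
closest = min(distances, key=distances.get)
print(distances)  # {'strain_X': 1, 'strain_Y': 0, 'strain_Z': 4}
print(closest)    # strain_Y
```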

In the field of computer science, tandem repeats in strings (e.g., DNA sequences) can be efficiently detected using suffix trees or suffix arrays.
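For contrast with those efficient methods, the detection problem itself can be shown with a brute-force scan in Python. This sketch does not use suffix trees or suffix arrays and is much slower on long inputs; the function name and the `min_unit`, `max_unit`, and `min_copies` parameters are assumptions made for illustration. It reports exact repeats only, preferring the shortest unit at each start position.

```python
def find_tandem_repeats(sequence: str, min_unit: int = 2, max_unit: int = 10,
                        min_copies: int = 2) -> list:
    """Brute-force search for exact tandem repeats.

    Returns a list of (start, unit, copies) tuples, scanning left to right and
    skipping past each reported array so that reported arrays do not overlap.
    """
    results = []
    n = len(sequence)
    i = 0
    while i < n:
        found = None
        for unit_len in range(min_unit, max_unit + 1):
            unit = sequence[i:i + unit_len]
            if len(unit) < unit_len:
                break  # not enough sequence left for a unit of this length
            copies = 1
            while sequence.startswith(unit, i + copies * unit_len):
                copies += 1
            if copies >= min_copies:
                found = (i, unit, copies)
                break  # keep the shortest repeating unit at this position
        if found:
            results.append(found)
            i += len(found[1]) * found[2]  # jump past the reported array
        else:
            i += 1
    return results


print(find_tandem_repeats("CC" + "ATTCG" * 3 + "AA"))  # [(2, 'ATTCG', 3)]
print(find_tandem_repeats("CAGCAGCAGCAG"))             # [(0, 'CAG', 4)]
```

Because `min_unit` defaults to 2, single-base runs such as the flanking "CC" are not reported; setting it to 1 would include them.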

Studies in 2004 linked the unusual genetic plasticity of dogs to mutations in tandem repeats. [9]

Nested tandem repeats have repeat unit lengths that are variable or unknown and frequently include an asymmetric hierarchy of smaller repeating units. These repeats are constructed from distinct groups of homologous-length monomers. An algorithm known as NTRprism was created by Oxford Nanopore Technologies researchers to enable the annotation of repetitive structures in assembled satellite DNA arrays. NTRprism is designed to find and display the periodicity of satellite repeats. [10]

Biotechnology

Kang et al. successfully amplified in vitro up to 5 kb of a sequence containing 36 identical 99 bp tandem repeats and a 561 bp sequence with 91% AT content using SHARP, a method that uses engineered superhelicases with enhanced processivity and speed. [11] SHARP combines single-stranded DNA binding protein (SSB) and superhelicases with standard PCR reagents to achieve isothermal amplification that mimics biological DNA replication. The method operates at a constant temperature, eliminating the need for thermal cycling, and has shown particular utility in cases where traditional PCR either fails to amplify target sequences or produces unwanted side products.


References

  1. Tandem Repeat at the U.S. National Library of Medicine Medical Subject Headings (MeSH)
  2. Jorda J, Xue B, Uversky VN, Kajava AV (June 2010). "Protein tandem repeats - the more perfect, the less structured". The FEBS Journal. 277 (12): 2673–82. doi:10.1111/j.1742-4658.2010.07684.x. PMC 2928880. PMID 20553501.
  3. Duitama J, Zablotskaya A, Gemayel R, Jansen A, Belet S, Vermeesch JR, Verstrepen KJ, Froyen G (May 2014). "Large-scale analysis of tandem repeat variability in the human genome". Nucleic Acids Research. 42 (9): 5728–5741. doi:10.1093/nar/gku212. PMC 4027155. PMID 24682812.
  4. Cui, Ya; Ye, Wenbin; Li, Jason Sheng; Li, Jingyi Jessica; Vilain, Eric; Sallam, Tamer; Li, Wei (April 2024). "A genome-wide spectrum of tandem repeat expansions in 338,963 humans". Cell. 187 (9): 2336–2341.e5. doi:10.1016/j.cell.2024.03.004. ISSN 0092-8674. PMID 38582080.
  5. Brown, Terence A. (2002), "Genome Anatomies", Genomes. 2nd edition, Wiley-Liss, retrieved 2025-01-01
  6. Oki E, Oda S, Maehara Y, Sugimachi K (March 1999). "Mutated gene-specific phenotypes of dinucleotide repeat instability in human colorectal carcinoma cell lines deficient in DNA mismatch repair". Oncogene. 18 (12): 2143–7. doi:10.1038/sj.onc.1202583. PMID 10321739.
  7. Variable Number of Tandem Repeats at the U.S. National Library of Medicine Medical Subject Headings (MeSH)
  8. Yeh, Ting-Yu; Hsieh, Zih-Yu; Feehley, Michael C.; Feehley, Patrick J.; Contreras, Gregory P.; Su, Ying-Chieh; Hsieh, Shang-Lin; Lewis, Dylan A. (9 December 2022). "Recombination shapes the 2022 monkeypox (mpox) outbreak". Med. 3 (12): 824–826. doi:10.1016/j.medj.2022.11.003. ISSN 2666-6359. PMC 9733179. PMID 36495863.
  9. Pennisi E (December 2004). "Genetics. A ruff theory of evolution: gene stutters drive dog shape". Science. 306 (5705): 2172. doi:10.1126/science.306.5705.2172. PMID 15618495. S2CID 10680162.
  10. Altemose, Nicolas; Logsdon, Glennis A.; Bzikadze, Andrey V.; Sidhwani, Pragya; Langley, Sasha A.; Caldas, Gina V.; Hoyt, Savannah J.; Uralsky, Lev; Ryabov, Fedor D.; Shew, Colin J.; Sauria, Michael E. G.; Borchers, Matthew; Gershman, Ariel; Mikheenko, Alla; Shepelev, Valery A. (April 2022). "Complete genomic and epigenetic maps of human centromeres". Science. 376 (6588): eabl4178. doi:10.1126/science.abl4178. ISSN 0036-8075. PMC 9233505. PMID 35357911.
  11. Kang, Jimin; Rashid, Fahad; Murray, Peter J.; Merino-Urteaga, Raquel; Gavrilov, Momcilo; Shang, Tiantian; Jo, Wonyoung; Ahmed, Arman; Aksel, Tural; Barrick, Doug; Berger, James M.; Ha, Taekjip (November 27, 2024). "Reliable amplification of highly repetitive or low complexity sequence DNA enabled by superhelicase-mediated isothermal amplification". bioRxiv. doi:10.1101/2024.11.27.625726. PMC 11623625.