Tandem repeat

In genetics, tandem repeats occur in DNA when a pattern of one or more nucleotides is repeated and the repetitions are directly adjacent to each other, e.g. ATTCG ATTCG ATTCG, in which the sequence ATTCG is repeated three times. [1]

Several protein domains also form tandem repeats within their amino acid primary structure, such as armadillo repeats. Perfect tandem repeats are rare in naturally occurring proteins, but they have been added to designed proteins. [2]

Tandem repeats constitute ∼6% of the human genome. They are implicated in more than 50 lethal human diseases, including amyotrophic lateral sclerosis, Huntington’s disease, and several cancers. [3]

Terminology

All tandem repeat arrays are classifiable as satellite DNA. The name originates from the fact that tandem DNA repeats, because they repeat the same nucleotide sequence many times, have a distinctive ratio of the two possible nucleotide base pair combinations. This gives them a specific mass density that allows them to be separated from the rest of the genome with density-based laboratory techniques, so that they appear as "satellite bands". However, a tandem repeat array would not show up as a satellite band if its nucleotide composition were close to the average of the genome. [citation needed]
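The base composition argument above can be illustrated with a short calculation. This is only a sketch: it uses GC fraction as a simple summary of base pair composition, and both sequences are hypothetical strings invented for the example, not real genomic data.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


# Hypothetical sequences, chosen only to show the contrast.
at_rich_array = "AATAT" * 20                    # a tandem array of one short unit
mixed_background = "ACGTTGCAAGCTTACGGATC" * 5   # stand-in for "average" DNA

print(gc_fraction(at_rich_array))      # 0.0
print(gc_fraction(mixed_background))   # 0.5
```

An array whose GC fraction sits far from the background value is the kind that could separate as a distinct band; one close to the background value would not.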

When exactly two nucleotides are repeated, it is called a dinucleotide repeat (for example: ACACACAC...). The microsatellite instability in hereditary nonpolyposis colon cancer most commonly affects such regions. [4]

When three nucleotides are repeated, it is called a trinucleotide repeat (for example: CAGCAGCAGCAG...), and abnormalities in such regions can give rise to trinucleotide repeat disorders.
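As an illustration of how the copy number of such a repeat might be counted in a sequence, the following minimal sketch finds the longest uninterrupted run of a given motif. The function name and the example sequence are assumptions made for this example; it does not reproduce any clinical genotyping method.

```python
import re


def longest_tandem_run(seq: str, motif: str) -> int:
    """Return the largest number of back-to-back copies of `motif` in `seq`."""
    if not motif:
        raise ValueError("motif must not be empty")
    # The lookahead lets runs be measured from every start position,
    # so runs of self-overlapping motifs are not missed.
    pattern = re.compile(f"(?=((?:{re.escape(motif)})+))")
    runs = (len(m.group(1)) // len(motif) for m in pattern.finditer(seq))
    return max(runs, default=0)


# Hypothetical sequence: four CAG copies, an interruption, then two more.
seq = "TTG" + "CAG" * 4 + "CCA" + "CAG" * 2 + "T"
print(longest_tandem_run(seq, "CAG"))  # 4
```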

When the repeated unit is between 10 and 60 nucleotides long, it is called a minisatellite. Repeats with shorter units are known as microsatellites or short tandem repeats.

When much larger lengths of nucleotides are repeated, on the order of 1,000 nucleotides, it is called a macrosatellite.

When the repeat unit copy number is variable in the population being considered, it is called a variable number tandem repeat (VNTR). MeSH classifies variable number tandem repeats under minisatellites. [5]
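The size classes above can be summarised as a small lookup. This is a sketch only: the cutoff `macro_min` is an assumption (the text says only "on the order of 1,000 nucleotides"), and unit lengths between 61 nucleotides and that cutoff are reported as unclassified because the text does not name them.

```python
def classify_repeat_unit(unit_length: int, macro_min: int = 1000) -> str:
    """Map a repeat unit length (in nucleotides) to the size classes in the text.

    macro_min is an assumed cutoff, not a value taken from the article.
    """
    if unit_length < 1:
        raise ValueError("unit_length must be a positive integer")
    if unit_length < 10:
        label = "microsatellite (short tandem repeat)"
        if unit_length == 2:
            label += ", dinucleotide repeat"
        elif unit_length == 3:
            label += ", trinucleotide repeat"
        return label
    if unit_length <= 60:
        return "minisatellite"
    if unit_length >= macro_min:
        return "macrosatellite"
    return "between the size classes listed in the text"


for unit in ("AC", "CAG", "ATTCG"):
    print(unit, "->", classify_repeat_unit(len(unit)))
```

Whether a given array is also a VNTR cannot be decided from the unit length; it depends on whether the copy number varies in the population being considered.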

Mechanism

Tandem repeats can arise through different mechanisms. For example, slipped strand mispairing (also known as replication slippage) is a mutation process that occurs during DNA replication. It involves denaturation and displacement of the DNA strands, resulting in mispairing of the complementary bases. Slipped strand mispairing is one explanation for the origin and evolution of repetitive DNA sequences.
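The effect of replication slippage on repeat copy number can be pictured with a toy random-walk model. Everything here is an assumption made for illustration: the slippage probability, the one-unit step size, the equal chance of gain and loss, and the floor of one copy are not taken from the article or from measured mutation rates.

```python
import random


def replicate_with_slippage(copies: int, slip_prob: float, rng: random.Random) -> int:
    """One replication round: with probability slip_prob, gain or lose one unit."""
    if rng.random() < slip_prob:
        copies += rng.choice((-1, 1))
    return max(copies, 1)  # toy-model floor: never drop below one unit


def simulate_copy_number(copies: int, rounds: int,
                         slip_prob: float = 0.01, seed: int = 0) -> list[int]:
    """Return the copy number after each round, starting with the initial value."""
    rng = random.Random(seed)
    history = [copies]
    for _ in range(rounds):
        copies = replicate_with_slippage(copies, slip_prob, rng)
        history.append(copies)
    return history


history = simulate_copy_number(copies=10, rounds=1000, slip_prob=0.05, seed=1)
print("final copy number:", history[-1])
print("range visited:", min(history), "to", max(history))
```

The sketch tracks only the number of repeat units, which is enough to show how repeated small slippage events let an array expand or contract over many replications.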

Other mechanisms include unequal crossover and gene conversion.

Uses

A tandem repeat is a pattern in DNA that can help determine an individual's inherited traits.

Tandem repeats can be very useful in determining parentage. Short tandem repeats are used for certain genealogical DNA tests. Microsatellites within the chromosomal DNA are examined, and parentage can be inferred from the similarity of these regions between individuals.
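The comparison behind a parentage test can be sketched as follows, assuming each genotype records two repeat counts (one per chromosome copy) at each locus. The locus names and repeat counts are hypothetical, and the check is deliberately simplified: it makes no allowance for mutation and computes no statistical weight.

```python
Genotype = dict[str, tuple[int, int]]  # locus name -> two allele repeat counts


def mismatched_loci(child: Genotype, candidate: Genotype) -> list[str]:
    """List the shared loci where the child carries none of the candidate's alleles."""
    shared = child.keys() & candidate.keys()
    if not shared:
        raise ValueError("the two genotypes share no loci")
    return sorted(locus for locus in shared
                  if not set(child[locus]) & set(candidate[locus]))


# Hypothetical repeat counts at three made-up loci.
child = {"locus_A": (12, 14), "locus_B": (8, 9), "locus_C": (15, 15)}
candidate_1 = {"locus_A": (12, 13), "locus_B": (9, 11), "locus_C": (15, 17)}
candidate_2 = {"locus_A": (10, 11), "locus_B": (9, 9), "locus_C": (16, 18)}

print(mismatched_loci(child, candidate_1))  # []
print(mismatched_loci(child, candidate_2))  # ['locus_A', 'locus_C']
```

A candidate with no mismatched loci is consistent with parentage under this simplified rule; mismatches at several loci argue against it.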

Polymorphic tandem repeats (also known as VNTRs) are also present in microorganisms and can be used to trace the origin of an outbreak. The corresponding assay, in which a collection of VNTRs is typed to characterize a strain, is most often called MLVA (multiple loci VNTR analysis). Using tandem repeat polymorphism, recombination has been reported in the natural transmission of the monkeypox (mpox) virus genome during the 2022 outbreak. [6]
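One way to picture an MLVA comparison is to treat each strain as a tuple of repeat counts, one per VNTR locus, and to count the loci at which two profiles differ. The strain names and counts below are hypothetical, and counting differing loci is just one simple distance chosen for this sketch.

```python
def mlva_distance(profile_a: tuple[int, ...], profile_b: tuple[int, ...]) -> int:
    """Number of loci at which two MLVA profiles have different repeat counts."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same loci in the same order")
    return sum(a != b for a, b in zip(profile_a, profile_b))


# Hypothetical reference strains typed at five VNTR loci.
references = {
    "strain_X": (4, 7, 2, 10, 5),
    "strain_Y": (4, 6, 2, 9, 5),
    "strain_Z": (3, 7, 3, 10, 6),
}
isolate = (4, 6, 2, 9, 5)  # hypothetical outbreak isolate

closest = min(references, key=lambda name: mlva_distance(isolate, references[name]))
print(closest)  # strain_Y
```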

In the field of computer science, tandem repeats in strings (e.g., DNA sequences) can be efficiently detected using suffix trees or suffix arrays.
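For short sequences, the detection task can be shown with a brute-force scan. Note that this is a naive illustration of the problem and not the suffix tree or suffix array approach mentioned above, which scales far better; the function names and parameters are assumptions made for this sketch.

```python
def is_primitive(unit: str) -> bool:
    """True if `unit` is not itself a repetition of a shorter string."""
    # A string is a repetition of a shorter one exactly when it occurs
    # inside its own doubling with the first and last characters removed.
    return unit not in (unit + unit)[1:-1]


def find_tandem_repeats(seq: str, max_unit: int = 6,
                        min_copies: int = 2) -> list[tuple[int, str, int]]:
    """Return (start, unit, copies) for runs of adjacent identical units."""
    hits = []
    for i in range(len(seq)):
        for p in range(1, max_unit + 1):
            unit = seq[i:i + p]
            if len(unit) < p or not is_primitive(unit):
                continue
            if i >= p and seq[i - p:i] == unit:
                continue  # same run already reported from an earlier start
            copies = 1
            while seq[i + copies * p:i + (copies + 1) * p] == unit:
                copies += 1
            if copies >= min_copies:
                hits.append((i, unit, copies))
    return hits


# The ATTCG example from the introduction, with arbitrary flanking bases.
print(find_tandem_repeats("CC" + "ATTCG" * 3 + "TT", max_unit=6, min_copies=3))
# [(2, 'ATTCG', 3)]
```

With a lower `min_copies`, the scan also reports shifted versions of the same array (for example TTCGA repeated twice), which a production tool would normally merge.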

Studies in 2004 linked the unusual genetic plasticity of dogs to mutations in tandem repeats. [7]

Nested tandem repeats are repeats whose unit lengths are variable or unknown and which frequently include an asymmetric hierarchy of smaller repeating units. These repeats are constructed from distinct groups of homologous-length monomers. An algorithm known as NTRprism was created by Oxford Nanopore Technologies researchers to enable the annotation of repetitive structures in assembled satellite DNA arrays. NTRprism was developed to find and display the repeat periodicity of a satellite array. [8]
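The idea of finding a repeat periodicity can be illustrated with a much simpler calculation. The sketch below is not NTRprism and makes no claim to match its method: it just histograms the spacing between successive occurrences of each k-mer and reports the most common spacing. The value of `k` and the example array are assumptions.

```python
from collections import Counter


def spacing_spectrum(seq: str, k: int = 4) -> Counter:
    """Histogram of distances between successive occurrences of each k-mer."""
    last_seen: dict[str, int] = {}
    spectrum: Counter = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in last_seen:
            spectrum[i - last_seen[kmer]] += 1
        last_seen[kmer] = i
    return spectrum


def dominant_period(seq: str, k: int = 4):
    """Most frequent k-mer spacing, or None if no k-mer occurs twice."""
    spectrum = spacing_spectrum(seq, k)
    if not spectrum:
        return None
    return spectrum.most_common(1)[0][0]


array = "ATTCGGCTA" * 6  # hypothetical array with a 9-nucleotide unit
print(dominant_period(array))  # 9
```

In a nested array, a spectrum of this kind would be expected to show several peaks, one for each level of the repeat hierarchy, rather than a single dominant value.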

References

  1. Tandem Repeat at the U.S. National Library of Medicine Medical Subject Headings (MeSH)
  2. Jorda J, Xue B, Uversky VN, Kajava AV (June 2010). "Protein tandem repeats - the more perfect, the less structured". The FEBS Journal. 277 (12): 2673–82. doi:10.1111/j.1742-4658.2010.07684.x. PMC 2928880. PMID 20553501.
  3. Cui, Ya; Ye, Wenbin; Li, Jason Sheng; Li, Jingyi Jessica; Vilain, Eric; Sallam, Tamer; Li, Wei (April 2024). "A genome-wide spectrum of tandem repeat expansions in 338,963 humans". Cell. 187 (9): 2336–2341.e5. doi:10.1016/j.cell.2024.03.004. ISSN 0092-8674.
  4. Oki E, Oda S, Maehara Y, Sugimachi K (March 1999). "Mutated gene-specific phenotypes of dinucleotide repeat instability in human colorectal carcinoma cell lines deficient in DNA mismatch repair". Oncogene. 18 (12): 2143–7. doi:10.1038/sj.onc.1202583. PMID 10321739.
  5. Variable Number of Tandem Repeats at the U.S. National Library of Medicine Medical Subject Headings (MeSH)
  6. Yeh, Ting-Yu; Hsieh, Zih-Yu; Feehley, Michael C.; Feehley, Patrick J.; Contreras, Gregory P.; Su, Ying-Chieh; Hsieh, Shang-Lin; Lewis, Dylan A. (9 December 2022). "Recombination shapes the 2022 monkeypox (mpox) outbreak". Med. 3 (12): 824–826. doi:10.1016/j.medj.2022.11.003. ISSN 2666-6359. PMC 9733179. PMID 36495863.
  7. Pennisi E (December 2004). "Genetics. A ruff theory of evolution: gene stutters drive dog shape". Science. 306 (5705): 2172. doi:10.1126/science.306.5705.2172. PMID 15618495. S2CID 10680162.
  8. Altemose, Nicolas; Logsdon, Glennis A.; Bzikadze, Andrey V.; Sidhwani, Pragya; Langley, Sasha A.; Caldas, Gina V.; Hoyt, Savannah J.; Uralsky, Lev; Ryabov, Fedor D.; Shew, Colin J.; Sauria, Michael E. G.; Borchers, Matthew; Gershman, Ariel; Mikheenko, Alla; Shepelev, Valery A. (April 2022). "Complete genomic and epigenetic maps of human centromeres". Science. 376 (6588): eabl4178. doi:10.1126/science.abl4178. ISSN 0036-8075. PMC 9233505. PMID 35357911.