Satellite DNA

Satellite DNA consists of very large arrays of tandemly repeating, non-coding DNA. It is the main component of functional centromeres and forms the main structural constituent of heterochromatin. [1]

The name "satellite DNA" reflects the fact that repetitions of a short DNA sequence tend to have frequencies of the bases adenine, cytosine, guanine, and thymine that differ from those of bulk DNA, and therefore a different density. As a result, they form one or more secondary, or "satellite", bands when genomic DNA is separated on a cesium chloride density gradient by buoyant density centrifugation. [2] Sequences with a greater proportion of A+T have a lower density than the bulk of genomic DNA, while those with a greater proportion of G+C have a higher density. Some repetitive sequences are ~50% G+C and ~50% A+T and thus have the same buoyant density as bulk genomic DNA. These satellites are called "cryptic" satellites because they form a band hidden within the main band of genomic DNA. "Isopycnic" is another term used for cryptic satellites. [3]
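The link between base composition and banding position can be illustrated with a short Python sketch. It assumes the commonly cited linear approximation for buoyant density in cesium chloride, ρ ≈ 1.660 + 0.098 × (G+C fraction) g/cm³, which is not taken from this article; the function names and the tolerance used to call a band "cryptic" are hypothetical choices made for illustration only.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def buoyant_density(gc: float) -> float:
    """Approximate CsCl buoyant density in g/cm^3 from the G+C fraction.

    Assumption: the linear relation rho = 1.660 + 0.098 * GC, which is an
    empirical approximation and not a statement from the article.
    """
    return 1.660 + 0.098 * gc


def classify_band(repeat_unit: str, bulk_gc: float, tolerance: float = 0.002) -> str:
    """Label a repeat as 'light', 'heavy', or 'cryptic' relative to bulk DNA.

    `tolerance` (g/cm^3) is a hypothetical cut-off for bands that would be
    hidden inside the main band.
    """
    delta = buoyant_density(gc_fraction(repeat_unit)) - buoyant_density(bulk_gc)
    if abs(delta) <= tolerance:
        return "cryptic"
    return "heavy" if delta > 0 else "light"


# An A+T-rich repeat bands above (lighter than) bulk DNA with 41% G+C.
print(classify_band("AATAT", bulk_gc=0.41))       # light
# A G+C-rich repeat bands below (heavier than) the same bulk DNA.
print(classify_band("CGCACGTGCG", bulk_gc=0.41))  # heavy
# A repeat with the same composition as the bulk is hidden in the main band.
print(classify_band("CCTA", bulk_gc=0.50))        # cryptic
```

The bulk G+C values passed in the usage lines are illustrative inputs, not measurements.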

Satellite DNA families in humans

Satellite DNA, together with minisatellite and microsatellite DNA, constitutes the tandem repeats. [4] The size of satellite DNA arrays varies greatly between individuals. [5]

The major satellite DNA families in humans are called:

Satellite family    Size of repeat unit (bp)    Location in human chromosomes
α (alphoid DNA)     170 [6]                     All chromosomes
β                   68                          Centromeres of chromosomes 1, 9, 13, 14, 15, 21, 22 and Y
Satellite 1         25-48                       Centromeres and other regions in heterochromatin of most chromosomes
Satellite 2         5                           Most chromosomes
Satellite 3         5                           Most chromosomes

Length

A repeated pattern can range from 1 base pair (a mononucleotide repeat) to several thousand base pairs in length, [7] and the total size of a satellite DNA block can be several megabases without interruption. Long repeat units have been described that contain domains of shorter repeated segments and mononucleotides (1-5 bp), arranged in clusters of microsatellites, in which the differences among individual copies of the longer repeat units were clustered. [7] Most satellite DNA is localized to the telomeric or the centromeric region of the chromosome. The nucleotide sequence of the repeats is fairly well conserved across species. However, variation in the length of the repeat is common.
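As a rough illustration of what "repeat unit" and "array" mean, the following Python sketch recovers the shortest unit of a perfectly repeated sequence and its copy number. It is a simplification: it assumes exact, uninterrupted copies, whereas real satellite arrays contain diverged copies. The function name is hypothetical.

```python
def smallest_repeat_unit(array: str) -> str:
    """Return the shortest unit whose exact tandem repetition yields `array`.

    If the sequence is not a perfect tandem array, the whole sequence is
    returned (it is then a single copy of itself).
    """
    n = len(array)
    for unit_len in range(1, n + 1):
        # A candidate unit must divide the array length evenly and must
        # rebuild the array when repeated end to end.
        if n % unit_len == 0 and array[:unit_len] * (n // unit_len) == array:
            return array[:unit_len]
    return array  # only reached for the empty string


array = "CCTA" * 6
unit = smallest_repeat_unit(array)
copies = len(array) // len(unit)
print(unit, copies)  # CCTA 6
```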

Low-resolution sequencing-based studies have demonstrated variation among humans in satellite array lengths, as well as in the frequency of certain sequence and structural variants. However, owing to a lack of complete centromere assemblies, base-level understanding of satellite array variation and evolution has remained weak. [5] Tandem repeats are also distinguished by repeat-unit length: minisatellite DNA is a short region (1-5 kb) of repeating elements more than 9 nucleotides long, whereas microsatellites are considered to have repeat units of 1-8 nucleotides. [8] The difference in the number of repeats present in a region (and hence the length of the region) is the basis for DNA profiling. [citation needed]
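The idea that repeat number at a locus distinguishes individuals can be sketched in Python as follows. The motif, flanking sequences, and copy numbers are hypothetical, and the helper only counts exact, uninterrupted copies; it is an illustration of the principle, not a description of any profiling protocol.

```python
import re


def longest_tandem_run(sequence: str, unit: str) -> int:
    """Largest number of consecutive exact copies of `unit` in `sequence`."""
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    runs = [len(m.group()) // len(unit) for m in pattern.finditer(sequence)]
    return max(runs, default=0)


# Two hypothetical alleles of the same locus: identical flanks, different
# numbers of copies of the repeat unit.
flank_left, flank_right = "TTGCA", "CGGAT"
allele_a = flank_left + "GATA" * 7 + flank_right
allele_b = flank_left + "GATA" * 10 + flank_right

print(longest_tandem_run(allele_a, "GATA"))  # 7
print(longest_tandem_run(allele_b, "GATA"))  # 10
```

Comparing the repeat counts across several such loci is what gives a profile its discriminating power.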

Origin

Microsatellites are thought to have originated by polymerase slippage during DNA replication. This comes from the observation that microsatellite alleles usually are length polymorphic; specifically, the length differences observed between microsatellite alleles are generally multiples of the repeat unit length. [9]
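The observation behind the slippage hypothesis can be expressed as a simple check: if alleles differ only by gained or lost repeat copies, every length difference is a whole multiple of the repeat unit length. The Python sketch below uses hypothetical allele lengths and function names.

```python
def slip(allele: str, unit: str, copies: int) -> str:
    """Model one slippage event by adding (copies > 0) or removing
    (copies < 0) whole repeat units at the end of a pure repeat tract."""
    if copies >= 0:
        return allele + unit * copies
    return allele[: max(0, len(allele) + copies * len(unit))]


def consistent_with_slippage(allele_lengths: list[int], unit_len: int) -> bool:
    """True if all allele lengths differ by multiples of the unit length."""
    if not allele_lengths:
        return True
    base = allele_lengths[0]
    return all((length - base) % unit_len == 0 for length in allele_lengths)


tract = "CA" * 12
expanded = slip(tract, "CA", 3)     # gains three copies
contracted = slip(tract, "CA", -2)  # loses two copies
lengths = [len(tract), len(expanded), len(contracted)]
print(lengths)                                  # [24, 30, 20]
print(consistent_with_slippage(lengths, 2))     # True
print(consistent_with_slippage([24, 27], 2))    # False
```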

Structure

Higher-order three-dimensional structures have been described in a naturally occurring complex satellite DNA from the land crab Gecarcinus lateralis, whose genome contains 3% of a GC-rich satellite band consisting of a ~2100 base pair (bp) "repeat unit" sequence motif called RU. [10] [11] The RU is arranged in long tandem arrays with approximately 16,000 copies per genome. Several RU sequences were cloned and sequenced, revealing conserved regions of conventional DNA sequence over stretches greater than 550 bp, interspersed with five "divergent domains" within each copy of RU.

Four divergent domains consisted of microsatellite repeats, biased in base composition, with purines on one strand and pyrimidines on the other. Some contained mononucleotide repeats of C:G base pairs approximately 20 bp in length. These strand-biased microsatellite domains ranged in length from approximately 20 bp to greater than 250 bp. The most prevalent repeated sequences in the embedded microsatellite regions were CT:AG, CCT:AGG, CCCT:AGGG, and CGCAC:GTGCG. [12] [13] [7] These repeating sequences were shown to adopt altered structures including triple-stranded DNA, Z-DNA, stem-loop, and other conformations under superhelical stress. [12] [13] [7]
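The "X:Y" notation used for these repeats pairs a motif with the sequence of the opposite strand, that is, its reverse complement. A minimal Python sketch (helper names are hypothetical) confirms the pairings listed above and shows how strongly each strand is biased toward purines (A, G) or pyrimidines (C, T).

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def purine_fraction(seq: str) -> float:
    """Fraction of purines (A or G) in a sequence."""
    seq = seq.upper()
    return sum(base in "AG" for base in seq) / len(seq)


for motif in ["CT", "CCT", "CCCT", "CGCAC"]:
    partner = reverse_complement(motif)
    print(f"{motif}:{partner}",
          f"purines {purine_fraction(motif):.2f} vs {purine_fraction(partner):.2f}")
```

For CT, CCT, and CCCT one strand is entirely pyrimidine and the other entirely purine; CGCAC:GTGCG carries bases of both classes on each strand.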

Between the strand-biased microsatellite repeats and C:G mononucleotide repeats, all sequence variations retained one or two base pairs with A (purine) interrupting the pyrimidine-rich strand and T (pyrimidine) interrupting the purine-rich strand. These interruptions in compositional bias adopted highly distorted conformations, as shown by their response to structural nuclease enzymes including S1, P1, and mung bean nucleases. [12]

The most complex compositionally-biased microsatellite domain of RU included the sequence TTAA:TTAA as well as a mirror repeat. It produced the strongest signal in response to nucleases compared to all other altered structures in experimental observations. That particular strand-biased divergent domain was subcloned and its altered helical structure was studied in greater detail. [12]

A fifth divergent domain in the RU sequence was characterized by variations of a symmetrical DNA sequence motif of alternating purines and pyrimidines shown to adopt a left-handed Z-DNA or stem-loop structure under superhelical stress. The conserved symmetrical Z-DNA was abbreviated Z4Z5NZ15NZ5Z4, where Z represents alternating purine/pyrimidine sequences. A stem-loop structure was centered in the Z15 element at the highly conserved palindromic sequence CGCACGTGCG:CGCACGTGCG and was flanked by extended palindromic Z-DNA sequences over a 35 bp region. Many RU variants showed deletions of at least 10 bp outside the Z4Z5NZ15NZ5Z4 structural element, while others had additional Z-DNA sequences lengthening the alternating purine and pyrimidine domain to over 50 bp. [14]
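Two sequence properties named in this paragraph, strict alternation of purines and pyrimidines and palindromic symmetry, can be checked directly. The Python sketch below applies both tests to the conserved sequence CGCACGTGCG; the helper names are hypothetical, and passing these tests says nothing by itself about which structure the DNA actually adopts.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_alternating_purine_pyrimidine(seq: str) -> bool:
    """True if purines (A, G) and pyrimidines (C, T) strictly alternate."""
    is_purine = [base in "AG" for base in seq.upper()]
    return all(a != b for a, b in zip(is_purine, is_purine[1:]))


def is_palindromic(seq: str) -> bool:
    """True if the sequence equals its own reverse complement."""
    return seq.upper() == reverse_complement(seq)


core = "CGCACGTGCG"
print(is_alternating_purine_pyrimidine(core))  # True
print(is_palindromic(core))                    # True
print(is_palindromic("CCTA"))                  # False
```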

One extended RU sequence (EXT) was shown to have six tandem copies of a 142 bp amplified (AMPL) sequence motif inserted into a region bordered by inverted repeats where most copies contained just one AMPL sequence element. There were no nuclease-sensitive altered structures or significant sequence divergence in the relatively conventional AMPL sequence. A truncated RU sequence (TRU), 327 bp shorter than most clones, arose from a single base change leading to a second EcoRI restriction site in TRU. [10]

Another crab, the hermit crab Pagurus pollicaris, was shown to have a family of AT-rich satellites with inverted repeat structures that comprised 30% of the entire genome. Another cryptic satellite from the same crab, with the sequence CCTA:TAGG, [15] [16] [3] was found inserted into some of the palindromes. [17]

References

  1. Lohe AR, Hilliker AJ, Roberts PA (August 1993). "Mapping simple repeated DNA sequences in heterochromatin of Drosophila melanogaster". Genetics. 134 (4): 1149–74. doi:10.1093/genetics/134.4.1149. PMC 1205583. PMID 8375654.
  2. Kit, S. (1961). "Equilibrium sedimentation in density gradients of DNA preparations from animal tissues". J. Mol. Biol. 3 (6): 711–716. doi:10.1016/S0022-2836(61)80075-2. ISSN 0022-2836. PMID 14456492.
  3. Skinner, D. M.; Beattie, W. G.; Blattner, F. F.; Stark, B. P.; Dahlberg, J. E. (1974). Biochemistry. 13: 3930–3937.
  4. Tandem Repeat at the U.S. National Library of Medicine Medical Subject Headings (MeSH)
  5. Altemose, Nicolas; Logsdon, Glennis A.; Bzikadze, Andrey V.; Sidhwani, Pragya; Langley, Sasha A.; Caldas, Gina V.; Hoyt, Savannah J.; Uralsky, Lev; Ryabov, Fedor D.; Shew, Colin J.; Sauria, Michael E. G.; Borchers, Matthew; Gershman, Ariel; Mikheenko, Alla; Shepelev, Valery A. (April 2022). "Complete genomic and epigenetic maps of human centromeres". Science. 376 (6588): eabl4178. doi:10.1126/science.abl4178. ISSN 0036-8075. PMC 9233505. PMID 35357911.
  6. Tyler-Smith, Chris; Brown, William R. A. (1987). "Structure of the major block of alphoid satellite DNA on the human Y chromosome". Journal of Molecular Biology. 195 (3): 457–470. doi:10.1016/0022-2836(87)90175-6. PMID 2821279.
  7. Fowler, R. F.; Bonnewell, V.; Spann, M. S.; Skinner, D. M. (1985-07-25). "Sequences of three closely related variants of a complex satellite DNA diverge at specific domains". The Journal of Biological Chemistry. 260 (15): 8964–8972. doi:10.1016/S0021-9258(17)39443-7. PMID 2991230.
  8. Richard 2008.
  9. Leclercq, S; Rivals, E; Jarne, P (2010). "DNA slippage occurs at microsatellite loci without minimal threshold length in humans: a comparative genomic approach". Genome Biol Evol. 2: 325–35. doi:10.1093/gbe/evq023. PMC 2997547. PMID 20624737.
  10. Bonnewell, V.; Fowler, R. F.; Skinner, D. M. (1983-08-26). "An inverted repeat borders a fivefold amplification in satellite DNA". Science. 221 (4613): 862–865. Bibcode:1983Sci...221..862B. doi:10.1126/science.6879182. PMID 6879182.
  11. Skinner, D. M.; Bonnewell, V.; Fowler, R. F. (1983). "Sites of divergence in the sequence of a complex satellite DNA and several cloned variants". Cold Spring Harbor Symposia on Quantitative Biology. 47 (2): 1151–1157. doi:10.1101/sqb.1983.047.01.130. PMID 6305575.
  12. Fowler, R. F.; Skinner, D. M. (1986-07-05). "Eukaryotic DNA diverges at a long and complex pyrimidine:purine tract that can adopt altered conformations". The Journal of Biological Chemistry. 261 (19): 8994–9001. doi:10.1016/S0021-9258(19)84479-4. PMID 3013872.
  13. Stringfellow, L. A.; Fowler, R. F.; LaMarca, M. E.; Skinner, D. M. (1985). "Demonstration of remarkable sequence divergence in variants of a complex satellite DNA by molecular cloning". Gene. 38 (1–3): 145–152. doi:10.1016/0378-1119(85)90213-6. PMID 3905513.
  14. Fowler, R. F.; Stringfellow, L. A.; Skinner, D. M. (1988-11-15). "A domain that assumes a Z-conformation includes a specific deletion in some cloned variants of a complex satellite". Gene. 71 (1): 165–176. doi:10.1016/0378-1119(88)90088-1. PMID 3215523.
  15. Skinner, Dorothy M.; Beattie, Wanda G. (September 1974). "Characterization of a pair of isopycnic twin crustacean satellite deoxyribonucleic acids, one of which lacks one base in each strand". Biochemistry. 13 (19): 3922–3929. doi:10.1021/bi00716a017. ISSN 0006-2960. PMID 4412396.
  16. Chambers, Carey A.; Schell, Maria P.; Skinner, Dorothy M. (January 1978). "The primary sequence of a crustacean satellite DNA containing a family of repeats". Cell. 13 (1): 97–110. doi:10.1016/0092-8674(78)90141-1. PMID 620424. S2CID 42786386.
  17. Fowler, R. F.; Skinner, D. M. (1985-01-25). "Cryptic satellites rich in inverted repeats comprise 30% of the genome of a hermit crab". The Journal of Biological Chemistry. 260 (2): 1296–1303. doi:10.1016/S0021-9258(20)71243-3. PMID 2981841.

Further reading